PoC for a ProcessorMixin class (#15549)
* PoC for a ProcessorMixin class * Documentation * Apply suggestions from code review Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Roll out to other processors * Add base feature extractor class in init * Use args and kwargs Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: Suraj Patil <surajp815@gmail.com> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
This commit is contained in:
@@ -12,10 +12,22 @@ specific language governing permissions and limitations under the License.
|
||||
|
||||
# Processors
|
||||
|
||||
This library includes processors for several traditional tasks. These processors can be used to process a dataset into
|
||||
examples that can be fed to a model.
|
||||
Processors can mean two different things in the Transformers library:
|
||||
- the objects that pre-process inputs for multi-modal models such as [Wav2Vec2](../model_doc/wav2vec2) (speech and text)
|
||||
or [CLIP](../model_doc/clip) (text and vision)
|
||||
- deprecated objects that were used in older versions of the library to preprocess data for GLUE or SQUAD.
|
||||
|
||||
## Processors
|
||||
## Multi-modal processors
|
||||
|
||||
Any multi-modal model will require an object to encode or decode the data that groups several modalities (among text,
|
||||
vision and audio). This is handled by objects called processors, which group tokenizers (for the text modality) and
|
||||
feature extractors (for vision and audio).
|
||||
|
||||
Those processors inherit from the following base class that implements the saving and loading functionality:
|
||||
|
||||
[[autodoc]] ProcessorMixin
|
||||
|
||||
## Deprecated processors
|
||||
|
||||
All processors follow the same architecture which is that of the
|
||||
[`~data.processors.utils.DataProcessor`]. The processor returns a list of
|
||||
|
||||
Reference in New Issue
Block a user