Tokenizer kwargs in textgeneration pipe (#28362)

* added args to the pipeline

* added test

* more sensical tests

* fixup

* docs

* typo
;

* docs

* made changes to support named args

* fixed test

* docs update

* styles

* docs

* docs
This commit is contained in:
thedamnedrhino
2024-01-15 07:52:18 -08:00
committed by GitHub
parent a573ac74fd
commit 366c03271e
3 changed files with 49 additions and 3 deletions

View File

@@ -216,6 +216,12 @@ array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
</tf>
</frameworkcontent>
<Tip>
Different pipelines support tokenizer arguments in their `__call__()` differently. `text-2-text-generation` pipelines support (i.e. pass on)
only `truncation`. `text-generation` pipelines support `max_length`, `truncation`, `padding` and `add_special_tokens`.
In `fill-mask` pipelines, tokenizer arguments can be passed in the `tokenizer_kwargs` argument (dictionary).
</Tip>
## Audio
For audio tasks, you'll need a [feature extractor](main_classes/feature_extractor) to prepare your dataset for the model. The feature extractor is designed to extract features from raw audio data, and convert them into tensors.