Trainer - deprecate tokenizer for processing_class (#32385)
* Trainer - deprecate tokenizer for processing_class
* Extend change across Seq2Seq trainer and docs
* Add tests
* Update to FutureWarning and add deprecation version
@@ -111,7 +111,7 @@ Load an audio dataset (see the 🤗 Datasets [Quick Start](https://huggingface.c
 >>> dataset = load_dataset("PolyAI/minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
 ```

-You need to make sure the sampling rate of the dataset matches the sampling
+You need to make sure the sampling rate of the dataset matches the sampling
 rate [`facebook/wav2vec2-base-960h`](https://huggingface.co/facebook/wav2vec2-base-960h) was trained on:

 ```py
@@ -174,7 +174,7 @@ If you can't find a model for your use-case, you'll need to finetune a pretraine

 <Youtube id="AhChOFRegn4"/>

-Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and it's associated preprocessing class.
+Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`] you used above. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from its name or path. You only need to select the appropriate `AutoClass` for your task and it's associated preprocessing class.

 Let's return to the example from the previous section and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].

@@ -485,7 +485,7 @@ Now gather all these classes in [`Trainer`]:
 ... args=training_args,
 ... train_dataset=dataset["train"],
 ... eval_dataset=dataset["test"],
-... tokenizer=tokenizer,
+... processing_class=tokenizer,
 ... data_collator=data_collator,
 ... ) # doctest: +SKIP
 ```
@@ -502,7 +502,7 @@ For tasks - like translation or summarization - that use a sequence-to-sequence

 </Tip>

-You can customize the training loop behavior by subclassing the methods inside [`Trainer`]. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be subclassed.
+You can customize the training loop behavior by subclassing the methods inside [`Trainer`]. This allows you to customize features such as the loss function, optimizer, and scheduler. Take a look at the [`Trainer`] reference for which methods can be subclassed.

 The other way to customize the training loop is by using [Callbacks](./main_classes/callback). You can use callbacks to integrate with other libraries and inspect the training loop to report on progress or stop the training early. Callbacks do not modify anything in the training loop itself. To customize something like the loss function, you need to subclass the [`Trainer`] instead.

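
The commit message describes renaming the `tokenizer` argument to `processing_class` and emitting a `FutureWarning` for the old name. The general pattern can be sketched as below. This is only an illustration under stated assumptions: `MiniTrainer` is a hypothetical stand-in class, the warning text is invented for the example, and none of it is taken from the actual `Trainer` implementation.

```python
import warnings


class MiniTrainer:
    """Hypothetical stand-in for a trainer; NOT the transformers implementation."""

    def __init__(self, processing_class=None, **kwargs):
        # Still accept the deprecated `tokenizer` keyword, but warn and forward it.
        if "tokenizer" in kwargs:
            legacy = kwargs.pop("tokenizer")
            if processing_class is not None:
                raise ValueError(
                    "Pass either `tokenizer` or `processing_class`, not both."
                )
            warnings.warn(
                "`tokenizer` is deprecated; use `processing_class` instead.",
                FutureWarning,
                stacklevel=2,
            )
            processing_class = legacy
        if kwargs:
            raise TypeError(f"Unexpected keyword arguments: {sorted(kwargs)}")
        self.processing_class = processing_class


# Old call style: still works, but records a FutureWarning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = MiniTrainer(tokenizer="my-tokenizer")

# New call style: no warning is expected.
new_style = MiniTrainer(processing_class="my-tokenizer")
```

Forwarding the old keyword keeps existing callers working during the deprecation window, while the `FutureWarning` is shown to end users by default (unlike `DeprecationWarning`, which Python hides outside `__main__` and test runners).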