Trainer - deprecate tokenizer for processing_class (#32385)

* Trainer - deprecate tokenizer for processing_class

* Extend change across Seq2Seq trainer and docs

* Add tests

* Update to FutureWarning and add deprecation version
amyeroberts
2024-10-02 14:08:46 +01:00
committed by GitHub
parent e7c8af7f33
commit b7474f211d
99 changed files with 569 additions and 442 deletions
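
The rename described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `SketchTrainer` and `legacy_call_warns` are hypothetical names, and the handling of the case where both arguments are passed is an assumption. It only shows the general pattern of accepting the old `tokenizer` keyword, emitting a `FutureWarning`, and forwarding the value to `processing_class`.

```python
import warnings


class SketchTrainer:
    """Hypothetical stand-in for a trainer; NOT the transformers implementation."""

    def __init__(self, model=None, processing_class=None, tokenizer=None):
        if tokenizer is not None:
            # Assumption: passing both the old and the new keyword is an error.
            if processing_class is not None:
                raise ValueError(
                    "Pass either `tokenizer` or `processing_class`, not both."
                )
            # FutureWarning is shown to end users by default, unlike
            # DeprecationWarning, which is often filtered out.
            warnings.warn(
                "`tokenizer` is deprecated; use `processing_class` instead.",
                FutureWarning,
                stacklevel=2,
            )
            processing_class = tokenizer
        self.model = model
        self.processing_class = processing_class


def legacy_call_warns():
    """Call the sketch with the old keyword and report what happened."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        trainer = SketchTrainer(tokenizer="tok")
    return trainer.processing_class, [w.category for w in caught]
```

Under this sketch, callers that still pass `tokenizer=` keep working but see a warning, while callers that pass `processing_class=` see none.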


@@ -39,8 +39,8 @@ The original code can be found [here](https://github.com/state-spaces/mamba).
# Usage
### A simple generation example:
```python
from transformers import MambaConfig, MambaForCausalLM, AutoTokenizer
import torch
@@ -55,7 +55,7 @@ print(tokenizer.batch_decode(out))
### Peft finetuning
The slow version is not very stable for training, and the fast one needs `float32`!
```python
from datasets import load_dataset
from trl import SFTTrainer
from peft import LoraConfig
@@ -80,7 +80,7 @@ lora_config = LoraConfig(
)
trainer = SFTTrainer(
    model=model,
-    tokenizer=tokenizer,
+    processing_class=tokenizer,
    args=training_args,
    peft_config=lora_config,
    train_dataset=dataset,