Trainer - deprecate tokenizer for processing_class (#32385)
* Trainer - deprecate tokenizer for processing_class * Extend chage across Seq2Seq trainer and docs * Add tests * Update to FutureWarning and add deprecation version
This commit is contained in:
@@ -81,7 +81,7 @@ trainer = Trainer(
|
||||
args=training_args,
|
||||
train_dataset=dataset["train"],
|
||||
eval_dataset=dataset["test"],
|
||||
tokenizer=tokenizer,
|
||||
processing_class=tokenizer,
|
||||
data_collator=data_collator,
|
||||
compute_metrics=compute_metrics,
|
||||
)
|
||||
@@ -153,7 +153,7 @@ from transformers import TrainerCallback
|
||||
class EarlyStoppingCallback(TrainerCallback):
|
||||
def __init__(self, num_steps=10):
|
||||
self.num_steps = num_steps
|
||||
|
||||
|
||||
def on_step_end(self, args, state, control, **kwargs):
|
||||
if state.global_step >= self.num_steps:
|
||||
return {"should_training_stop": True}
|
||||
@@ -171,7 +171,7 @@ trainer = Trainer(
|
||||
args=training_args,
|
||||
train_dataset=dataset["train"],
|
||||
eval_dataset=dataset["test"],
|
||||
tokenizer=tokenizer,
|
||||
processing_class=tokenizer,
|
||||
data_collator=data_collator,
|
||||
compute_metrics=compute_metrics,
|
||||
callback=[EarlyStoppingCallback()],
|
||||
@@ -289,7 +289,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_config(config).to(0)
|
||||
|
||||
trainer = trl.SFTTrainer(
|
||||
model=model,
|
||||
model=model,
|
||||
args=args,
|
||||
train_dataset=train_dataset,
|
||||
dataset_text_field='text',
|
||||
@@ -327,7 +327,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_config(config).to(0)
|
||||
|
||||
trainer = trl.SFTTrainer(
|
||||
model=model,
|
||||
model=model,
|
||||
args=args,
|
||||
train_dataset=train_dataset,
|
||||
dataset_text_field='text',
|
||||
@@ -370,7 +370,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_config(config).to(0)
|
||||
|
||||
trainer = trl.SFTTrainer(
|
||||
model=model,
|
||||
model=model,
|
||||
args=args,
|
||||
train_dataset=train_dataset,
|
||||
dataset_text_field='text',
|
||||
@@ -419,8 +419,8 @@ The kernel supports the Llama, Gemma, Mistral, and Mixtral model architectures.
|
||||
|
||||
## LOMO optimizer
|
||||
|
||||
The LOMO optimizers have been introduced in [Full Parameter Fine-Tuning for Large Language Models with Limited Resources](https://hf.co/papers/2306.09782) and [AdaLomo: Low-memory Optimization with Adaptive Learning Rate](https://hf.co/papers/2310.10195).
|
||||
They both consist of an efficient full-parameter fine-tuning method. These optimizers fuse the gradient computation and the parameter update in one step to reduce memory usage. Supported optimizers for LOMO are `"lomo"` and `"adalomo"`. First either install LOMO from pypi `pip install lomo-optim` or install it from source with `pip install git+https://github.com/OpenLMLab/LOMO.git`.
|
||||
The LOMO optimizers have been introduced in [Full Parameter Fine-Tuning for Large Language Models with Limited Resources](https://hf.co/papers/2306.09782) and [AdaLomo: Low-memory Optimization with Adaptive Learning Rate](https://hf.co/papers/2310.10195).
|
||||
They both consist of an efficient full-parameter fine-tuning method. These optimizers fuse the gradient computation and the parameter update in one step to reduce memory usage. Supported optimizers for LOMO are `"lomo"` and `"adalomo"`. First either install LOMO from pypi `pip install lomo-optim` or install it from source with `pip install git+https://github.com/OpenLMLab/LOMO.git`.
|
||||
|
||||
<Tip>
|
||||
|
||||
@@ -457,7 +457,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_id)
|
||||
model = AutoModelForCausalLM.from_pretrained(model_id, low_cpu_mem_usage=True).to(0)
|
||||
|
||||
trainer = trl.SFTTrainer(
|
||||
model=model,
|
||||
model=model,
|
||||
args=args,
|
||||
train_dataset=train_dataset,
|
||||
dataset_text_field='text',
|
||||
@@ -579,8 +579,8 @@ To use Accelerate with [`Trainer`], run the [`accelerate.config`](https://huggin
|
||||
<hfoption id="DistributedDataParallel">
|
||||
|
||||
```yml
|
||||
compute_environment: LOCAL_MACHINE
|
||||
distributed_type: MULTI_GPU
|
||||
compute_environment: LOCAL_MACHINE
|
||||
distributed_type: MULTI_GPU
|
||||
downcast_bf16: 'no'
|
||||
gpu_ids: all
|
||||
machine_rank: 0 #change rank as per the node
|
||||
@@ -654,8 +654,8 @@ use_cpu: false
|
||||
<hfoption id="DeepSpeed with Accelerate plugin">
|
||||
|
||||
```yml
|
||||
compute_environment: LOCAL_MACHINE
|
||||
deepspeed_config:
|
||||
compute_environment: LOCAL_MACHINE
|
||||
deepspeed_config:
|
||||
gradient_accumulation_steps: 1
|
||||
gradient_clipping: 0.7
|
||||
offload_optimizer_device: cpu
|
||||
|
||||
Reference in New Issue
Block a user