docs: replace torch.distributed.run by torchrun (#27528)
* docs: replace torch.distributed.run by torchrun

  `transformers` now officially supports pytorch >= 1.10.
  The entrypoint `torchrun` is present from 1.10 onwards.

  Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

* Update src/transformers/trainer.py with @ArthurZucker's suggestion

  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@@ -140,7 +140,7 @@ python finetune_trainer.py --help
 
 For multi-gpu training use `torch.distributed.launch`, e.g. with 2 gpus:
 ```bash
-python -m torch.distributed.launch --nproc_per_node=2 finetune_trainer.py ...
+torchrun --nproc_per_node=2 finetune_trainer.py ...
 ```
 
 **At the moment, `Seq2SeqTrainer` does not support *with teacher* distillation.**
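Note on the launcher change in the hunk above: `torchrun` exposes each worker's local rank through the `LOCAL_RANK` environment variable, whereas `python -m torch.distributed.launch` by default passes it as a `--local_rank` command-line argument (spelled `--local-rank` in newer PyTorch releases). A script that has to work under both launchers could resolve the rank roughly as sketched below. The helper name `resolve_local_rank` is hypothetical and is not part of `finetune_trainer.py` or `transformers`; this is a minimal sketch under those assumptions, not the way the example scripts actually do it.

```python
import argparse
import os


def resolve_local_rank(argv=None, environ=None):
    """Return the worker's local rank, or -1 when not launched in distributed mode.

    Hypothetical helper for illustration only.
    """
    environ = os.environ if environ is None else environ

    # Parse only the rank flag and ignore every other script argument.
    parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
    # Accept both spellings that torch.distributed.launch may pass.
    parser.add_argument("--local_rank", "--local-rank", type=int, default=-1)
    args, _unknown = parser.parse_known_args(argv)

    # torchrun sets LOCAL_RANK in the environment; prefer it when present.
    if "LOCAL_RANK" in environ:
        return int(environ["LOCAL_RANK"])
    return args.local_rank


if __name__ == "__main__":
    print(f"local rank: {resolve_local_rank()}")
```

Preferring the environment variable keeps the behavior the same whether the script is started with `torchrun` or with the older launcher plus `--use_env`.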
@@ -214,7 +214,7 @@ because it uses SortishSampler to minimize padding. You can also use it on 1 GPU
 `{type_path}.source` and `{type_path}.target`. Run `./run_distributed_eval.py --help` for all clargs.
 
 ```bash
-python -m torch.distributed.launch --nproc_per_node=8 run_distributed_eval.py \
+torchrun --nproc_per_node=8 run_distributed_eval.py \
 --model_name sshleifer/distilbart-large-xsum-12-3 \
 --save_dir xsum_generations \
 --data_dir xsum \