docs: replace torch.distributed.run by torchrun (#27528)

* docs: replace torch.distributed.run by torchrun

  `transformers` now officially supports PyTorch >= 1.10.
  The entrypoint `torchrun` is present from 1.10 onwards.

  Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

* Update src/transformers/trainer.py with @ArthurZucker's suggestion

  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-28 00:26:33 +08:00
parent c832bcb812
commit ce31508134
25 changed files with 46 additions and 46 deletions
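The commit message notes that the `torchrun` entrypoint is available from PyTorch 1.10 onwards. As a minimal sketch (not part of this commit), a wrapper script that may also run on older installs could check for the entrypoint first and otherwise fall back to the module form `python -m torch.distributed.run`. The `LAUNCHER` variable name is an assumption introduced here for illustration; the script path and arguments in the commented usage line are the placeholders from the README.

```bash
# Prefer the torchrun console entrypoint when it is on PATH;
# otherwise fall back to invoking the same launcher as a module.
# LAUNCHER is a hypothetical variable name used only in this sketch.
if command -v torchrun >/dev/null 2>&1; then
    LAUNCHER="torchrun"
else
    LAUNCHER="python -m torch.distributed.run"
fi
echo "Using launcher: ${LAUNCHER}"

# Example usage (left unquoted on purpose so the fallback splits into words):
# $LAUNCHER --nproc_per_node number_of_gpu_you_have path_to_script.py \
#     --all_arguments_of_the_script
```

The check only tests whether the command exists on `PATH`; it does not verify the installed PyTorch version.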
--- a/examples/pytorch/README.md
+++ b/examples/pytorch/README.md
@@ -98,7 +98,7 @@ the [Trainer API](https://huggingface.co/transformers/main_classes/trainer.html)
 use the following command:

 ```bash
-python -m torch.distributed.launch \
+torchrun \
    --nproc_per_node number_of_gpu_you_have path_to_script.py \
 	--all_arguments_of_the_script
 ```
@@ -107,7 +107,7 @@ As an example, here is how you would fine-tune the BERT large model (with whole
 classification MNLI task using the `run_glue` script, with 8 GPUs:

 ```bash
-python -m torch.distributed.launch \
+torchrun \
    --nproc_per_node 8 pytorch/text-classification/run_glue.py \
    --model_name_or_path bert-large-uncased-whole-word-masking \
    --task_name mnli \
--- a/examples/pytorch/speech-recognition/README.md
+++ b/examples/pytorch/speech-recognition/README.md
@@ -100,7 +100,7 @@ of **0.35**.
 The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/main/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using 8 GPUs in half-precision.

 ```bash
-python -m torch.distributed.launch \
+torchrun \
 	--nproc_per_node 8 run_speech_recognition_ctc.py \
 	--dataset_name="common_voice" \
 	--model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
@@ -147,7 +147,7 @@ However, the `--shuffle_buffer_size` argument controls how many examples we can


 ```bash
-**python -m torch.distributed.launch \
+**torchrun \
 	--nproc_per_node 4 run_speech_recognition_ctc_streaming.py \
 	--dataset_name="common_voice" \
 	--model_name_or_path="facebook/wav2vec2-xls-r-300m" \
@@ -404,7 +404,7 @@ If training on a different language, you should be sure to change the `language`
 #### Multi GPU Whisper Training
 The following example shows how to fine-tune the [Whisper small](https://huggingface.co/openai/whisper-small) checkpoint on the Hindi subset of [Common Voice 11](https://huggingface.co/datasets/mozilla-foundation/common_voice_11_0) using 2 GPU devices in half-precision:
 ```bash
-python -m torch.distributed.launch \
+torchrun \
 	--nproc_per_node 2 run_speech_recognition_seq2seq.py \
 	--model_name_or_path="openai/whisper-small" \
 	--dataset_name="mozilla-foundation/common_voice_11_0" \
@@ -572,7 +572,7 @@ cross-entropy loss of **0.405** and word error rate of **0.0728**.
 The following command shows how to fine-tune [XLSR-Wav2Vec2](https://huggingface.co/transformers/main/model_doc/xlsr_wav2vec2.html) on [Common Voice](https://huggingface.co/datasets/common_voice) using 8 GPUs in half-precision.

 ```bash
-python -m torch.distributed.launch \
+torchrun \
 	--nproc_per_node 8 run_speech_recognition_seq2seq.py \
 	--dataset_name="librispeech_asr" \
 	--model_name_or_path="./" \