docs: replace torch.distributed.run by torchrun (#27528)

* docs: replace torch.distributed.run by torchrun

 `transformers` now officially support pytorch >= 1.10.
 The entrypoint `torchrun`` is present from 1.10 onwards.

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>

* Update src/transformers/trainer.py

with @ArthurZucker's suggestion

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Signed-off-by: Peter Pan <Peter.Pan@daocloud.io>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
Peter Pan
2023-11-28 00:26:33 +08:00
committed by GitHub
parent c832bcb812
commit ce31508134
25 changed files with 46 additions and 46 deletions

View File

@@ -206,7 +206,7 @@ Let's discuss how you can tell your program which GPUs are to be used and in wha
When using [`DistributedDataParallel`](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html) to use only a subset of your GPUs, you simply specify the number of GPUs to use. For example, if you have 4 GPUs, but you wish to use the first 2 you can do:
```bash
python -m torch.distributed.launch --nproc_per_node=2 trainer-program.py ...
torchrun --nproc_per_node=2 trainer-program.py ...
```
if you have either [`accelerate`](https://github.com/huggingface/accelerate) or [`deepspeed`](https://github.com/microsoft/DeepSpeed) installed you can also accomplish the same by using one of:
@@ -233,7 +233,7 @@ If you have multiple GPUs and you'd like to use only 1 or a few of those GPUs, s
For example, let's say you have 4 GPUs: 0, 1, 2 and 3. To run only on the physical GPUs 0 and 2, you can do:
```bash
CUDA_VISIBLE_DEVICES=0,2 python -m torch.distributed.launch trainer-program.py ...
CUDA_VISIBLE_DEVICES=0,2 torchrun trainer-program.py ...
```
So now pytorch will see only 2 GPUs, where your physical GPUs 0 and 2 are mapped to `cuda:0` and `cuda:1` correspondingly.
@@ -241,7 +241,7 @@ So now pytorch will see only 2 GPUs, where your physical GPUs 0 and 2 are mapped
You can even change their order:
```bash
CUDA_VISIBLE_DEVICES=2,0 python -m torch.distributed.launch trainer-program.py ...
CUDA_VISIBLE_DEVICES=2,0 torchrun trainer-program.py ...
```
Here your physical GPUs 0 and 2 are mapped to `cuda:1` and `cuda:0` correspondingly.
@@ -263,7 +263,7 @@ As with any environment variable you can, of course, export those instead of add
```bash
export CUDA_VISIBLE_DEVICES=0,2
python -m torch.distributed.launch trainer-program.py ...
torchrun trainer-program.py ...
```
but this approach can be confusing since you may forget you set up the environment variable earlier and not understand why the wrong GPUs are used. Therefore, it's a common practice to set the environment variable just for a specific run on the same command line as it's shown in most examples of this section.