* typo

* style
Stas Bekman
2021-04-08 19:47:31 -07:00
committed by GitHub
parent 269c9638df
commit 0311ba2153

@@ -355,9 +355,9 @@ Notes:
  able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
  significantly shorter training time.
- 3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp zero_dp_3`
- to the command line arguments, and make sure you have added the distributed launcher ``-m torch.distributed.launch
- --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
+ 3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp
+ zero_dp_3`` to the command line arguments, and make sure you have added the distributed launcher ``-m
+ torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
  For example here is how you could use it for ``run_translation.py`` with 2 GPUs: