From 0311ba21534e3f21ac9b9327009e669e34c9b367 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Thu, 8 Apr 2021 19:47:31 -0700
Subject: [PATCH] typo (#11152)

* typo

* style
---
 docs/source/main_classes/trainer.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/source/main_classes/trainer.rst b/docs/source/main_classes/trainer.rst
index 10a7a9d54a..aae325076c 100644
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -355,9 +355,9 @@ Notes:
    able to use significantly larger batch sizes using the same hardware (e.g. 3x and even bigger) which should lead to
    significantly shorter training time.
 
-3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp zero_dp_3`
-   to the command line arguments, and make sure you have added the distributed launcher ``-m torch.distributed.launch
-   --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
+3. To use the second version of Sharded data-parallelism, add ``--sharded_ddp zero_dp_2`` or ``--sharded_ddp
+   zero_dp_3`` to the command line arguments, and make sure you have added the distributed launcher ``-m
+   torch.distributed.launch --nproc_per_node=NUMBER_OF_GPUS_YOU_HAVE`` if you haven't been using it already.
 
    For example here is how you could use it for ``run_translation.py`` with 2 GPUs: