diff --git a/docs/source/main_classes/deepspeed.rst b/docs/source/main_classes/deepspeed.rst
index 2769ca044a..18463bd3f1 100644
--- a/docs/source/main_classes/deepspeed.rst
+++ b/docs/source/main_classes/deepspeed.rst
@@ -1156,8 +1156,8 @@ Here is where the schedulers overlap between 🤗 Transformers and DeepSpeed:
 therefore, if you don't configure the scheduler this is scheduler that will get configured by default.
 
 If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
-the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` to configure a 🤗 Transformers version
-of it.
+the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` or ``--warmup_ratio`` to configure a
+🤗 Transformers version of it.
 
 Here is an example of the auto-configured ``scheduler`` entry for ``WarmupLR``:
 
@@ -1178,9 +1178,10 @@ Since `"auto"` is used the :class:`~transformers.Trainer` arguments will set the
 file. This is so that there is one definitive source of the values and to avoid hard to find errors when, for example,
 the learning rate is set to different values in different places. Command line rules. The values that get set are:
 
-- ``warmup_min_lr`` with the value of ``0``
-- ``warmup_max_lr`` with the value of ``--learning_rate``
-- ``warmup_num_steps`` with the value of ``--warmup_steps``
+- ``warmup_min_lr`` with the value of ``0``.
+- ``warmup_max_lr`` with the value of ``--learning_rate``.
+- ``warmup_num_steps`` with the value of ``--warmup_steps`` if provided. Otherwise will use ``--warmup_ratio``
+  multiplied by the number of training steps and rounded up.
 - ``total_num_steps`` with either the value of ``--max_steps`` or if it is not provided, derived automatically at run
   time based on the environment and the size of the dataset and other command line arguments (needed for
   ``WarmupDecayLR``).
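
Reviewer note: the new ``warmup_num_steps`` rule in the second hunk can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the code the Trainer runs. The helper name ``resolve_warmup_num_steps`` is hypothetical, and the sketch assumes that "if provided" means ``--warmup_steps`` is greater than its default of ``0``.

```python
import math


def resolve_warmup_num_steps(warmup_steps: int, warmup_ratio: float, num_training_steps: int) -> int:
    """Hypothetical helper mirroring the documented rule for ``warmup_num_steps``.

    Assumption: ``--warmup_steps`` counts as "provided" when it is greater than 0.
    """
    if warmup_steps > 0:
        # An explicit step count takes precedence over the ratio.
        return warmup_steps
    # Otherwise: ratio multiplied by the number of training steps, rounded up.
    return math.ceil(num_training_steps * warmup_ratio)


# An explicit --warmup_steps wins even when a ratio is also given.
print(resolve_warmup_num_steps(500, 0.1, 1005))  # 500
# With no --warmup_steps, 10% of 1005 steps is 100.5, which rounds up.
print(resolve_warmup_num_steps(0, 0.1, 1005))  # 101
```

Rounding up rather than down means that any non-zero ratio yields at least one warmup step whenever there is at least one training step.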