[Deepspeed] warmup_ratio docs (#12830)

* [Deepspeed] warmup_ratio docs

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Stas Bekman
2021-07-21 10:49:29 -07:00
committed by GitHub
parent 8c2384d8e2
commit 807b6bd160

@@ -1156,8 +1156,8 @@ Here is where the schedulers overlap between 🤗 Transformers and DeepSpeed:
 therefore, if you don't configure the scheduler this is scheduler that will get configured by default.
 If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
-the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` to configure a 🤗 Transformers version
-of it.
+the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` or ``--warmup_ratio`` to configure a
+🤗 Transformers version of it.
 Here is an example of the auto-configured ``scheduler`` entry for ``WarmupLR``:
@@ -1178,9 +1178,10 @@ Since `"auto"` is used the :class:`~transformers.Trainer` arguments will set the
 file. This is so that there is one definitive source of the values and to avoid hard to find errors when, for example,
 the learning rate is set to different values in different places. Command line rules. The values that get set are:
-- ``warmup_min_lr`` with the value of ``0``
-- ``warmup_max_lr`` with the value of ``--learning_rate``
-- ``warmup_num_steps`` with the value of ``--warmup_steps``
+- ``warmup_min_lr`` with the value of ``0``.
+- ``warmup_max_lr`` with the value of ``--learning_rate``.
+- ``warmup_num_steps`` with the value of ``--warmup_steps`` if provided. Otherwise will use ``--warmup_ratio``
+multiplied by the number of training steps and rounded up.
 - ``total_num_steps`` with either the value of ``--max_steps`` or if it is not provided, derived automatically at run
 time based on the environment and the size of the dataset and other command line arguments (needed for
 ``WarmupDecayLR``).
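
The resolution rule described in the added lines can be sketched in a few lines of Python. This is a minimal illustration, not the actual :class:`~transformers.Trainer` code: the helper names ``resolve_warmup_steps`` and ``resolve_scheduler_params`` are hypothetical, and treating ``warmup_steps == 0`` as "not provided" is an assumption made for this sketch.

```python
import math


def resolve_warmup_steps(num_training_steps: int,
                         warmup_steps: int = 0,
                         warmup_ratio: float = 0.0) -> int:
    """Hypothetical helper: pick the value used for ``warmup_num_steps``.

    Assumption: ``warmup_steps == 0`` means ``--warmup_steps`` was not provided.
    """
    if warmup_steps > 0:
        # An explicit --warmup_steps takes precedence over --warmup_ratio.
        return warmup_steps
    # Otherwise: ratio multiplied by the number of training steps, rounded up.
    return math.ceil(num_training_steps * warmup_ratio)


def resolve_scheduler_params(learning_rate: float,
                             num_training_steps: int,
                             warmup_steps: int = 0,
                             warmup_ratio: float = 0.0) -> dict:
    """Hypothetical helper: fill the scheduler params listed in the diff above."""
    return {
        "warmup_min_lr": 0,
        "warmup_max_lr": learning_rate,
        "warmup_num_steps": resolve_warmup_steps(
            num_training_steps, warmup_steps, warmup_ratio
        ),
        # Only needed for WarmupDecayLR; the caller supplies --max_steps
        # or a step count derived at run time.
        "total_num_steps": num_training_steps,
    }


if __name__ == "__main__":
    # 1001 training steps with a ratio of 0.25 gives 250.25, rounded up.
    print(resolve_scheduler_params(3e-5, 1001, warmup_ratio=0.25))
```

Rounding up rather than down means a small non-zero ratio always yields at least one warmup step.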