[Deepspeed] warmup_ratio docs (#12830)
* [Deepspeed] warmup_ratio docs
* Update docs/source/main_classes/deepspeed.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style
* Update docs/source/main_classes/deepspeed.rst
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@@ -1156,8 +1156,8 @@ Here is where the schedulers overlap between 🤗 Transformers and DeepSpeed:
 therefore, if you don't configure the scheduler this is scheduler that will get configured by default.

 If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
-the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` to configure a 🤗 Transformers version
-of it.
+the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` or ``--warmup_ratio`` to configure a
+🤗 Transformers version of it.

 Here is an example of the auto-configured ``scheduler`` entry for ``WarmupLR``:

@@ -1178,9 +1178,10 @@ Since `"auto"` is used the :class:`~transformers.Trainer` arguments will set the
 file. This is so that there is one definitive source of the values and to avoid hard to find errors when, for example,
 the learning rate is set to different values in different places. Command line rules. The values that get set are:

-- ``warmup_min_lr`` with the value of ``0``
-- ``warmup_max_lr`` with the value of ``--learning_rate``
-- ``warmup_num_steps`` with the value of ``--warmup_steps``
+- ``warmup_min_lr`` with the value of ``0``.
+- ``warmup_max_lr`` with the value of ``--learning_rate``.
+- ``warmup_num_steps`` with the value of ``--warmup_steps`` if provided. Otherwise will use ``--warmup_ratio``
+  multiplied by the number of training steps and rounded up.
 - ``total_num_steps`` with either the value of ``--max_steps`` or if it is not provided, derived automatically at run
   time based on the environment and the size of the dataset and other command line arguments (needed for
   ``WarmupDecayLR``).

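The resolution rules listed in the second hunk can be illustrated with a short Python sketch. This is not the :class:`~transformers.Trainer` code itself: the function ``resolve_auto_scheduler_params`` and its argument names are hypothetical, and it assumes that "provided" means a positive ``warmup_steps`` and that an unset ``max_steps`` is passed as ``None``.

```python
import math
from typing import Optional


def resolve_auto_scheduler_params(
    learning_rate: float,
    num_training_steps: int,
    warmup_steps: int = 0,
    warmup_ratio: float = 0.0,
    max_steps: Optional[int] = None,
) -> dict:
    """Hypothetical helper: fill the "auto" scheduler values from CLI-style arguments."""
    # total_num_steps: --max_steps if given, otherwise the step count derived at run time
    # (here simply passed in as num_training_steps).
    total_num_steps = max_steps if max_steps is not None else num_training_steps

    # warmup_num_steps: --warmup_steps wins if provided (assumed: > 0);
    # otherwise --warmup_ratio times the number of training steps, rounded up.
    if warmup_steps > 0:
        warmup_num_steps = warmup_steps
    else:
        warmup_num_steps = math.ceil(total_num_steps * warmup_ratio)

    return {
        "warmup_min_lr": 0,
        "warmup_max_lr": learning_rate,
        "warmup_num_steps": warmup_num_steps,
        "total_num_steps": total_num_steps,  # only needed for WarmupDecayLR
    }


if __name__ == "__main__":
    # 1001 steps with a 25% warmup ratio: 250.25 is rounded up.
    print(resolve_auto_scheduler_params(3e-5, 1001, warmup_ratio=0.25))
```

Under these assumptions an explicit ``warmup_steps`` always takes precedence, so passing both flags means the ratio is ignored.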