[Deepspeed] warmup_ratio docs (#12830)
* [Deepspeed] warmup_ratio docs

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

* Update docs/source/main_classes/deepspeed.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@@ -1156,8 +1156,8 @@ Here is where the schedulers overlap between 🤗 Transformers and DeepSpeed:
 therefore, if you don't configure the scheduler this is scheduler that will get configured by default.
 
 If you don't configure the ``scheduler`` entry in the configuration file, the :class:`~transformers.Trainer` will use
-the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` to configure a 🤗 Transformers version
-of it.
+the values of ``--lr_scheduler_type``, ``--learning_rate`` and ``--warmup_steps`` or ``--warmup_ratio`` to configure a
+🤗 Transformers version of it.
 
 Here is an example of the auto-configured ``scheduler`` entry for ``WarmupLR``:
 
@@ -1178,9 +1178,10 @@ Since `"auto"` is used the :class:`~transformers.Trainer` arguments will set the
 file. This is so that there is one definitive source of the values and to avoid hard to find errors when, for example,
 the learning rate is set to different values in different places. Command line rules. The values that get set are:
 
-- ``warmup_min_lr`` with the value of ``0``
-- ``warmup_max_lr`` with the value of ``--learning_rate``
-- ``warmup_num_steps`` with the value of ``--warmup_steps``
+- ``warmup_min_lr`` with the value of ``0``.
+- ``warmup_max_lr`` with the value of ``--learning_rate``.
+- ``warmup_num_steps`` with the value of ``--warmup_steps`` if provided. Otherwise will use ``--warmup_ratio``
+  multiplied by the number of training steps and rounded up.
 - ``total_num_steps`` with either the value of ``--max_steps`` or if it is not provided, derived automatically at run
   time based on the environment and the size of the dataset and other command line arguments (needed for
   ``WarmupDecayLR``).
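The rule this commit documents for filling the ``"auto"`` values of a ``WarmupLR`` scheduler entry can be sketched in Python. This is a minimal illustration under stated assumptions, not the :class:`~transformers.Trainer` implementation: the helpers ``resolve_warmup_steps`` and ``fill_warmup_lr_params`` are hypothetical, and treating ``warmup_steps == 0`` as "``--warmup_steps`` was not provided" is an assumption. It shows ``warmup_num_steps`` falling back to ``--warmup_ratio`` multiplied by the number of training steps and rounded up.

```python
import math

# Hypothetical helpers (not the actual Trainer code): a sketch of how the "auto"
# values of a DeepSpeed WarmupLR scheduler entry could be filled in from the
# Trainer command-line arguments, following the rules listed in the diff above.


def resolve_warmup_steps(warmup_steps: int, warmup_ratio: float, num_training_steps: int) -> int:
    """Use --warmup_steps if provided, else ceil(warmup_ratio * num_training_steps)."""
    # Assumption: a value of 0 means "--warmup_steps was not provided".
    if warmup_steps > 0:
        return warmup_steps
    # Ratio of total training steps, rounded up to a whole step.
    return math.ceil(warmup_ratio * num_training_steps)


def fill_warmup_lr_params(config: dict, learning_rate: float, warmup_steps: int,
                          warmup_ratio: float, num_training_steps: int) -> dict:
    """Replace "auto" entries in config["scheduler"]["params"] with concrete values."""
    resolved = {
        "warmup_min_lr": 0,
        "warmup_max_lr": learning_rate,
        "warmup_num_steps": resolve_warmup_steps(warmup_steps, warmup_ratio, num_training_steps),
    }
    params = config["scheduler"]["params"]
    for key, value in resolved.items():
        # In this sketch only "auto" placeholders are overwritten.
        if params.get(key) == "auto":
            params[key] = value
    return config


ds_config = {
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto",
        },
    }
}

# No --warmup_steps, --warmup_ratio 0.1, 1005 training steps -> ceil(100.5) = 101.
fill_warmup_lr_params(ds_config, learning_rate=3e-5, warmup_steps=0,
                      warmup_ratio=0.1, num_training_steps=1005)
print(ds_config["scheduler"]["params"])
# prints {'warmup_min_lr': 0, 'warmup_max_lr': 3e-05, 'warmup_num_steps': 101}
```

Rounding up means a nonzero ratio never yields zero warmup steps on a small run, which matches the "rounded up" wording in the new documentation line.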