[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)

* implement gradient_accumulation_steps support in DeepSpeed integration

* typo

* cleanup

* cleanup
Stas Bekman
2021-02-22 11:15:59 -08:00
committed by GitHub
parent f991daed18
commit eab0afc19c
5 changed files with 162 additions and 27 deletions

@@ -830,6 +830,28 @@ Here is an example of the ``amp`` configuration:
}
Gradient Accumulation
=======================================================================================================================
While DeepSpeed normally gets gradient accumulation configured with:

.. code-block:: json

    {
        "gradient_accumulation_steps": 3
    }

in this case, to enable gradient accumulation, pass the ``--gradient_accumulation_steps`` command line argument as usual,
and it will get injected into the DeepSpeed configuration.

If you try to add it directly to the configuration file, you will receive an error from the Trainer. The Trainer needs
this setting too, so this approach ensures that there is a single way of setting the value and thus avoids potential
subtle errors.

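
The behavior described above can be pictured with a minimal sketch. The helper name
``inject_gradient_accumulation`` and the error message are hypothetical and chosen for illustration only; this is not
the Trainer's actual code, just one way the "reject it in the file, inject it from the command line" rule could look:

.. code-block:: python

    import copy


    def inject_gradient_accumulation(ds_config, gradient_accumulation_steps):
        """Return a copy of ``ds_config`` with the command line value injected.

        Hypothetical helper for illustration; not part of the Trainer API.
        """
        key = "gradient_accumulation_steps"
        if key in ds_config:
            # Assumed to mirror the documented rule: the config file must not set this value.
            raise ValueError(
                f"Do not set '{key}' in the DeepSpeed config file; "
                f"pass --{key} on the command line instead."
            )
        merged = copy.deepcopy(ds_config)  # leave the caller's dict untouched
        merged[key] = gradient_accumulation_steps
        return merged


    # Config as it might be loaded from the DeepSpeed JSON file (example contents).
    config_from_file = {"fp16": {"enabled": True}}
    merged_config = inject_gradient_accumulation(config_from_file, 3)
    print(merged_config["gradient_accumulation_steps"])

Because the value has exactly one source, the Trainer and DeepSpeed cannot end up with different accumulation step
counts.
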
Gradient Clipping
=======================================================================================================================