[Trainer] implement gradient_accumulation_steps support in DeepSpeed integration (#10310)

* implement gradient_accumulation_steps support in DeepSpeed integration
* typo
* cleanup
* cleanup
2021-02-22 11:15:59 -08:00
parent f991daed18
commit eab0afc19c
5 changed files with 162 additions and 27 deletions
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -830,6 +830,28 @@ Here is an example of the ``amp`` configuration:
    }


+Gradient Accumulation
+=======================================================================================================================
+
+While normally DeepSpeed gets gradient accumulation configured with:
+
+.. code-block:: json
+
+    {
+        "gradient_accumulation_steps": 3
+    }
+
+in this case, to enable gradient accumulation, pass the ``--gradient_accumulation_steps`` command line argument as
+normal and it will be injected into the DeepSpeed configuration.
+
+If you try to add it directly to the configuration file, you will receive an error from the Trainer. This is because
+this setting is needed by the Trainer too, so this approach ensures that there is a single way of setting this
+value and thus avoids potential subtle errors.
+
+
+
+
+

 Gradient Clipping
 =======================================================================================================================