Add auto_wrap option in fairscale integration (#10673)

* Add auto_wrap option in fairscale integration
* Style
2021-03-12 07:50:20 -05:00
parent 184ef8ecd0
commit e8246f78f9
4 changed files with 13 additions and 6 deletions
--- a/docs/source/main_classes/trainer.rst
+++ b/docs/source/main_classes/trainer.rst
@@ -335,8 +335,8 @@ Known caveats:

 - This feature is incompatible with :obj:`--predict_with_generate` in the `run_seq2seq.py` script.
 - Using :obj:`--sharded_ddp zero_dp_3` requires wrapping each layer of the model in the special container
-  :obj:`FullyShardedDataParallelism` of fairscale. This is not done automatically by any of the example scripts of the
-  :class:`~transformers.Trainer`.
+  :obj:`FullyShardedDataParallelism` of fairscale. It should be used with the option :obj:`auto_wrap` if you are not
+  doing this yourself: :obj:`--sharded_ddp "zero_dp_3 auto_wrap"`.


 DeepSpeed
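
Note on the hunk above: the new wording implies that the value passed to --sharded_ddp can be a space-separated list of options. A minimal sketch of that reading follows. The helpers parse_sharded_ddp and needs_manual_wrapping are hypothetical names made up for illustration and are not part of the Trainer API; only the option strings "zero_dp_3" and "auto_wrap" come from the diff.

```python
from typing import List


def parse_sharded_ddp(value: str) -> List[str]:
    """Hypothetical helper: split the space-separated option string.

    Assumes options are separated by whitespace, as in
    --sharded_ddp "zero_dp_3 auto_wrap".
    """
    return value.split()


def needs_manual_wrapping(value: str) -> bool:
    """Return True when zero_dp_3 is requested without auto_wrap.

    Per the documented caveat, in that case the user is expected to wrap
    each layer of the model themselves.
    """
    options = parse_sharded_ddp(value)
    return "zero_dp_3" in options and "auto_wrap" not in options


if __name__ == "__main__":
    for value in ("zero_dp_3", "zero_dp_3 auto_wrap"):
        print(f"{value!r}: manual wrapping needed = {needs_manual_wrapping(value)}")
```

Under these assumptions, passing "zero_dp_3" alone leaves the layer wrapping to the user, while "zero_dp_3 auto_wrap" does not.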