deprecate sharded_ddp training argument (#24825)

* deprecate fairscale's ShardedDDP
* fix code style
* roll back
* deprecate the `sharded_ddp` training argument

---------

Co-authored-by: jihuazhong <jihuazhong1@huawei.com>
2023-07-17 18:57:42 +08:00
parent 5bb4430edc
commit 8ba26c18cf
4 changed files with 11 additions and 148 deletions
--- a/docs/source/en/perf_train_gpu_many.md
+++ b/docs/source/en/perf_train_gpu_many.md
@@ -241,7 +241,6 @@ If you pay close attention the way ZeRO partitions the model's weights - it look
 Implementations:

 - [DeepSpeed](https://www.deepspeed.ai/features/#the-zero-redundancy-optimizer) ZeRO-DP stages 1+2+3
- [Fairscale](https://github.com/facebookresearch/fairscale/#optimizer-state-sharding-zero) ZeRO-DP stages 1+2+3
 - [`transformers` integration](main_classes/trainer#trainer-integrations)

 ## Naive Model Parallelism (Vertical) and Pipeline Parallelism
@@ -294,7 +293,6 @@ There are 2 groups of solutions - the traditional Pipeline API and the more mode

 Traditional Pipeline API solutions:
 - PyTorch
- FairScale
 - DeepSpeed
 - Megatron-LM

@@ -312,7 +310,6 @@ We are yet to experiment with Varuna and SageMaker but their papers report that

 Implementations:
 - [Pytorch](https://pytorch.org/docs/stable/pipeline.html) (initial support in pytorch-1.8, and progressively getting improved in 1.9 and more so in 1.10). Some [examples](https://github.com/pytorch/pytorch/blob/master/benchmarks/distributed/pipeline/pipe.py)
- [FairScale](https://fairscale.readthedocs.io/en/latest/tutorials/pipe.html)
 - [DeepSpeed](https://www.deepspeed.ai/tutorials/pipeline/)
 - [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) has an internal implementation - no API.
 - [Varuna](https://github.com/microsoft/varuna)