From 29080643ebe4c2d0a593b185eaf81ce9f0dc1a3e Mon Sep 17 00:00:00 2001
From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Date: Thu, 9 Jun 2022 12:20:39 -0400
Subject: [PATCH] Mention in the doc we drop support for fairscale (#17610)

---
 docs/source/en/main_classes/trainer.mdx | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/main_classes/trainer.mdx b/docs/source/en/main_classes/trainer.mdx
index 1163ba0623..e5807bd138 100644
--- a/docs/source/en/main_classes/trainer.mdx
+++ b/docs/source/en/main_classes/trainer.mdx
@@ -291,10 +291,10 @@ Also if you do set this environment variable it's the best to set it in your `~/

 The [`Trainer`] has been extended to support libraries that may dramatically improve your training time and fit much bigger models.

-Currently it supports third party solutions, [DeepSpeed](https://github.com/microsoft/DeepSpeed) and [FairScale](https://github.com/facebookresearch/fairscale/), which implement parts of the paper [ZeRO: Memory Optimizations
+Currently it supports third party solutions, [DeepSpeed](https://github.com/microsoft/DeepSpeed), [PyTorch FSDP](https://pytorch.org/docs/stable/fsdp.html) and [FairScale](https://github.com/facebookresearch/fairscale/), which implement parts of the paper [ZeRO: Memory Optimizations
 Toward Training Trillion Parameter Models, by Samyam Rajbhandari, Jeff Rasley, Olatunji Ruwase, Yuxiong He](https://arxiv.org/abs/1910.02054).

-This provided support is new and experimental as of this writing.
+This provided support is new and experimental as of this writing. While the support for DeepSpeed and PyTorch FSDP is active and we welcome issues around it, we don't support the FairScale integration anymore since it has been integrated in PyTorch main (see the [PyTorch FSDP integration](#pytorch-fully-sharded-data-parallel))

@@ -408,6 +408,12 @@ As always make sure to edit the paths in the example to match your situation.
 ### FairScale

+
+
+This integration is not supported anymore, we recommend you either use DeepSpeed or PyTorch FSDP.
+
+
+
 By integrating [FairScale](https://github.com/facebookresearch/fairscale/) the [`Trainer`] provides support for the following features from [the ZeRO paper](https://arxiv.org/abs/1910.02054):