From 47cccb53182f2a5722504895c104ac9a163afeb1 Mon Sep 17 00:00:00 2001
From: Stas Bekman
Date: Thu, 17 Mar 2022 13:33:55 -0700
Subject: [PATCH] [Deepspeed] non-HF Trainer doc update (#16238)

---
 docs/source/main_classes/deepspeed.mdx | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/source/main_classes/deepspeed.mdx b/docs/source/main_classes/deepspeed.mdx
index 863cab408c..3d10af0e19 100644
--- a/docs/source/main_classes/deepspeed.mdx
+++ b/docs/source/main_classes/deepspeed.mdx
@@ -1854,12 +1854,14 @@ In this case you usually need to raise the value of `initial_scale_power`. Setti
 
 ## Non-Trainer Deepspeed Integration
 
 The [`~deepspeed.HfDeepSpeedConfig`] is used to integrate Deepspeed into the 🤗 Transformers core
-functionality, when [`Trainer`] is not used. The only thing that it does is handling Deepspeed ZeRO 3 param gathering and automatically splitting the model onto multiple gpus during `from_pretrained` call. Everything else you have to do by yourself.
+functionality, when [`Trainer`] is not used. The only thing that it does is handling Deepspeed ZeRO-3 param gathering and automatically splitting the model onto multiple gpus during `from_pretrained` call. Everything else you have to do by yourself.
 
 When using [`Trainer`] everything is automatically taken care of.
 
-When not using [`Trainer`], to efficiently deploy DeepSpeed stage 3, you must instantiate the
-[`~deepspeed.HfDeepSpeedConfig`] object before instantiating the model.
+When not using [`Trainer`], to efficiently deploy DeepSpeed ZeRO-3, you must instantiate the
+[`~deepspeed.HfDeepSpeedConfig`] object before instantiating the model and keep that object alive.
+
+If you're using Deepspeed ZeRO-1 or ZeRO-2 you don't need to use `HfDeepSpeedConfig` at all.
 
 For example for a pretrained model:
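
The "keep that object alive" requirement in this patch can be illustrated with a minimal, self-contained sketch. It assumes that the integration registers the config through a module-level `weakref` and that a ZeRO-3 check dereferences it when a model is loaded. If the config object has already been garbage-collected at that point, the check quietly reports ZeRO-3 as disabled and the model is not partitioned. The names `FakeHfDeepSpeedConfig`, `register_config` and `zero3_enabled` are hypothetical stand-ins, not the 🤗 Transformers API.

```python
import gc
import weakref
from typing import Any, Dict, Optional

# Hypothetical module-level slot: it holds only a *weak* reference, so it
# does not keep the config object alive by itself.
_config_weak_ref: Optional[weakref.ref] = None


class FakeHfDeepSpeedConfig:
    """Hypothetical stand-in for HfDeepSpeedConfig: registers itself on creation."""

    def __init__(self, ds_config: Dict[str, Any]) -> None:
        self.config = ds_config
        register_config(self)

    def is_zero3(self) -> bool:
        # The ZeRO stage lives under "zero_optimization" -> "stage" in a DeepSpeed config.
        return self.config.get("zero_optimization", {}).get("stage") == 3


def register_config(obj: FakeHfDeepSpeedConfig) -> None:
    global _config_weak_ref
    _config_weak_ref = weakref.ref(obj)


def zero3_enabled() -> bool:
    # What a from_pretrained-style loader would consult before deciding
    # whether to partition parameters across GPUs.
    obj = _config_weak_ref() if _config_weak_ref is not None else None
    return obj is not None and obj.is_zero3()


ds_config = {"zero_optimization": {"stage": 3}}

# Correct: keep a reference for as long as models are being created.
dschf = FakeHfDeepSpeedConfig(ds_config)
print(zero3_enabled())  # True

# Wrong: the object is created but no reference is kept, so it is collected.
del dschf
FakeHfDeepSpeedConfig(ds_config)
gc.collect()
print(zero3_enabled())  # False
```

In real code this corresponds to assigning the result of `HfDeepSpeedConfig(ds_config)` to a variable that outlives the `from_pretrained` call. With a stage 1 or 2 config the sketch's check returns `False` either way, which is consistent with the patch's note that ZeRO-1 and ZeRO-2 do not need `HfDeepSpeedConfig` at all.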