Update docs around mixing hf scheduler with deepspeed optimizer (#28223)
update docs around mixing hf scheduler with deepspeed optimizer
```diff
@@ -1221,12 +1221,7 @@ Therefore you have two ways to take advantage of this very beneficial feature:
 ### Optimizer and Scheduler
 
 As long as you don't enable `offload_optimizer` you can mix and match DeepSpeed and HuggingFace schedulers and
-optimizers, with the exception of using the combination of HuggingFace scheduler and DeepSpeed optimizer:
-
-| Combos | HF Scheduler | DS Scheduler |
-|:-------------|:-------------|:-------------|
-| HF Optimizer | Yes | Yes |
-| DS Optimizer | No | Yes |
+optimizers.
 
 It is possible to use a non-DeepSpeed optimizer when `offload_optimizer` is enabled, as long as it has both CPU and
 GPU implementation (except LAMB).
```
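With the "HF Scheduler + DS Optimizer: No" row gone from the docs, a DeepSpeed config can declare only an `optimizer` section and leave `scheduler` out, so DeepSpeed builds the optimizer while the HF side supplies the learning-rate schedule. The sketch below shows such a config as a Python dict, plus a hypothetical helper `component_sources` that reports which side supplies each component. It assumes the decision is purely key-based (a section present in the DS config means DeepSpeed builds that component); the `AdamW` settings and other values are illustrative, not taken from this commit.

```python
from typing import Any, Dict, Tuple

# Illustrative DeepSpeed config: DS builds the optimizer; there is no
# "scheduler" section, so (under the key-based rule assumed here) the
# HF scheduler, e.g. TrainingArguments.lr_scheduler_type, is used instead.
ds_config: Dict[str, Any] = {
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
}


def component_sources(config: Dict[str, Any]) -> Tuple[str, str]:
    """Hypothetical helper: return (optimizer source, scheduler source).

    Assumption: a section present in the DS config means DeepSpeed ("DS")
    builds that component; otherwise the HF Trainer ("HF") does.
    """
    optimizer_src = "DS" if "optimizer" in config else "HF"
    scheduler_src = "DS" if "scheduler" in config else "HF"
    return optimizer_src, scheduler_src


if __name__ == "__main__":
    # DS optimizer + HF scheduler: the combination the docs no longer exclude.
    print(component_sources(ds_config))  # -> ('DS', 'HF')
```

A dict like this (or a JSON file with the same content) is normally handed to the Trainer through `TrainingArguments(deepspeed=...)`. If you switch to an HF optimizer and enable `offload_optimizer`, the CPU-and-GPU caveat in the docs applies.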
```diff
@@ -275,14 +275,7 @@ def deepspeed_optim_sched(trainer, hf_deepspeed_config, args, num_training_steps
 
     config = hf_deepspeed_config.config
 
-    # Optimizer + Scheduler
-    # Currently supported combos:
-    # 1. DS scheduler + DS optimizer: Yes
-    # 2. HF scheduler + HF optimizer: Yes
-    # 3. DS scheduler + HF optimizer: Yes
-    # 4. HF scheduler + DS optimizer: No
-    #
-    # Unless Offload is enabled in which case it's:
+    # Mixing and matching DS schedulers and optimizers is supported unless Offload is enabled in which case it's:
     # 1. DS scheduler + DS optimizer: Yes
     # 2. HF scheduler + HF optimizer: Mostly*
     # 3. DS scheduler + HF optimizer: Mostly*
```
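After this change the comment in `deepspeed_optim_sched` only enumerates the Offload case. As a hedged illustration, the hypothetical lookup below transcribes exactly those three rows. It reads "Mostly*" as, presumably, the docs' condition that a non-DeepSpeed optimizer needs both a CPU and a GPU implementation (except LAMB); both "Mostly*" rows do involve an HF optimizer. The combination the comment leaves unlisted under Offload (HF scheduler + DS optimizer) is returned as `None` rather than guessed. The names `OFFLOAD_SUPPORT` and `support_level` are illustrative only.

```python
from typing import Dict, Optional, Tuple

# (scheduler source, optimizer source) -> support level, transcribed from the
# updated comment for the Offload case; "Mostly" stands for its "Mostly*".
OFFLOAD_SUPPORT: Dict[Tuple[str, str], str] = {
    ("DS", "DS"): "Yes",
    ("HF", "HF"): "Mostly",
    ("DS", "HF"): "Mostly",
}


def support_level(scheduler: str, optimizer: str, offload: bool) -> Optional[str]:
    """Hypothetical lookup of a scheduler/optimizer combination.

    Without Offload, every DS/HF mix is reported as supported (per the new
    comment); with Offload, only the listed combos are known and anything
    else returns None.
    """
    for side in (scheduler, optimizer):
        if side not in ("DS", "HF"):
            raise ValueError(f"expected 'DS' or 'HF', got {side!r}")
    if not offload:
        return "Yes"
    return OFFLOAD_SUPPORT.get((scheduler, optimizer))


if __name__ == "__main__":
    print(support_level("HF", "DS", offload=False))  # -> Yes
    print(support_level("DS", "HF", offload=True))   # -> Mostly
    print(support_level("HF", "DS", offload=True))   # -> None
```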