* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add support for constant learning rate with cooldown
* Add more warmup and cooldown methods to 'get_wsc_schedule'
* Add more warmup and cooldown methods to 'get_wsc_schedule'
* Add more warmup and cooldown methods to 'get_wsc_schedule'
* Add more warmup and cooldown methods to 'get_wsc_schedule'
* Add more warmup and decay methods to 'get_wsd_schedule'
* support num_training_steps and num_stable_steps for get_wsd_schedule
* support num_training_steps and num_stable_steps for get_wsd_schedule
* get wsd scheduler before the `num_training_steps` decision
* fix code_quality
* Update stable branch logic
* fix code_quality
* Move stable stage decide to `get_wsd_schedule`
* Update docstring of `get_wsd_schedule`
* Update `num_train_steps` to optional
* Update `num_train_steps` to optional
* Update docstring of `get_wsd_schedule`
* Update src/transformers/optimization.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>