Just import torch AdamW instead (#36177)
* Just import torch AdamW instead
* Update docs too
* Make AdamW undocumented
* make fixup
* Add a basic wrapper class
* Add it back to the docs
* Just remove AdamW entirely
* Remove some AdamW references
* Drop AdamW from the public init
* make fix-copies
* Cleanup some references
* make fixup
* Delete lots of transformers.AdamW references
* Remove extra references to adamw_hf
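Since the commit drops `AdamW` from the public `transformers` init, downstream code that still imports it from there needs to import it from `torch.optim` instead. The following is a minimal sketch of how such imports could be rewritten mechanically. The helper name `migrate_adamw_import` is hypothetical and not part of `transformers` or this commit, and the sketch only handles plain single-line `from transformers import ...` statements (no aliases, parentheses, or trailing comments).

```python
import re

# Matches a plain single-line "from transformers import a, b, c" statement.
# Parenthesised (multi-line) imports are deliberately not handled in this sketch.
_IMPORT_RE = re.compile(
    r"^(?P<indent>[ \t]*)from transformers import (?P<names>[^\n()]+)$",
    re.MULTILINE,
)


def migrate_adamw_import(source: str) -> str:
    """Hypothetical helper: move `AdamW` from a transformers import to torch.optim."""

    def _rewrite(match: re.Match) -> str:
        indent = match.group("indent")
        names = [n.strip() for n in match.group("names").split(",") if n.strip()]
        if "AdamW" not in names:
            return match.group(0)  # unrelated import, leave it untouched
        remaining = [n for n in names if n != "AdamW"]
        lines = [f"{indent}from torch.optim import AdamW"]
        if remaining:
            # Keep the other names on a transformers import of their own.
            lines.append(f"{indent}from transformers import {', '.join(remaining)}")
        return "\n".join(lines)

    return _IMPORT_RE.sub(_rewrite, source)


if __name__ == "__main__":
    old = "from transformers import AdamW, get_linear_schedule_with_warmup\n"
    print(migrate_adamw_import(old), end="")
```

A rewrite like this only touches the import line. Call sites may still need review, since the two classes do not share identical constructor arguments or defaults.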
@@ -22,9 +22,6 @@ The `.optimization` module provides:
 - several schedules in the form of schedule objects that inherit from `_LRSchedule`:
 - a gradient accumulation class to accumulate the gradients of multiple batches
 
-## AdamW (PyTorch)
-
-[[autodoc]] AdamW
 
 ## AdaFactor (PyTorch)
 