Just import torch AdamW instead (#36177)

* Just import torch AdamW instead

* Update docs too

* Make AdamW undocumented

* make fixup

* Add a basic wrapper class

* Add it back to the docs

* Just remove AdamW entirely

* Remove some AdamW references

* Drop AdamW from the public init

* make fix-copies

* Cleanup some references

* make fixup

* Delete lots of transformers.AdamW references

* Remove extra references to adamw_hf
This commit is contained in:
Matt
2025-03-19 18:29:40 +00:00
committed by GitHub
parent 51bd0ceb9e
commit 9be4728af8
18 changed files with 18 additions and 174 deletions

View File

@@ -22,9 +22,6 @@ The `.optimization` module provides:
- several schedules in the form of schedule objects that inherit from `_LRSchedule`:
- a gradient accumulation class to accumulate the gradients of multiple batches
## AdamW (PyTorch)
[[autodoc]] AdamW
## AdaFactor (PyTorch)

View File

@@ -22,10 +22,6 @@ rendered properly in your Markdown viewer.
- `_LRSchedule` から継承するスケジュール オブジェクトの形式のいくつかのスケジュール:
- 複数のバッチの勾配を累積するための勾配累積クラス
## AdamW (PyTorch)
[[autodoc]] AdamW
## AdaFactor (PyTorch)
[[autodoc]] Adafactor

View File

@@ -22,10 +22,6 @@ rendered properly in your Markdown viewer.
- 继承自 `_LRSchedule` 多个调度器:
- 一个梯度累积类,用于累积多个批次的梯度
## AdamW (PyTorch)
[[autodoc]] AdamW
## AdaFactor (PyTorch)
[[autodoc]] Adafactor