Add TorchAOHfQuantizer (#32306)

* Add TorchAOHfQuantizer

Summary:
Enable loading torchao quantized model in huggingface.

Test Plan:
local test

Reviewers:

Subscribers:

Tasks:

Tags:

* Fix a few issues

* style

* Added tests and addressed some comments about dtype conversion

* fix torch_dtype warning message

* fix tests

* style

* TorchAOConfig -> TorchAoConfig

* enable offload + fix memory with multi-gpu

* update torchao version requirement to 0.4.0

* better comments

* add torch.compile to torchao README, add perf number link

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
This commit is contained in:
Jerry Zhang
2024-08-14 07:14:24 -07:00
committed by GitHub
parent 9485289f37
commit 78d78cdf8a
13 changed files with 539 additions and 3 deletions

View File

@@ -56,4 +56,4 @@ Use the table below to help you decide which quantization method to use.
| [HQQ](./hqq) | 🟢 | 🟢 | 🟢 | 🔴 | 🔴 | 🟢 | 1 - 8 | 🟢 | 🔴 | 🟢 | https://github.com/mobiusml/hqq/ |
| [Quanto](./quanto) | 🟢 | 🟢 | 🟢 | 🔴 | 🟢 | 🟢 | 2 / 4 / 8 | 🔴 | 🔴 | 🟢 | https://github.com/huggingface/quanto |
| [FBGEMM_FP8](./fbgemm_fp8.md) | 🟢 | 🔴 | 🟢 | 🔴 | 🔴 | 🔴 | 8 | 🔴 | 🟢 | 🟢 | https://github.com/pytorch/FBGEMM |
| [torchao](./torchao.md) | 🟢 | | 🟢 | 🔴 | partial support (int4 weight only) | | 4 / 8 | | 🟢🔴 | 🟢 | https://github.com/pytorch/ao |