Add new quant method (#32047)

* Add new quant method

* update

* fix multi-device

* add test

* add offload

* style

* style

* add simple example

* initial doc

* docstring

* style again

* works?

* better docs

* switch to non-persistent

* remove print

* fix init

* code review
Marc Sun
2024-07-22 20:21:59 +02:00
committed by GitHub
parent bd9dca3b85
commit 96a074fa7e
16 changed files with 770 additions and 5 deletions

@@ -56,3 +56,8 @@ Learn how to quantize models in the [Quantization](../quantization) guide.
## HqqConfig

[[autodoc]] HqqConfig

## FbgemmFp8Config

[[autodoc]] FbgemmFp8Config