Hqq serialization (#33141)

* HQQ model serialization attempt

* fix hqq dispatch and unexpected keys

* style

* remove check_old_param

* revert to check HQQLinear in quantizer_hqq.py

* revert to check HQQLinear in quantizer_hqq.py

* update HqqConfig default params

* make ci happy

* make ci happy

* revert to HQQLinear check in quantizer_hqq.py

* check hqq_min version 0.2.0

* set axis=1 as default in quantization_config.py

* validate_env with hqq>=0.2.0 version message

* deprecated hqq kwargs message

* make ci happy

* remove run_expected_keys_check hack + bump to 0.2.1 min hqq version

* fix unexpected_keys hqq update

* add pre_quantized check

* add update_expected_keys to base quantizerr

* ci base.py fix?

* ci base.py fix?

* fix "quantization typo" src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix post merge

---------

Co-authored-by: Marc Sun <marc@huggingface.co>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
mobicham
2024-09-30 14:47:18 +02:00
committed by GitHub
parent 4d5b458704
commit f5247aca01
8 changed files with 215 additions and 61 deletions

6
docs/source/en/quantization/hqq.md Normal file → Executable file
View File

@@ -30,13 +30,13 @@ To quantize a model, you need to create an [`HqqConfig`]. There are two ways of
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig
# Method 1: all linear layers will use the same quantization config
quant_config = HqqConfig(nbits=8, group_size=64, quant_zero=False, quant_scale=False, axis=0) #axis=0 is used by default
quant_config = HqqConfig(nbits=8, group_size=64)
```
``` Python
# Method 2: each linear layer with the same tag will use a dedicated quantization config
q4_config = {'nbits':4, 'group_size':64, 'quant_zero':False, 'quant_scale':False}
q3_config = {'nbits':3, 'group_size':32, 'quant_zero':False, 'quant_scale':False}
q4_config = {'nbits':4, 'group_size':64}
q3_config = {'nbits':3, 'group_size':32}
quant_config = HqqConfig(dynamic_config={
'self_attn.q_proj':q4_config,
'self_attn.k_proj':q4_config,