Hqq serialization (#33141)
* HQQ model serialization attempt * fix hqq dispatch and unexpected keys * style * remove check_old_param * revert to check HQQLinear in quantizer_hqq.py * revert to check HQQLinear in quantizer_hqq.py * update HqqConfig default params * make ci happy * make ci happy * revert to HQQLinear check in quantizer_hqq.py * check hqq_min version 0.2.0 * set axis=1 as default in quantization_config.py * validate_env with hqq>=0.2.0 version message * deprecated hqq kwargs message * make ci happy * remove run_expected_keys_check hack + bump to 0.2.1 min hqq version * fix unexpected_keys hqq update * add pre_quantized check * add update_expected_keys to base quantizerr * ci base.py fix? * ci base.py fix? * fix "quantization typo" src/transformers/utils/quantization_config.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix post merge --------- Co-authored-by: Marc Sun <marc@huggingface.co> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
6
docs/source/en/quantization/hqq.md
Normal file → Executable file
6
docs/source/en/quantization/hqq.md
Normal file → Executable file
@@ -30,13 +30,13 @@ To quantize a model, you need to create an [`HqqConfig`]. There are two ways of
|
||||
from transformers import AutoModelForCausalLM, AutoTokenizer, HqqConfig
|
||||
|
||||
# Method 1: all linear layers will use the same quantization config
|
||||
quant_config = HqqConfig(nbits=8, group_size=64, quant_zero=False, quant_scale=False, axis=0) #axis=0 is used by default
|
||||
quant_config = HqqConfig(nbits=8, group_size=64)
|
||||
```
|
||||
|
||||
``` Python
|
||||
# Method 2: each linear layer with the same tag will use a dedicated quantization config
|
||||
q4_config = {'nbits':4, 'group_size':64, 'quant_zero':False, 'quant_scale':False}
|
||||
q3_config = {'nbits':3, 'group_size':32, 'quant_zero':False, 'quant_scale':False}
|
||||
q4_config = {'nbits':4, 'group_size':64}
|
||||
q3_config = {'nbits':3, 'group_size':32}
|
||||
quant_config = HqqConfig(dynamic_config={
|
||||
'self_attn.q_proj':q4_config,
|
||||
'self_attn.k_proj':q4_config,
|
||||
|
||||
Reference in New Issue
Block a user