mobicham
59952994c4
Add HQQ quantization support (#29637)
* update HQQ transformers integration
* push import_utils.py
* add force_hooks check in modeling_utils.py
* fix | with Optional
* force bias as param
* check bias is Tensor
* force forward for multi-gpu
* review fixes pass
* remove torch grad()
* if any key in linear_tags fix
* add cpu/disk check
* isinstance return
* add multigpu test + refactor tests
* clean hqq_utils imports in hqq.py
* clean hqq_utils imports in quantizer_hqq.py
* delete hqq_utils.py
* Delete src/transformers/utils/hqq_utils.py
* ruff init
* remove torch.float16 from __init__ in test
* refactor test
* isinstance -> type in quantizer_hqq.py
* cpu/disk device_map check in quantizer_hqq.py
* remove type(module) nn.linear check in quantizer_hqq.py
* add BaseQuantizeConfig import inside HqqConfig init
* remove hqq import in hqq.py
* remove accelerate import from test_hqq.py
* quant config.py doc update
* add hqqconfig to main_classes doc
* make style
* __init__ fix
* ruff __init__
* skip_modules list
* hqqconfig format fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* test_hqq.py remove mistral comment
* remove self.using_multi_gpu is False
* torch_dtype default val set and logger.info
* hqq.py isinstance fix
* remove torch=None
* torch_device test_hqq
* rename test_hqq
* MODEL_ID in test_hqq
* quantizer_hqq setattr fix
* quantizer_hqq typo fix
* imports quantizer_hqq.py
* isinstance quantizer_hqq
* hqq_layer.bias reformat quantizer_hqq
* Step 2 as comment in quantizer_hqq
* prepare_for_hqq_linear() comment
* keep_in_fp32_modules fix
* HqqHfQuantizer reformat
* quantization.md hqqconfig
* quantization.md model example reformat
* quantization.md # space
* quantization.md space })
* quantization.md space })
* quantization_config fix doc
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* axis value check in quantization_config
* format
* dynamic config explanation
* quant config method in quantization.md
* remove shard-level progress
* .cuda fix modeling_utils
* test_hqq fixes
* make fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-02 17:51:49 +01:00