mobicham
59952994c4
Add HQQ quantization support (#29637)
* update HQQ transformers integration
* push import_utils.py
* add force_hooks check in modeling_utils.py
* fix | with Optional
* force bias as param
* check bias is Tensor
* force forward for multi-gpu
* review fixes pass
* remove torch grad()
* if any key in linear_tags fix
* add cpu/disk check
* isinstance return
* add multigpu test + refactor tests
* clean hqq_utils imports in hqq.py
* clean hqq_utils imports in quantizer_hqq.py
* delete hqq_utils.py
* Delete src/transformers/utils/hqq_utils.py
* ruff init
* remove torch.float16 from __init__ in test
* refactor test
* isinstance -> type in quantizer_hqq.py
* cpu/disk device_map check in quantizer_hqq.py
* remove type(module) nn.linear check in quantizer_hqq.py
* add BaseQuantizeConfig import inside HqqConfig init
* remove hqq import in hqq.py
* remove accelerate import from test_hqq.py
* quant config.py doc update
* add hqqconfig to main_classes doc
* make style
* __init__ fix
* ruff __init__
* skip_modules list
* hqqconfig format fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* test_hqq.py remove mistral comment
* remove self.using_multi_gpu is False
* torch_dtype default val set and logger.info
* hqq.py isinstance fix
* remove torch=None
* torch_device test_hqq
* rename test_hqq
* MODEL_ID in test_hqq
* quantizer_hqq setattr fix
* quantizer_hqq typo fix
* imports quantizer_hqq.py
* isinstance quantizer_hqq
* hqq_layer.bias reformat quantizer_hqq
* Step 2 as comment in quantizer_hqq
* prepare_for_hqq_linear() comment
* keep_in_fp32_modules fix
* HqqHfQuantizer reformat
* quantization.md hqqconfig
* quantization.md model example reformat
* quantization.md # space
* quantization.md space })
* quantization.md space })
* quantization_config fix doc
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* axis value check in quantization_config
* format
* dynamic config explanation
* quant config method in quantization.md
* remove shard-level progress
* .cuda fix modeling_utils
* test_hqq fixes
* make fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-02 17:51:49 +01:00