Marc Sun
cac4a4876b
[Quantization] Switch to optimum-quanto ( #31732 )
...
* switch to optimum-quanto rebase squach
* fix import check
* again
* test try-except
* style
2024-10-02 15:14:34 +02:00
Younes Belkada
658b849aeb
Quantization / TST: Fix remaining quantization tests ( #31000 )
...
* Fix remaining quant tests
* Update test_quanto.py
2024-05-24 14:35:59 +02:00
Younes Belkada
fce78fd0e9
FIX / Quantization: Fix Dockerfile build ( #30890 )
...
* Update Dockerfile
* Update docker/transformers-quantization-latest-gpu/Dockerfile
2024-05-20 10:08:26 +02:00
Younes Belkada
4e17e7dcf8
TST / Quantization: Reverting to torch==2.2.1 ( #30866 )
...
Reverting to 2.2.1
2024-05-16 17:30:02 +02:00
Yih-Dar
2d83324ecf
Use torch 2.3 for CI ( #30837 )
...
2.3
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com >
2024-05-15 19:31:52 +02:00
mobicham
59952994c4
Add HQQ quantization support ( #29637 )
...
* update HQQ transformers integration
* push import_utils.py
* add force_hooks check in modeling_utils.py
* fix | with Optional
* force bias as param
* check bias is Tensor
* force forward for multi-gpu
* review fixes pass
* remove torch grad()
* if any key in linear_tags fix
* add cpu/disk check
* isinstance return
* add multigpu test + refactor tests
* clean hqq_utils imports in hqq.py
* clean hqq_utils imports in quantizer_hqq.py
* delete hqq_utils.py
* Delete src/transformers/utils/hqq_utils.py
* ruff init
* remove torch.float16 from __init__ in test
* refactor test
* isinstance -> type in quantizer_hqq.py
* cpu/disk device_map check in quantizer_hqq.py
* remove type(module) nn.linear check in quantizer_hqq.py
* add BaseQuantizeConfig import inside HqqConfig init
* remove hqq import in hqq.py
* remove accelerate import from test_hqq.py
* quant config.py doc update
* add hqqconfig to main_classes doc
* make style
* __init__ fix
* ruff __init__
* skip_modules list
* hqqconfig format fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* hqqconfig doc fix
* test_hqq.py remove mistral comment
* remove self.using_multi_gpu is False
* torch_dtype default val set and logger.info
* hqq.py isinstance fix
* remove torch=None
* torch_device test_hqq
* rename test_hqq
* MODEL_ID in test_hqq
* quantizer_hqq setattr fix
* quantizer_hqq typo fix
* imports quantizer_hqq.py
* isinstance quantizer_hqq
* hqq_layer.bias reformat quantizer_hqq
* Step 2 as comment in quantizer_hqq
* prepare_for_hqq_linear() comment
* keep_in_fp32_modules fix
* HqqHfQuantizer reformat
* quantization.md hqqconfig
* quantization.md model example reformat
* quantization.md # space
* quantization.md space })
* quantization.md space })
* quantization_config fix doc
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* axis value check in quantization_config
* format
* dynamic config explanation
* quant config method in quantization.md
* remove shard-level progress
* .cuda fix modeling_utils
* test_hqq fixes
* make fix-copies
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-05-02 17:51:49 +01:00
zhong zhuang
b4c18a830a
[FEAT]: EETQ quantizer support ( #30262 )
...
* [FEAT]: EETQ quantizer support
* Update quantization.md
* Update docs/source/en/main_classes/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update docs/source/en/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update docs/source/en/quantization.md
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/__init__.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/__init__.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/integrations/eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/quantizers/auto.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/quantizers/quantizer_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update src/transformers/quantizers/quantizer_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* Update tests/quantization/eetq_integration/test_eetq.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* [FEAT]: EETQ quantizer support
* [FEAT]: EETQ quantizer support
* remove whitespaces
* update quantization.md
* style
* Update docs/source/en/quantization.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* add copyright
* Update quantization.md
* Update docs/source/en/quantization.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Update docs/source/en/quantization.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
* Address the comments by amyeroberts
* style
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
Co-authored-by: Marc Sun <marc@huggingface.co >
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com >
2024-04-22 20:38:58 +01:00
Marc Sun
58a939c6b7
Fix quantization tests ( #29914 )
...
* revert back to torch 2.1.1
* run test
* switch to torch 2.2.1
* udapte dockerfile
* fix awq tests
* fix test
* run quanto tests
* update tests
* split quantization tests
* fix
* fix again
* final fix
* fix report artifact
* build docker again
* Revert "build docker again"
This reverts commit 399a5f9d9308da071d79034f238c719de0f3532e.
* debug
* revert
* style
* new notification system
* testing notfication
* rebuild docker
* fix_prev_ci_results
* typo
* remove warning
* fix typo
* fix artifact name
* debug
* issue fixed
* debug again
* fix
* fix time
* test notif with faling test
* typo
* issues again
* final fix ?
* run all quantization tests again
* remove name to clear space
* revert modfiication done on workflow
* fix
* build docker
* build only quant docker
* fix quantization ci
* fix
* fix report
* better quantization_matrix
* add print
* revert to the basic one
2024-04-09 17:10:29 +02:00
Marc Sun
28de2f4de3
[Quantization] Quanto quantizer ( #29023 )
...
* start integration
* fix
* add and debug tests
* update tests
* make pytorch serialization works
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com >
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com >
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com >
* find and fix edge case
* Update docs/source/en/quantization.md
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change by cuda:0 for now
* fix regression
* update device_map
* fix doc
* add noteboon
* update torch_dtype
* update doc
* typo
* typo
* remove comm
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com >
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com >
Co-authored-by: Younes Belkada <younesbelkada@gmail.com >
2024-03-15 11:51:29 -04:00
Ilyas Moutawwakil
4fc708f98c
Exllama kernels support for AWQ models ( #28634 )
...
* added exllama kernels support for awq models
* doc
* style
* Update src/transformers/modeling_utils.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
* refactor
* moved exllama post init to after device dispatching
* bump autoawq version
* added exllama test
* style
* configurable exllama kernels
* copy exllama_config from gptq
* moved exllama version check to post init
* moved to quantization dockerfile
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com >
2024-03-05 03:22:48 +01:00
Marc Sun
f54d82cace
[CI] Quantization workflow ( #29046 )
...
* [CI] Quantization workflow
* build dockerfile
* fix dockerfile
* update self-cheduled.yml
* test build dockerfile on push
* fix torch install
* udapte to python 3.10
* update aqlm version
* uncomment build dockerfile
* tests if the scheduler works
* fix docker
* do not trigger on psuh again
* add additional runs
* test again
* all good
* style
* Update .github/workflows/self-scheduled.yml
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
* test build dockerfile with torch 2.2.0
* fix extra
* clean
* revert changes
* Revert "revert changes"
This reverts commit 4cb52b8822da9d1786a821a33e867e4fcc00d8fd.
* revert correct change
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com >
2024-02-28 10:09:25 -05:00