[Quantization] Quanto quantizer (#29023)

* start integration

* fix

* add and debug tests

* update tests

* make pytorch serialization works

* compatible with device_map and offload

* fix tests

* make style

* add ref

* guard against safetensors

* add float8 and style

* fix is_serializable

* Fix shard_checkpoint compatibility with quanto

* more tests

* docs

* adjust memory

* better

* style

* pass tests

* Update src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add is_safe_serialization instead

* Update src/transformers/quantizers/quantizer_quanto.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add QbitsTensor tests

* fix tests

* simplify activation list

* Update docs/source/en/quantization.md

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* better comment

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>

* find and fix edge case

* Update docs/source/en/quantization.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* pass weights_only_kwarg instead

* fix shard_checkpoint loading

* simplify update_missing_keys

* Update tests/quantization/quanto_integration/test_quanto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* recursion to get all tensors

* block serialization

* skip serialization tests

* fix

* change by cuda:0 for now

* fix regression

* update device_map

* fix doc

* add noteboon

* update torch_dtype

* update doc

* typo

* typo

* remove comm

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
This commit is contained in:
Marc Sun
2024-03-15 11:51:29 -04:00
committed by GitHub
parent f02aea2737
commit 28de2f4de3
18 changed files with 885 additions and 7 deletions

View File

@@ -45,6 +45,9 @@ RUN python3 -m pip install --no-cache-dir aqlm[gpu]==1.0.2
# Add autoawq for quantization testing
RUN python3 -m pip install --no-cache-dir https://github.com/casper-hansen/AutoAWQ/releases/download/v0.2.0/autoawq-0.2.0+cu118-cp38-cp38-linux_x86_64.whl
# Add quanto for quantization testing
RUN python3 -m pip install --no-cache-dir quanto
# When installing in editable mode, `transformers` is not recognized as a package.
# this line must be added in order for python to be aware of transformers.
RUN cd transformers && python3 setup.py develop