Joao Gante
|
cf32ee1753
|
Cache: use batch_size instead of max_batch_size (#32657)
* more precise name
* better docstrings
* Update src/transformers/cache_utils.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
|
2024-08-16 11:48:45 +01:00 |
|
Joao Gante
|
75bbfd5b22
|
Cache: Static cache as a standalone object (#30476)
|
2024-04-30 16:37:19 +01:00 |
|
Andrei Panferov
|
e3fc90ae68
|
Cleaner Cache dtype and device extraction for CUDA graph generation for quantizers compatibility (#29079)
* input_layernorm as the beacon of hope
* cleaner dtype extraction
* AQLM + CUDA graph test
* is available check
* shorter text test
|
2024-02-27 09:32:39 +01:00 |
|
Andrei Panferov
|
1ecf5f7c98
|
AQLM quantizer support (#28928)
* aqlm init
* calibration and dtypes
* docs
* Readme update
* is_aqlm_available
* Simpler link in docs
* Test TODO real reference
* init _import_structure fix
* AqlmConfig autodoc
* integration aqlm
* integrations in tests
* docstring fix
* legacy typing
* Less typings
* More kernels information
* Performance -> Accuracy
* correct tests
* remoced multi-gpu test
* Update docs/source/en/quantization.md
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/utils/quantization_config.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Brought back multi-gpu tests
* Update src/transformers/integrations/aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update tests/quantization/aqlm_integration/test_aqlm.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
---------
Co-authored-by: Andrei Panferov <blacksamorez@yandex-team.ru>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
|
2024-02-14 09:25:41 +01:00 |
|