Quantized KV Cache (#30483)
* clean-up * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/cache_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup * Update tests/quantization/quanto_integration/test_quanto.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * more suggestions * mapping if torch available * run tests & add 'support_quantized' flag * fix jamba test * revert, will be fixed by another PR * codestyle * HQQ and versatile cache classes * final update * typo * make tests happy --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
This commit is contained in:
committed by
GitHub
parent
e05baad861
commit
d583f1317b
@@ -360,6 +360,12 @@ A [`Constraint`] can be used to force the generation to include specific tokens
|
||||
[[autodoc]] Cache
|
||||
- update
|
||||
|
||||
[[autodoc]] CacheConfig
|
||||
- update
|
||||
|
||||
[[autodoc]] QuantizedCacheConfig
|
||||
- validate
|
||||
|
||||
[[autodoc]] DynamicCache
|
||||
- update
|
||||
- get_seq_length
|
||||
@@ -367,6 +373,14 @@ A [`Constraint`] can be used to force the generation to include specific tokens
|
||||
- to_legacy_cache
|
||||
- from_legacy_cache
|
||||
|
||||
[[autodoc]] QuantizedCache
|
||||
- update
|
||||
- get_seq_length
|
||||
|
||||
[[autodoc]] QuantoQuantizedCache
|
||||
|
||||
[[autodoc]] HQQQuantizedCache
|
||||
|
||||
[[autodoc]] SinkCache
|
||||
- update
|
||||
- get_seq_length
|
||||
@@ -375,7 +389,7 @@ A [`Constraint`] can be used to force the generation to include specific tokens
|
||||
[[autodoc]] StaticCache
|
||||
- update
|
||||
- get_seq_length
|
||||
- reorder_cache
|
||||
- reset
|
||||
|
||||
|
||||
## Watermark Utils
|
||||
|
||||
Reference in New Issue
Block a user