[cache refactor] Move all the caching logic to a per-layer approach (#39106)

* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)

- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests
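
The per-layer design described above (one storage object per layer, plus a container that dispatches calls to the right layer) can be illustrated with a minimal sketch. This is not the code from this commit: `ToyDynamicLayer`, `ToySlidingLayer`, and `ToyCache` are hypothetical names, and plain Python lists stand in for key/value tensors.

```python
# Hypothetical toy classes for illustration only; not the transformers API.
class ToyDynamicLayer:
    """Holds the keys and values of ONE layer and grows without bound."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, new_keys, new_values):
        # Append the new entries and return the full state of this layer.
        self.keys.extend(new_keys)
        self.values.extend(new_values)
        return self.keys, self.values

    def get_seq_length(self):
        return len(self.keys)


class ToySlidingLayer(ToyDynamicLayer):
    """Same interface, but keeps only the last `window` entries."""

    def __init__(self, window):
        super().__init__()
        self.window = window

    def update(self, new_keys, new_values):
        super().update(new_keys, new_values)
        self.keys = self.keys[-self.window:]
        self.values = self.values[-self.window:]
        return self.keys, self.values


class ToyCache:
    """Container that owns no storage logic; it only dispatches to layers."""

    def __init__(self, layers):
        self.layers = list(layers)

    def update(self, new_keys, new_values, layer_idx):
        return self.layers[layer_idx].update(new_keys, new_values)

    def get_seq_length(self, layer_idx=0):
        return self.layers[layer_idx].get_seq_length()


# A "hybrid" cache is then just a mix of layer types.
cache = ToyCache([ToyDynamicLayer(), ToySlidingLayer(window=2)])
for idx in range(2):
    cache.update([1, 2, 3], [10, 20, 30], layer_idx=idx)
```

In this sketch, behavior that differs between cache variants lives in the layer classes, so the container needs no variant-specific branches.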

* fix quantized, add tests

* remove CacheProcessorList

* raushan review, arthur review

* joao review: minor things

* remove cache configs, make CacheLayer a mixin (joaos review)

* back to storage inside Cache()

* remove cachebase for decorator

* no more __getattr__

* fix tests

* joaos review except docs

* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`

More verbose exceptions in `fix_docstring` on docstring formatting issues.
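
As a small illustration of the `ast` change named in this message, the sketch below reads literal values through `ast.Constant` and its `value` attribute rather than the deprecated `node.n` alias. The helper `literal_values` is a hypothetical name and is not taken from the repository.

```python
import ast


def literal_values(source):
    """Return the values of all literal constants found in `source`."""
    tree = ast.parse(source)
    # ast.Constant covers numbers, strings, bytes, None, True/False, and Ellipsis.
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
    ]


found = literal_values("x = 1 + 2")
```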

* Revert "back to storage inside Cache()"

This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.

* cyril review

* simplify cache export

* fix lfm2 cache

* HybridChunked to layer

* BC proxy object for cache.key_cache[i]=...

* reorder classes

* bfff come on LFM2

* better tests for hybrid and hybridChunked

* complete coverage for hybrid chunked caches (prefill chunking)

* reimplementing HybridChunked

* cyril review

* fix ci

* docs for cache refactor

* docs

* oopsie

* oopsie

* fix after merge

* cyril review

* arthur review

* opsie

* fix lfm2

* opsie2

Author: Manuel de Prada Corral
Date: 2025-07-22 16:10:25 +02:00
Committed by: GitHub
Parent: b16688e96a
Commit: c338fd43b0

64 changed files with 2779 additions and 2441 deletions

									
docs/source/ko/internal/generation_utils.md (6 lines changed)

@@ -345,12 +345,6 @@ generation_output[:2]
 [[autodoc]] Cache
     - update

-[[autodoc]] CacheConfig
-    - update
-
-[[autodoc]] QuantizedCacheConfig
-    - validate
-
 [[autodoc]] DynamicCache
     - update
     - get_seq_length
