Manuel de Prada Corral
c338fd43b0
[cache refactor] Move all the caching logic to a per-layer approach (#39106)
* Squash for refactor: Replace monolithic cache classes with modular LayeredCache (#38077)
- Introduces CacheLayer and Cache base classes
- Ports Static, Dynamic, Offloaded, Quantized, Hybrid, etc. to use layers
- Implements method/attr dispatch across layers to reduce boilerplate
- Adds CacheProcessor hooks for offloading, quantization, etc.
- Updates and passes tests
* fix quantized, add tests
* remove CacheProcessorList
* raushan review, arthur review
* joao review: minor things
* remove cache configs, make CacheLayer a mixin (joaos review)
* back to storage inside Cache()
* remove cachebase for decorator
* no more __getattr__
* fix tests
* joaos review except docs
* fix ast deprecations for python 3.14: replace node.n by node.value and use `ast.Constant`
More verbose exceptions in `fix_docstring` on docstring formatting issues.
* Revert "back to storage inside Cache()"
This reverts commit 27916bc2737806bf849ce2148cb1e66d59573913.
* cyril review
* simplify cache export
* fix lfm2 cache
* HybridChunked to layer
* BC proxy object for cache.key_cache[i]=...
* reorder classes
* bfff come on LFM2
* better tests for hybrid and hybridChunked
* complete coverage for hybrid chunked caches (prefill chunking)
* reimplementing HybridChunked
* cyril review
* fix ci
* docs for cache refactor
* docs
* oopsie
* oopsie
* fix after merge
* cyril review
* arthur review
* opsie
* fix lfm2
* opsie2
2025-07-22 16:10:25 +02:00
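The `ast` item in the commit above (replace `node.n` by `node.value` and use `ast.Constant`) can be illustrated with a small standalone sketch. This is not the code changed in the commit; the helper name `int_constants` is hypothetical and only shows the general pattern of reading literals through `ast.Constant` and its `value` attribute.

```python
import ast


def int_constants(source: str) -> list[int]:
    """Collect integer literals from `source` (hypothetical helper for illustration)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Literals are represented as ast.Constant nodes; the payload is read
        # from `.value` rather than the deprecated `.n` / `.s` aliases.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, int)
            and not isinstance(node.value, bool)  # bool is a subclass of int
        ):
            found.append(node.value)
    return found


# ast.walk makes no ordering promise, so sort before comparing.
print(sorted(int_constants("x = 1 + 2\ny = 'a'\nz = True")))
```

Checking `isinstance(node, ast.Constant)` and then inspecting the type of `node.value` covers numbers, strings, and other literals with a single node class, which is why the older per-type node classes are no longer needed.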
Raushan Turganbay
cd98c1fee3
[docs] update attention implementation and cache docs (#39547)
* update docs
* Apply suggestions from code review
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
* apply suggestions
---------
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-07-22 15:06:43 +02:00
Raushan Turganbay
c8524aeb07
[cache] make all classes cache compatible finally (#38635)
* dump
* push other models
* fix simple greedy generation
* xmod
* add fsmt and clean up some mentions of old cache format
* gpt-bigcode now follows standards
* delete tuple cache reference in generation
* fix some models
* fix some models
* fix mambas and support cache in tapas
* fix some more tests
* fix copies
* delete `_reorder_cache`
* another fix copies
* fix typos and delete unnecessary test
* fix rag generate, needs special cache reordering
* fix tapas and superglue
* reformer create special cache
* recurrent gemma `reorder_cache` was a no-op, delete
* fix-copies
* fix blip and musicgen pipeline tests
* fix reformer
* fix reformer, again...
* delete `_supports_cache_class`
* delete `supports_quantized_cache`
* fix failing tests
* fix copies
* some minor clean up
* style
* style
* fix copies
* fix tests
* fix copies
* create causal mask now needs positions?
* fix copies
* style
* Update tests/test_modeling_common.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* clean-up of non-generative model after merging main
* check `is_decoder` for cache
* delete transpose for scores
* remove tuple cache from docs everywhere
* fix tests
* fix copies
* fix copies once more
* properly deprecate `encoder_attention_mask` in Bert-like models
* import `deprecate_kwarg` where needed
* fix copies again
* fix copies
* delete `next_decoder_cache`
* fix copies asks to update for PLM
* fix copies
* rebasing had a few new models, fix them and merge asap!
* fix copies once more
* fix slow tests
* fix tests and update PLM checkpoint
* add read token and revert accidentally removed line
* oh come on, style
* just skip it, read token has no access to PLM yet
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-16 14:00:17 +02:00
Raushan Turganbay
bc161d5d06
Delete deprecated stuff (#38838)
* delete deprecated stuff
* fix copies
* remove unused tests
* fix modernbert and fuyu
* Update src/transformers/cache_utils.py
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
* bye bye `seen_tokens`
* address comments
* update typings
* encoder-decoder models follow the same pattern as whisper
* fix copies
* why is it set to False?
* fix switch transformers
* fix encoder decoder models shared weight
* fix copies and RAG
* remove `next_cache`
* fix gptj/git
* fix copies
* fix copies
* style...
* another forgotten docstring
---------
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2025-07-10 05:18:44 +00:00
Steven Liu
78d771c3c2
[docs] Format fix (#38414)
fix table
2025-06-03 09:53:23 -07:00
Manuel de Prada Corral
78079abeff
Improved cache docs (#38060)
* improved cache docs
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-05-26 13:53:41 +00:00
omahs
cbf924b76c
Fix typos (#36910)
* fix typos
* fix typos
* fix typos
* fix typos
2025-03-24 14:08:29 +00:00
Steven Liu
c0f8d055ce
[docs] Redesign (#31757)
* toctree
* not-doctested.txt
* collapse sections
* feedback
* update
* rewrite get started sections
* fixes
* fix
* loading models
* fix
* customize models
* share
* fix link
* contribute part 1
* contribute pt 2
* fix toctree
* tokenization pt 1
* Add new model (#32615)
* v1 - working version
* fix
* fix
* fix
* fix
* rename to correct name
* fix title
* fixup
* rename files
* fix
* add copied from on tests
* rename to `FalconMamba` everywhere and fix bugs
* fix quantization + accelerate
* fix copies
* add `torch.compile` support
* fix tests
* fix tests and add slow tests
* copies on config
* merge the latest changes
* fix tests
* add few lines about instruct
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix
* fix tests
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* "to be not" -> "not to be" (#32636)
* "to be not" -> "not to be"
* Update sam.md
* Update trainer.py
* Update modeling_utils.py
* Update test_modeling_utils.py
* Update test_modeling_utils.py
* fix hfoption tag
* tokenization pt. 2
* image processor
* fix toctree
* backbones
* feature extractor
* fix file name
* processor
* update not-doctested
* update
* make style
* fix toctree
* revision
* make fixup
* fix toctree
* fix
* make style
* fix hfoption tag
* pipeline
* pipeline gradio
* pipeline web server
* add pipeline
* fix toctree
* not-doctested
* prompting
* llm optims
* fix toctree
* fixes
* cache
* text generation
* fix
* chat pipeline
* chat stuff
* xla
* torch.compile
* cpu inference
* toctree
* gpu inference
* agents and tools
* gguf/tiktoken
* finetune
* toctree
* trainer
* trainer pt 2
* optims
* optimizers
* accelerate
* parallelism
* fsdp
* update
* distributed cpu
* hardware training
* gpu training
* gpu training 2
* peft
* distrib debug
* deepspeed 1
* deepspeed 2
* chat toctree
* quant pt 1
* quant pt 2
* fix toctree
* fix
* fix
* quant pt 3
* quant pt 4
* serialization
* torchscript
* scripts
* tpu
* review
* model addition timeline
* modular
* more reviews
* reviews
* fix toctree
* reviews reviews
* continue reviews
* more reviews
* modular transformers
* more review
* zamba2
* fix
* all frameworks
* pytorch
* supported model frameworks
* flashattention
* rm check_table
* not-doctested.txt
* rm check_support_list.py
* feedback
* updates/feedback
* review
* feedback
* fix
* update
* feedback
* updates
* update
---------
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com>
2025-03-03 10:33:46 -08:00