Cyril Vallez
163138a911
🚨🚨[core] Completely rewrite the masking logic for all attentions (#37866)
* start
* start having a clean 4d mask primitive
* Update mask_utils.py
* Update mask_utils.py
* switch name
* Update masking_utils.py
* add a new AttentionMask tensor class
* fix import
* nits
* fixes
* use full and quadrants
* general sdpa mask for all caches
* style
* start some tests
* tests with sliding, chunked
* add styling
* test hybrid
* Update masking_utils.py
* small temp fixes
* Update modeling_gemma2.py
* compile compatible
* Update masking_utils.py
* improve
* start making it more general
* Update masking_utils.py
* generate
* make it work with flex style primitives!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* improve
* Update cache_utils.py
* Update masking_utils.py
* simplify - starting to look good!
* Update masking_utils.py
* name
* Update masking_utils.py
* style
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* small fix for flex
* flex compile
* FA2
* Update masking_utils.py
* Escape for TGI/vLLM!
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* General case without cache
* rename
* full test on llama4
* small fix for FA2 guard with chunk
* Update modeling_gemma2.py
* post rebase cleanup
* FA2 supports static cache!
* Update modeling_flash_attention_utils.py
* Update flex_attention.py
* Update masking_utils.py
* Update masking_utils.py
* Update utils.py
* override for export
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update executorch.py
* Update masking_utils.py
* Update masking_utils.py
* output attentions
* style
* Update masking_utils.py
* Update executorch.py
* Add docstring
* Add license and put mask visualizer at the end
* Update test_modeling_common.py
* fix broken test
* Update test_modeling_gemma.py
* Update test_modeling_gemma2.py
* Use fullgraph=False with FA2
* Update utils.py
* change name
* Update masking_utils.py
* improve doc
* change name
* Update modeling_attn_mask_utils.py
* more explicit logic based on model's property
* pattern in config
* extend
* fixes
* make it better
* generalize to other test models
* fix
* Update masking_utils.py
* fix
* do not check mask equivalence if layer types are different
* executorch
* Update modeling_gemma2.py
* Update masking_utils.py
* use layer_idx instead
* adjust
* Update masking_utils.py
* test
* fix imports
* Update modeling_gemma2.py
* other test models
* Update modeling_llama4.py
* Update masking_utils.py
* improve
* simplify
* Update masking_utils.py
* typos
* typo
* fix
* Update masking_utils.py
* default DynamicCache
* remove default cache
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* simplify
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* export
* Update executorch.py
* Update executorch.py
* Update flex_attention.py
* Update executorch.py
* upstream to modular gemma 1 & 2
* Update modular_mistral.py
* switch names
* use dict
* put it in the Layer directly
* update copy model source for mask functions
* apply so many modular (hopefully 1 shot)
* use explicit dicts to make style happy
* protect import
* check docstring
* better default in hybrid caches
* qwens
* Update modular_qwen2.py
* simplify core logic!
* Update executorch.py
* qwen3 moe
* Update masking_utils.py
* Update masking_utils.py
* simplify a lot sdpa causal skip
* Update masking_utils.py
* post-rebase
* gemma3 finally
* style
* check it before
* gemma3
* More general with newer torch
* align gemma3
* Update utils.py
* Update utils.py
* Update masking_utils.py
* Update test_modeling_common.py
* Update flex_attention.py
* Update flex_attention.py
* Update flex_attention.py
* test
* executorch
* Update test_modeling_common.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update masking_utils.py
* Update executorch.py
* Update test_modeling_common.py
* fix copies
* device
* sdpa can be used without mask -> pass the torchscript tests in this case
* Use enum for check
* revert enum and add check instead
* remove broken test
* cohere2
* some doc & reorganize the Interface
* Update tensor_parallel.py
* Update tensor_parallel.py
* doc and dummy
* Update test_modeling_paligemma2.py
* Update modeling_falcon_h1.py
* Update masking_utils.py
* executorch patch
* style
* CIs
* use register in executorch
* final comments!
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2025-05-22 11:38:26 +02:00