Arthur
2c47618c1a
🚨All attention refactor🚨 (#35235)
* refactor LlamaAttention
* minimal changes
* fix llama
* update
* modular gemmas
* modular nits
* modular updates
* nits
* simplify
* gpt2
* more modular and fixes
* granite
* modular modular modular
* nits
* update
* qwen2 + starcoder2
* mostly gemma2
* Update image_processing_auto.py
* fix
* Update modular_starcoder2.py
* fix
* remove all copied from attentions
* remove gcv
* make fix-copies
* oups
* oups2.0
* fix some modulars + all copied from
* should be good now
* revert unwanted changes
* Update modeling_decision_transformer.py
* finish cleanup
* Update modeling_olmo.py
* consistency
* re-add gradient checkpointing attribute
* fix
* style
* make config necessary
* bis
* bis
* Update modeling_my_new_model2.py
* is_causal attr
* fix
* remove past kv return from decoder layer
* fix
* default rope config
* correctly fix rope config
* fix bias
* fix gpt2 attention output
* fix test
* fix inits
* fix default sdpa
* fix default sdpa implementation
* harmonize classes
* fix mistral
* fix sliding window models
* mixtral
* be more explicit
* style
* fix
* several fixes
* Update modeling_dbrx.py
* fix test
* olmo + phi
* rotary
* style
* phi
* phi again
* again
* kwargs
* Update test_modeling_common.py
* skip fx tracing tests
* Update modeling_utils.py
* gemma 2
* again
* Update modeling_recurrent_gemma.py
* gemma2
* granite
* style
* starcoder
* Update sdpa_attention.py
* switch args
* Update modeling_mllama.py
* fix
* cache type tests
* gpt2
* Update test_modeling_common.py
* fix
* consistency
* fix shape with encoder
* should be the last one
* tests non model
* most comments
* small oupsi
* be more explicit in modulars
* more explicit modulars
* CIs! it works locally
* add kwargs to _flash_attention_forward
---------
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2024-12-18 16:53:39 +01:00