Armaghan Shakir
55736eea99
Add support for MiniMax's MiniMax-Text-01 (#35831)
* end-to-end architecture
* lightning-attn: refactor, clean, optimize
* put minimax_text_01 in other files
* use latest __init__ standards and auto-generate modular
* support attention_mask for lightning-attn
* Revert "use latest __init__ standards and auto-generate modular"
This reverts commit d8d3c409d89e335c98a8cd36f47304a76eac7493.
* fix modular conversion
* pass both attention masks instead of tuple
* formatting
* Updated Dynamic Cache
* created MiniMaxText01Cache
* fix hardcoded slope_rate
* update attn_type_list in config
* fix lightning when use_cache=False
* copy tests from mixtral
* (checkpoint) all tests pass for normal attention
* fix all unittests
* fix import sorting
* fix consistency and formatting tests
* fix config
* update tests, since changes in main
* fix seq_len error
* create dummy docs
* fix checkpoint
* add checkpoint in config docstring
* run modular_conversion
* update docs
* fix checkpoint path and update tests
* fix ruff
* remove repeated expected_slice
* update docs
* rename "minimax-text-01" to "minimax"
* inherit config from mixtral
* remove from docs in other languages
* undo files that should be untouched
* move minimax to end in conversation docs
* use MiniMaxForCausalLM as it is
* ruff fixes
* run modular
* fix docstring example in causallm
* refactor attention loop and decay factors
* refactor config in modular
* run modular
* refactor cache
* rename static_cache to linear_cache
* make positional embeddings necessary
* remove unnecessary layernorms declarations
* fix import in tests
* refactor attention in next tokens
* remove outdated code
* formatting and modular
* update tests
* rename layernorm alpha/beta factors
* register decay factors as buffers
* remove unused declarations of decay factors
* update config for alpha/beta factors
* run modular
* remove head_dim in tests
* remove minimax from fx.py
* remove stuff that is not really needed
* update __init__
* update qkv torch.split
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
* fix qkv torch.split
* quality fixes
* remove mistakenly added dummy
* purge unused ModelTester code
* fix-copies
* run fix-copies
* fix head_dim
* write cache formatting tests
* remove postnorm
* avoid contiguous in attention current states
* update expected_slice
* add generation test for integration
* fix dtype in generation test
* update authors
* update with changes in main
* update gradient checkpointing and minor fixes
* fix mutable attn_type_list
* rename: attn_type -> layer_type
* update for layer_types
* update integration tests
* update checkpoint
* clean overview in docs
---------
Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>
2025-06-04 09:38:40 +02:00