Add support for MiniMax's MiniMax-Text-01 (#35831)

* end-to-end architecture

* lightning-attn: refactor, clean, optimize

* put minimax_text_01 in other files

* use latest __init__ standards and auto-generate modular

* support attention_mask for lightning-attn

* Revert "use latest __init__ standards and auto-generate modular"

This reverts commit d8d3c409d89e335c98a8cd36f47304a76eac7493.

* fix modular conversion

* pass both attention masks instead of tuple

* formatting

* Updated Dynamic Cache

* created MiniMaxText01Cache

* fix hardcoded slope_rate

* update attn_type_list in config

* fix lightning when use_cache=False

* copy tests from mixtral

* (checkpoint) all tests pass for normal attention

* fix all unittests

* fix import sorting

* fix consistency and formatting tests

* fix config

* update tests, since changes in main

* fix seq_len error

* create dummy docs

* fix checkpoint

* add checkpoint in config docstring

* run modular_conversion

* update docs

* fix checkpoint path and update tests

* fix ruff

* remove repeated expected_slice

* update docs

* rename "minimax-text-01" to "minimax"

* inherit config from mixtral

* remove from docs in other languages

* undo files that should be untouched

* move minimax to end in conversation docs

* use MiniMaxForCausalLM as it is

* ruff fixes

* run modular

* fix docstring example in causallm

* refactor attention loop and decay factors

* refactor config in modular

* run modular

* refactor cache

* rename static_cache to linear_cache

* make positional embeddings necessary

* remove unnecessary layernorms declarations

* fix import in tests

* refactor attention in next tokens

* remove outdated code

* formatting and modular

* update tests

* rename layernorm alpha/beta factors

* register decay factors as buffers

* remove unused declarations of decay factors

* update config for alpha/beta factors

* run modular

* remove head_dim in tests

* remove minimax from fx.py

* remove stuff that is not really needed

* update __init__

* update qkv torch.split

Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

* fix qkv torch.split

* quality fixes

* remove mistakenly added dummy

* purge unused ModelTester code

* fix-copies

* run fix-copies

* fix head_dim

* write cache formatting tests

* remove postnorm

* avoid contiguous in attention current states

* update expected_slice

* add generation test for integration

* fix dtype in generation test

* update authors

* update with changes in main

* update graident checkpointing and minor fixes

* fix mutable attn_type_list

* rename: attn_type -> layer_type

* update for layer_types

* update integration tests

* update checkpoint

* clean overview in docs

---------

Co-authored-by: Shakib-IO <shakib.khan17@northsouth.edu>
Co-authored-by: Cyril Vallez <cyril.vallez@gmail.com>

This commit is contained in:

Armaghan Shakir

2025-06-04 12:38:40 +05:00

committed by

GitHub

parent 037acf1d10

commit 55736eea99

15 changed files with 2650 additions and 1 deletions

									
										2

tests/test_modeling_common.py
									
												View File
												
				@@ -3946,7 +3946,7 @@ class ModelTesterMixin:

				        if not self.has_attentions:

				            self.skipTest(reason="Model architecture does not support attentions")

				        WINDOW_ATTENTION_MODELS = ["mistral", "mixtral", "qwen2", "qwen_moe", "starcoder2"]

				        WINDOW_ATTENTION_MODELS = ["mistral", "mixtral", "minimax", "qwen2", "qwen_moe", "starcoder2"]

				        if len(self.all_generative_model_classes) == 0:

				            self.skipTest(f"No generative model classes for {self.__class__.__name__}")

Add support for MiniMax's MiniMax-Text-01 (#35831)

2 tests/test_modeling_common.py Unescape Escape View File

2

tests/test_modeling_common.py

View File