Add codestral mamba2 (#32080)

* add new model like

* draft cuda forward - mismatched keys (sharding on conv1)

* match keys successfully

* fix split

* get generation/forward running (wrong gens, norm?)

* update

* some refactoring

* fixes

* works up until copy to cache

* fix

* update

* NON WORKING VERSION

* version that works?

* nit

* fix config

* fix conversion script

* working cuda forward

* nit

* update

* simplification

* make mamba slow simple work

* no einops

* todo

* fix style

* no einops

* update fix no einsum

* nit

* remove einops

* bug: scan_output differs strongly

* add rms norm option

* fix fast + slow generation with and w/o cache ✔️

* draft integration tests

* remove a big chunk of the einsum

* fix slow, fast generations, without any einsum

* fix copies

* fix structure

* fix up modeling and tests

* fix tests

* clamping is indeed worse

* recover mamba2 cache test

* fix copies

* no cache position (yet)

* fix tf tests

* fix matmul for generate

* fixup

* skip cache tests for now

* [run-slow]mamba2

* tune out hidden states for padding

* test batched generation

* propagate attention mask changes

* fix past length

* fix integration test

* style

* address comments

* update readme

* add mamba2 version check

* fix tests

* [run-slow]mamba2

* skip edge tests

* [run-slow]mamba2

* last fixup

* [run-slow]mamba2

* update README

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>


Pablo Montalvo

2024-08-06 16:39:52 +02:00

committed by

GitHub

parent 3d8bd11942

commit 80b90e7b2f

16 changed files with 1947 additions and 0 deletions

									
docs/source/en/_toctree.yml (2 additions)
@@ -438,6 +438,8 @@
         title: MADLAD-400
       - local: model_doc/mamba
         title: Mamba
+      - local: model_doc/mamba2
+        title: mamba2
       - local: model_doc/marian
         title: MarianMT
       - local: model_doc/markuplm
