HuggingFace_transformer/docs/source/en/model_doc
Pablo Montalvo 80b90e7b2f Add codestral mamba2 (#32080)
* add new model like

* draft cuda forward - mismatched keys (sharding on conv1)

* match keys successfully

* fix split

* get generation/forward running (wrong gens, norm?)

* update

* some refactoring

* fixes

* works up until copy to cache

* fix

* update

* NON WORKING VERSION

* version that works?

* nit

* fix config

* fix conversion script

* working cuda forward

* nit

* update

* simplification

* make mamba slow simple work

* no einops

* todo

* fix style

* no einops

* update fix no einsum

* nit

* remove einops

* bug: scan_output differs strongly

* add rms norm option

* fix fast + slow generation with and w/o cache ✔️

* draft integration tests

* remove a big chunk of the einsum

* fix slow, fast generations, without any einsum

* fix copies

* fix structure

* fix up modeling and tests

* fix tests

* clamping is indeed worse

* recover mamba2 cache test

* fix copies

* no cache position (yet)

* fix tf tests

* fix matmul for generate

* fixup

* skip cache tests for now

* [run-slow]mamba2

* tune out hidden states for padding

* test batched generation

* propagate attention mask changes

* fix past length

* fix integration test

* style

* address comments

* update readme

* add mamba2 version check

* fix tests

* [run-slow]mamba2

* skip edge tests

* [run-slow]mamba2

* last fixup

* [run-slow]mamba2

* update README

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00