Pablo Montalvo
80b90e7b2f
Add codestral mamba2 (#32080)
* add new model like
* draft cuda forward - mismatched keys (sharding on conv1)
* match keys successfully
* fix split
* get generation/forward running (wrong gens, norm?)
* update
* some refactoring
* fixes
* works up until copy to cache
* fix
* update
* NON WORKING VERSION
* version that works?
* nit
* fix config
* fix conversion script
* working cuda forward
* nit
* update
* simplification
* make mamba slow simple work
* no einops
* todo
* fix style
* no einops
* update fix no einsum
* nit
* remove einops
* bug: scan_output differs strongly
* add rms norm option
* fix fast + slow generation with and w/o cache ✔️
* draft integration tests
* remove a big chunk of the einsum
* fix slow, fast generations, without any einsum
* fix copies
* fix structure
* fix up modeling and tests
* fix tests
* clamping is indeed worse
* recover mamba2 cache test
* fix copies
* no cache position (yet)
* fix tf tests
* fix matmul for generate
* fixup
* skip cache tests for now
* [run-slow]mamba2
* tune out hidden states for padding
* test batched generation
* propagate attention mask changes
* fix past length
* fix integration test
* style
* address comments
* update readme
* add mamba2 version check
* fix tests
* [run-slow]mamba2
* skip edge tests
* [run-slow]mamba2
* last fixup
* [run-slow]mamba2
* update README
---------
Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
2024-08-06 16:39:52 +02:00