Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied from
* fix weird diff
* skip doctest fast on the config and modeling
* mark that it supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing assert close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00