Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbose
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied from
* fix weird diff
* skip doctest fast on the config and modeling
* mark that it supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing assert close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00