Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbosity
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied from
* fix weird diff
* skip doctest fast on the config and modeling
* mark that it supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch testing assert close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00