Arthur
accccdd008
[Add Mixtral] Adds support for the Mixtral MoE (#27942)
* up
* up
* test
* logits ok
* up
* up
* few fixes
* conversion script
* up
* nits
* nits
* update
* nuke
* more updates
* nits
* fix many issues
* nit
* scatter
* nit
* nuke megablocks
* nits
* fix conversion script
* nit
* remove
* nits
* nit
* update
* oupsssss
* change
* nits device
* nits
* fixup
* update
* merge
* add copied from
* fix the copy mentions
* update tests
* more fixes
* nits
* conversion script
* add parts of the readme
* Update tests/models/mixtral/test_modeling_mixtral.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* new test + conversion script
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Apply suggestions from code review
* fix
* fix copies
* fix copies
* ooops
* fix config
* Apply suggestions from code review
* fix nits
* nit
* add copies
* add batched tests
* docs
* fix flash attention
* let's add more verbosity
* add correct outputs
* support router outputs
* ignore copies where needed
* fix
* cat list if list is given for now
* nits
* Update docs/source/en/model_doc/mixtral.md
* finish router refactoring
* fix forward
* fix expected values
* nits
* fixup
* fix
* fix bug
* fix
* fix dtype mismatch
* fix
* grrr grrr I support item assignment
* fix CI
* docs
* fixup
* remove some copied from
* fix weird diff
* skip doctest fast on the config and modeling
* mark that it supports flash attention in the doc
* update
* Update src/transformers/models/mixtral/modeling_mixtral.py
Co-authored-by: Lysandre Debut <hi@lysand.re>
* Update docs/source/en/model_doc/mixtral.md
Co-authored-by: Lysandre Debut <hi@lysand.re>
* revert router logits config issue
* update doc accordingly
* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py
* nits
* use torch.testing.assert_close
* fixup
* doc nits
---------
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00