* Add OLMoE

* Add OLMoE

* Updates

* Make norm optional; add keys

* Add output

* Add

* Fix dtype

* Fix eos config

* Update

* Add OLMoE

* Fix OLMoE path

* Format

* Format

* Rmv copy statement

* Rmv copy statement

* Format

* Add copies

* Cp rotary

* Fix aming

* Fix naming

* Update RoPE integration; num_logits_to_keep; Add copy statements

* Add eps to config

* Format

* Add aux loss

* Adapt router_aux_loss_coef

* Update md

* Adapt

* adapt tests
This commit is contained in:
Niklas Muennighoff
2024-09-03 09:43:12 -07:00
committed by GitHub
parent d6534f996b
commit ecd61c6286
16 changed files with 2442 additions and 0 deletions

View File

@@ -71,6 +71,7 @@ FlashAttention-2 is currently supported for the following architectures:
* [Nemotron](https://huggingface.co/docs/transformers/model_doc/nemotron)
* [NLLB](https://huggingface.co/docs/transformers/model_doc/nllb)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
@@ -230,6 +231,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [Musicgen](https://huggingface.co/docs/transformers/model_doc/musicgen#transformers.MusicgenModel)
* [MusicGen Melody](https://huggingface.co/docs/transformers/model_doc/musicgen_melody#transformers.MusicgenMelodyModel)
* [OLMo](https://huggingface.co/docs/transformers/model_doc/olmo#transformers.OlmoModel)
* [OLMoE](https://huggingface.co/docs/transformers/model_doc/olmoe#transformers.OlmoeModel)
* [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
* [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)