PhiMoE (#33363)
* onboard phimoe model
* removed debug code
* added unit tests
* updated docs
* formatted
* fixed unit tests
* fixed test case
* fixed format
* refactored code
* fixed expected outputs in the integration tests
* Added a warning msg
* Addressed comments
* Addressed comments
* fixed test cases
* added paper link
* Addressed comments
* Refactored PhimoeForCausalLM forward fn
* Refactored PhimoeRotaryEmbedding class
* fixed test cases
* fixed testcase
* fixed test case
* Addressed comments
* fixed test cases
* fixed testcases
* Used cache position instead to get the seq len
@@ -79,6 +79,7 @@ FlashAttention-2 is currently supported for the following architectures:
 * [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
 * [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
 * [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
+* [PhiMoE](https://huggingface.co/docs/transformers/model_doc/phimoe#transformers.PhimoeModel)
 * [StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm#transformers.StableLmModel)
 * [Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2#transformers.Starcoder2Model)
 * [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
@@ -248,6 +249,7 @@ For now, Transformers supports SDPA inference and training for the following arc
 * [PaliGemma](https://huggingface.co/docs/transformers/model_doc/paligemma#transformers.PaliGemmaForConditionalGeneration)
 * [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
 * [Phi3](https://huggingface.co/docs/transformers/model_doc/phi3#transformers.Phi3Model)
+* [PhiMoE](https://huggingface.co/docs/transformers/model_doc/phimoe#transformers.PhimoeModel)
 * [Idefics](https://huggingface.co/docs/transformers/model_doc/idefics#transformers.IdeficsModel)
 * [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
 * [mBart](https://huggingface.co/docs/transformers/model_doc/mbart#transformers.MBartModel)