add shared experts for upcoming Granite 4.0 language models (#35894)
* Modular GraniteMoE with shared Experts. Signed-off-by: Shawn Tan <shawntan@ibm.com> * Modified * Import order. * Modified for style * Fix space. * Test * Remove extra granitemoe file. * New converted file and tests * Modified __init__ files. * Formatting. * Dummy PT objects * register granitemoe shared model Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix linting of a file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix import in modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update generated modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * add documentation Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update docstrings Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * update generated modeling file Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * fix docstrings in config class Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> * merge main Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> --------- Signed-off-by: Shawn Tan <shawntan@ibm.com> Signed-off-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> Co-authored-by: Shawn Tan <shawntan@ibm.com> Co-authored-by: Shawn Tan <shawn@wtf.sg> Co-authored-by: Sukriti-Sharma4 <sukriti.sharma4@ibm.com> Co-authored-by: Sukriti Sharma <Ssukriti@users.noreply.github.com>
This commit is contained in:
@@ -60,6 +60,7 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [GPT-J](https://huggingface.co/docs/transformers/model_doc/gptj#transformers.GPTJModel)
|
||||
* [Granite](https://huggingface.co/docs/transformers/model_doc/granite#transformers.GraniteModel)
|
||||
* [GraniteMoe](https://huggingface.co/docs/transformers/model_doc/granitemoe#transformers.GraniteMoeModel)
|
||||
* [GraniteMoeShared](https://huggingface.co/docs/transformers/model_doc/granitemoeshared#transformers.GraniteMoeSharedModel)
|
||||
* [Idefics2](https://huggingface.co/docs/transformers/model_doc/idefics2#transformers.Idefics2Model)
|
||||
* [Idefics3](https://huggingface.co/docs/transformers/model_doc/idefics3#transformers.Idefics3Model)
|
||||
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
|
||||
@@ -266,6 +267,7 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [Idefics3](https://huggingface.co/docs/transformers/model_doc/idefics3#transformers.Idefics3Model)
|
||||
* [I-JEPA](https://huggingface.co/docs/transformers/model_doc/ijepa#transformers.IJepaModel)
|
||||
* [GraniteMoe](https://huggingface.co/docs/transformers/model_doc/granitemoe#transformers.GraniteMoeModel)
|
||||
* [GraniteMoeShared](https://huggingface.co/docs/transformers/model_doc/granitemoeshared#transformers.GraniteMoeSharedModel)
|
||||
* [JetMoe](https://huggingface.co/docs/transformers/model_doc/jetmoe#transformers.JetMoeModel)
|
||||
* [Jamba](https://huggingface.co/docs/transformers/model_doc/jamba#transformers.JambaModel)
|
||||
* [Llama](https://huggingface.co/docs/transformers/model_doc/llama#transformers.LlamaModel)
|
||||
|
||||
Reference in New Issue
Block a user