Starcoder2 model - bis (#29215)
* Copy model * changes * misc * fixes * add embed and residual dropout (#30) * misc * remove rms norm and gated MLP * remove copied mentions where its not a copy anymore * remove unused _shape * copied from mistral instead * fix copies * fix copies * add not doctested * fix * fix copyright * Update docs/source/en/model_doc/starcoder2.md Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/starcoder2/configuration_starcoder2.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix doc * revert some changes * add fa2 tests * fix styling nit * fix * push dummy docs --------- Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
@@ -54,6 +54,7 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [OPT](https://huggingface.co/docs/transformers/model_doc/opt#transformers.OPTModel)
|
||||
* [Phi](https://huggingface.co/docs/transformers/model_doc/phi#transformers.PhiModel)
|
||||
* [StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm#transformers.StableLmModel)
|
||||
* [Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2#transformers.Starcoder2Model)
|
||||
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
|
||||
* [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper#transformers.WhisperModel)
|
||||
|
||||
@@ -180,6 +181,7 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [Mistral](https://huggingface.co/docs/transformers/model_doc/mistral#transformers.MistralModel)
|
||||
* [Mixtral](https://huggingface.co/docs/transformers/model_doc/mixtral#transformers.MixtralModel)
|
||||
* [StableLm](https://huggingface.co/docs/transformers/model_doc/stablelm#transformers.StableLmModel)
|
||||
* [Starcoder2](https://huggingface.co/docs/transformers/model_doc/starcoder2#transformers.Starcoder2Model)
|
||||
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
|
||||
|
||||
<Tip>
|
||||
|
||||
Reference in New Issue
Block a user