add qwen2.5vl (#35569)
* add qwen2.5vl * fix * pass check table * add modular file * fix style * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modeling_qwen2_5_vl.py Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> * padd copy check * use modular * fix * fix * fix * update flashatt2&sdpa support_list * Update docs/source/en/_toctree.yml Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update docs/source/en/model_doc/qwen2_5_vl.md Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * Update src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> * update config * update * fix hf path * rename Qwen2_5_VLVideosKwargs * fix * fix * update * excuted modular * rollback init * fix * formated * simpler init * fix * fix * fix * fix * fix * update docs * fix * fix * update Qwen2VLRotaryEmbedding for yarn * fix --------- Co-authored-by: Minho Shim <6764739+minostauros@users.noreply.github.com> Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> Co-authored-by: gewenbin0992 <gewenbin292@163.com> Co-authored-by: gewenbin0992 <67409248+gewenbin0992@users.noreply.github.com>
This commit is contained in:
@@ -97,6 +97,7 @@ FlashAttention-2 is currently supported for the following architectures:
|
||||
* [Qwen2Audio](https://huggingface.co/docs/transformers/model_doc/qwen2_audio#transformers.Qwen2AudioEncoder)
|
||||
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)
|
||||
* [Qwen2VL](https://huggingface.co/docs/transformers/model_doc/qwen2_vl#transformers.Qwen2VLModel)
|
||||
* [Qwen2.5VL](https://huggingface.co/docs/transformers/model_doc/qwen2_5_vl#transformers.Qwen2_5_VLModel)
|
||||
* [RAG](https://huggingface.co/docs/transformers/model_doc/rag#transformers.RagModel)
|
||||
* [SpeechEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/speech_encoder_decoder#transformers.SpeechEncoderDecoderModel)
|
||||
* [VisionEncoderDecoder](https://huggingface.co/docs/transformers/model_doc/vision_encoder_decoder#transformers.VisionEncoderDecoderModel)
|
||||
@@ -297,6 +298,7 @@ For now, Transformers supports SDPA inference and training for the following arc
|
||||
* [Qwen2](https://huggingface.co/docs/transformers/model_doc/qwen2#transformers.Qwen2Model)
|
||||
* [Qwen2Audio](https://huggingface.co/docs/transformers/model_doc/qwen2_audio#transformers.Qwen2AudioEncoder)
|
||||
* [Qwen2MoE](https://huggingface.co/docs/transformers/model_doc/qwen2_moe#transformers.Qwen2MoeModel)
|
||||
* [Qwen2.5VL](https://huggingface.co/docs/transformers/model_doc/qwen2_5_vl#transformers.Qwen2_5_VLModel)
|
||||
* [RoBERTa](https://huggingface.co/docs/transformers/model_doc/roberta#transformers.RobertaModel)
|
||||
* [Sew](https://huggingface.co/docs/transformers/main/en/model_doc/sew#transformers.SEWModel)
|
||||
* [SigLIP](https://huggingface.co/docs/transformers/model_doc/siglip)
|
||||
|
||||
Reference in New Issue
Block a user