Sdpa dino v2 (#33403)

* add sdpa to dinov2

* fixup

* add dinov2 to sdpa doc

* update doc order

* [run-slow] dinov2

* common to eager

* [run-slow] dinov2

* update attn implementation in common

* update test_modeling_dinov2 to have mask_ration, num_masks and mask_length similar to vit

* [run-slow] dinov2

---------

Co-authored-by: Avishai Elmakies <avishai.elma@cs.huji.ac.il>
This commit is contained in:
Avishai Elmakies
2024-09-21 03:58:00 +03:00
committed by GitHub
parent e71bf70e33
commit 78b2929c05
3 changed files with 55 additions and 2 deletions

View File

@@ -217,6 +217,7 @@ For now, Transformers supports SDPA inference and training for the following arc
* [data2vec_audio](https://huggingface.co/docs/transformers/main/en/model_doc/data2vec#transformers.Data2VecAudioModel)
* [Dbrx](https://huggingface.co/docs/transformers/model_doc/dbrx#transformers.DbrxModel)
* [DeiT](https://huggingface.co/docs/transformers/model_doc/deit#transformers.DeiTModel)
* [Dinov2](https://huggingface.co/docs/transformers/en/model_doc/dinov2)
* [Dpr](https://huggingface.co/docs/transformers/model_doc/dpr#transformers.DprReader)
* [Falcon](https://huggingface.co/docs/transformers/model_doc/falcon#transformers.FalconModel)
* [Gemma](https://huggingface.co/docs/transformers/model_doc/gemma#transformers.GemmaModel)
@@ -275,7 +276,6 @@ For now, Transformers supports SDPA inference and training for the following arc
* [XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl#transformers.XLMRobertaXLModel)
* [YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos#transformers.YolosModel)
<Tip>
FlashAttention can only be used for models with the `fp16` or `bf16` torch type, so make sure to cast your model to the appropriate type first. The memory-efficient attention backend is able to handle `fp32` models.