Refactor Attention implementation for ViT-based models (#36545)

* Refactor vit attention

* Refactor ViT-based models

* 🚨🚨🚨 Fix prefix for DPT

* Update params order

* trigger tests

* Fix Dinov2 attention

* Fix DPT attention impl propagation for backbone config

* Common test fix: config is modified in place; avoid it

* view->reshape

* Fixup

* Fixup

* Enable IJepa FA2

* Add FA2 in corresponding model docs
Pavel Iakubovskii
2025-03-20 15:15:01 +00:00
committed by GitHub
parent 730d2a52e7
commit 66291778dd
35 changed files with 932 additions and 975 deletions


@@ -2098,7 +2098,9 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin, GenerationMixin, PushToHubMix
                 if not isinstance(requested_attn_implementation, dict)
                 else requested_attn_implementation.get(key, None)
             )
-            sub_config._attn_implementation_internal = curr_attn_implementation
+            # For models with backbone sub-config might be not initialized
+            if sub_config is not None:
+                sub_config._attn_implementation_internal = curr_attn_implementation
         if use_flash_attention_2:
             logger.warning_once(
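
The guard added in this hunk can be sketched in isolation. This is a minimal illustration, not the actual `PreTrainedModel` code path: `propagate_attn_implementation` and the `SimpleNamespace` stand-ins for configs are hypothetical, and only the `is not None` check and the per-key lookup mirror the diff above.

```python
from types import SimpleNamespace


def propagate_attn_implementation(config, requested_attn_implementation):
    """Set the attention implementation on every initialized sub-config.

    Hypothetical helper: `config.sub_configs` is assumed to map attribute
    names to sub-config types, as in the loop this hunk modifies.
    """
    for key in config.sub_configs:
        sub_config = getattr(config, key, None)
        # A plain string applies to all sub-configs; a dict is looked up per key.
        curr_attn_implementation = (
            requested_attn_implementation
            if not isinstance(requested_attn_implementation, dict)
            else requested_attn_implementation.get(key, None)
        )
        # A backbone sub-config may be uninitialized (None), so skip it
        # instead of raising AttributeError on attribute assignment.
        if sub_config is not None:
            sub_config._attn_implementation_internal = curr_attn_implementation
    return config


# Stand-in config: the backbone sub-config is declared but not initialized.
config = SimpleNamespace(
    sub_configs={"backbone_config": object, "text_config": object},
    backbone_config=None,
    text_config=SimpleNamespace(),
)
propagate_attn_implementation(config, {"text_config": "sdpa"})
```

Without the `is not None` check, the assignment on `backbone_config` would fail with `AttributeError: 'NoneType' object has no attribute ...`, which is the failure mode the commit message describes for DPT-style models with a backbone config.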