[VLMs] support attention backends (#37576)

* update models

* why rename

* return attn weights when sdpa

* fixes

* fix attn implementation composite

* fix moshi

* add message

* add typings

* use explicitly all flags for each attn type

* fix some tests

* import what is needed

* kosmos on main has new attention already, yay

* new models in main, run fixup

* won't fix kosmos yet

* fix-copies

* clean up after rebasing

* fix tests

* style

* don't cast attns to fp32

* did we update ruff? ok, let's just do what it asks

* fix pixtral after rebase
Raushan Turganbay
2025-05-08 18:18:54 +02:00
committed by GitHub
parent e296c63cd4
commit d23aae2b8c
47 changed files with 1318 additions and 1555 deletions

@@ -219,9 +219,10 @@ class OPTModelTest(ModelTesterMixin, GenerationTesterMixin, PipelineTesterMixin,
else {}
)
is_encoder_decoder = False
-fx_compatible = True
+fx_compatible = False  # Broken by attention refactor cc @Cyrilvallez
test_pruning = False
test_missing_keys = False
+test_head_masking = False  # new attn API doesn't support head mask
# TODO: Fix the failed tests
def is_pipeline_test_to_skip(