🔴 VLM: compile compatibility (#35724)

* llavas

* add more models

* fix `compile_forward` test for all models

* fix copies

* make style

* also doesn't support cache class

* fix some tests

* not copied from

* ci green?

* fix tests

* fix copies

* fix tests

* check with `numel` and remove `item`

* fix copies

* fix copies

* Update src/transformers/models/cohere2/modeling_cohere2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* opt remove cross attn

* gemma2

* fixup

* fixup

* fix newly added test

* maybe fixed?

* green please?

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Raushan Turganbay
2025-02-14 15:23:49 +01:00
committed by GitHub
parent b45cf0e90a
commit 0c78ef6cd3
44 changed files with 464 additions and 1215 deletions

@@ -226,14 +226,6 @@ class VideoLlavaForConditionalGenerationModelTest(ModelTesterMixin, GenerationTe
     def test_training_gradient_checkpointing_use_reentrant_false(self):
         pass
 
-    @unittest.skip(reason="Pass because video-LLava requires `attention_mask is not None`")
-    def test_sdpa_can_compile_dynamic(self):
-        pass
-
-    @unittest.skip(reason="Pass because video-LLava requires `attention_mask is not None`")
-    def test_sdpa_can_dispatch_on_flash(self):
-        pass
-
     @unittest.skip("FlashAttention only support fp16 and bf16 data type")
     def test_flash_attn_2_fp32_ln(self):
         pass