Bart: new cache format (#35314)

* bart compile

* add mbart

* some more models touched by fix-copies

* more

* more models

* even more models

* fix copies

* fix tests

* fix copies

* fix

* biogpt accepts position ids now (breaking?)

* fix failing non-slow tests

* fix some tests

* should not be removed

* small update

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* update for last `main`

* fix copies

* clone `update_causal_mask` from llama

* tmp

* fixup

* why? how?

* fix bart tests

* dont skip test

* address comments

* fix tests

* fix

* fixup and delete the file

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
This commit is contained in:
Raushan Turganbay
2025-05-16 13:26:54 +02:00
committed by GitHub
parent 3ab47b6ce3
commit 01ad9f4b49
46 changed files with 3904 additions and 1995 deletions

View File

@@ -82,7 +82,7 @@ class M2M100ModelTester:
attention_probs_dropout_prob=0.1,
encoder_layerdrop=0.0,
decoder_layerdrop=0.0,
max_position_embeddings=20,
max_position_embeddings=50,
eos_token_id=2,
pad_token_id=1,
bos_token_id=0,
@@ -426,7 +426,7 @@ class M2M100ModelIntegrationTests(unittest.TestCase):
Overwriting the common test as the test is flaky on tiny models
"""
model = M2M100ForConditionalGeneration.from_pretrained(
"facebook/m2m100_418M", attn_implementation="flash_attention_2"
"facebook/m2m100_418M", torch_dtype=torch.float16, attn_implementation="flash_attention_2"
).to(torch_device)
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M", src_lang="fr", tgt_lang="en")