🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)

* first try * Update modeling_utils.py * Update modeling_utils.py * big refactor * Update modeling_utils.py * style * docstrings and simplify inner workings of configs * remove all trace of _internal * Update modeling_utils.py * fix logic error * Update modeling_utils.py * recursive on config * Update configuration_utils.py * fix * Update configuration_dpt.py * Update configuration_utils.py * Update configuration_utils.py * Update modeling_idefics.py * Update modeling_utils.py * fix for old models * more old models fixup * Update modeling_utils.py * Update configuration_utils.py * Remove outdated test * remove the deepcopy!! 🥵🥵 * Update test_modeling_gpt_bigcode.py * fix qwen dispatch * restrict to only models supporting it * style * switch name * Update modeling_utils.py * Update modeling_utils.py * add tests! * fix * rypo * remove bad copies * fix * Update modeling_utils.py * additional check * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * Update modeling_utils.py * fix * skip
2025-07-18 13:41:54 +02:00
parent 2b819ba4e3
commit 4ded9a4113
33 changed files with 472 additions and 323 deletions
--- a/tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py
+++ b/tests/models/gpt_bigcode/test_modeling_gpt_bigcode.py
@@ -542,6 +542,8 @@ class GPTBigCodeMQATest(unittest.TestCase):
            attn_pdrop=0,
            resid_pdrop=0,
        )
+        # We need to set it here as it's normally set by the Model's __init__
+        config._attn_implementation = "sdpa"
        return GPTBigCodeAttention(config)

    @parameterized.expand([(seed, is_train_mode) for seed in range(5) for is_train_mode in [True, False]])