Cyril Vallez
4ded9a4113
🚨🚨 Fix and simplify attention implementation dispatch and subconfigs handling (#39423)
* first try
* Update modeling_utils.py
* Update modeling_utils.py
* big refactor
* Update modeling_utils.py
* style
* docstrings and simplify inner workings of configs
* remove all traces of _internal
* Update modeling_utils.py
* fix logic error
* Update modeling_utils.py
* recursive on config
* Update configuration_utils.py
* fix
* Update configuration_dpt.py
* Update configuration_utils.py
* Update configuration_utils.py
* Update modeling_idefics.py
* Update modeling_utils.py
* fix for old models
* more old models fixup
* Update modeling_utils.py
* Update configuration_utils.py
* Remove outdated test
* remove the deepcopy!! 🥵🥵
* Update test_modeling_gpt_bigcode.py
* fix qwen dispatch
* restrict to only models supporting it
* style
* switch name
* Update modeling_utils.py
* Update modeling_utils.py
* add tests!
* fix
* typo
* remove bad copies
* fix
* Update modeling_utils.py
* additional check
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* Update modeling_utils.py
* fix
* skip