Files
HuggingFace_transformer/tests
Yuxuan Zhang 39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* upodate new modular

* use LlamaAttention and avoid use  CohereAttention cause repeat norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer is need to ignore

* fix gradient error using with dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00
..
2025-06-23 10:56:51 +02:00
2025-07-21 13:24:34 +02:00
2025-06-11 17:28:06 +01:00