Yuxuan Zhang
39ba5f3cc2
GLM-4 Update (#39393)
* one commit with full
* Create glm4_moe.md
* Update check_config_docstrings.py
* Update __init__.py
* update
* argue
* argue: router problem
* 1
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update modular_glm4_moe.py
* update
* use dsv3 pretrainmodel in modular
* update for test
* update new modular
* use LlamaAttention and avoid CohereAttention because it repeats the norm
* update the modular
* update attn modular
* update
* Update modular_glm4_moe.py
* MTP layer needs to be ignored
* fix gradient error using the dots_1 method
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00