HuggingFace_transformer/docs/source/en
Yuxuan Zhang 39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* update new modular

* use LlamaAttention and avoid CohereAttention, which causes a repeated norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer needs to be ignored

* fix gradient error using the dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00