HuggingFace_transformer/tests/models
Yuxuan Zhang 39ba5f3cc2 GLM-4 Update (#39393)
* one commit with full

* Create glm4_moe.md

* Update check_config_docstrings.py

* Update __init__.py

* update

* argue

* argue: router problem

* 1

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update modular_glm4_moe.py

* update

* use dsv3 pretrainmodel in modular

* update for test

* update new modular

* use LlamaAttention instead of CohereAttention to avoid a repeated norm

* update the modular

* update attn modular

* update

* Update modular_glm4_moe.py

* MTP layer needs to be ignored

* fix gradient error with the dots_1 method

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

* Update test_modeling_glm4_moe.py

---------

Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00