Yuxuan Zhang
39ba5f3cc2
GLM-4 Update (#39393)
* one commit with full
* Create glm4_moe.md
* Update check_config_docstrings.py
* Update __init__.py
* update
* argue
* argue: router problem
* 1
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update modular_glm4_moe.py
* update
* use dsv3 pretrainmodel in modular
* update for test
* update new modular
* use LlamaAttention and avoid CohereAttention, which causes a repeated norm
* update the modular
* update attn modular
* update
* Update modular_glm4_moe.py
* MTP layer needs to be ignored
* fix gradient error using the dots_1 method
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
* Update test_modeling_glm4_moe.py
---------
Co-authored-by: Cyril Vallez <cyril.vallez@huggingface.co>
2025-07-21 13:24:34 +02:00