Glm 4 doc (#39247)
* update the glm4 model readme * update test * update GLM-4.1V model * update as format * update * fix some tests * fix the rest * fix on a10, not t4 * nit: dummy import --------- Co-authored-by: raushan <raushan@huggingface.co>
This commit is contained in:
@@ -18,7 +18,37 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
## Overview
|
||||
|
||||
To be released with the official model launch.
|
||||
The GLM family welcomes new members [GLM-4-0414](https://arxiv.org/pdf/2406.12793) series models.
|
||||
|
||||
The **GLM-4-32B-0414** series models, featuring 32 billion parameters. Its performance is comparable to OpenAI’s GPT
|
||||
series and DeepSeek’s V3/R1 series. It also supports very user-friendly local deployment features. GLM-4-32B-Base-0414
|
||||
was pre-trained on 15T of high-quality data, including substantial reasoning-type synthetic data. This lays the
|
||||
foundation for subsequent reinforcement learning extensions. In the post-training stage, we employed human preference
|
||||
alignment for dialogue scenarios. Additionally, using techniques like rejection sampling and reinforcement learning, we
|
||||
enhanced the model’s performance in instruction following, engineering code, and function calling, thus strengthening
|
||||
the atomic capabilities required for agent tasks. GLM-4-32B-0414 achieves good results in engineering code, Artifact
|
||||
generation, function calling, search-based Q&A, and report generation. In particular, on several benchmarks, such as
|
||||
code generation or specific Q&A tasks, GLM-4-32B-Base-0414 achieves comparable performance with those larger models like
|
||||
GPT-4o and DeepSeek-V3-0324 (671B).
|
||||
|
||||
**GLM-Z1-32B-0414** is a reasoning model with deep thinking capabilities. This was developed based on GLM-4-32B-0414
|
||||
through cold start, extended reinforcement learning, and further training on tasks including mathematics, code, and
|
||||
logic. Compared to the base model, GLM-Z1-32B-0414 significantly improves mathematical abilities and the capability to
|
||||
solve complex tasks. During training, we also introduced general reinforcement learning based on pairwise ranking
|
||||
feedback, which enhances the model's general capabilities.
|
||||
|
||||
**GLM-Z1-Rumination-32B-0414** is a deep reasoning model with rumination capabilities (against OpenAI's Deep Research).
|
||||
Unlike typical deep thinking models, the rumination model is capable of deeper and longer thinking to solve more
|
||||
open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future
|
||||
development plans). Z1-Rumination is trained through scaling end-to-end reinforcement learning with responses graded by
|
||||
the ground truth answers or rubrics and can make use of search tools during its deep thinking process to handle complex
|
||||
tasks. The model shows significant improvements in research-style writing and complex tasks.
|
||||
|
||||
Finally, **GLM-Z1-9B-0414** is a surprise. We employed all the aforementioned techniques to train a small model (9B).
|
||||
GLM-Z1-9B-0414 exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is
|
||||
top-ranked among all open-source models of the same size. Especially in resource-constrained scenarios, this model
|
||||
achieves an excellent balance between efficiency and effectiveness, providing a powerful option for users seeking
|
||||
lightweight deployment.
|
||||
|
||||
## Glm4Config
|
||||
|
||||
|
||||
@@ -23,6 +23,29 @@ rendered properly in your Markdown viewer.
|
||||
|
||||
# GLM-4.1V
|
||||
|
||||
## Overview
|
||||
|
||||
**GLM-4.1V-9B-Thinking** is a bilingual vision-language model optimized for reasoning, built on GLM-4-9B. It introduces
|
||||
a "thinking paradigm" with reinforcement learning, achieving state-of-the-art results among 10B-class models and
|
||||
rivaling 72B-scale models. It supports 64k context, 4K resolution, and arbitrary aspect ratios, with an open-source base
|
||||
model for further research. You can check our paper [here](https://huggingface.co/papers/2507.01006). and below is a abstract.
|
||||
|
||||
*We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding
|
||||
and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework.
|
||||
We first develop a capable vision foundation model with significant potential through large-scale pre-training, which
|
||||
arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum
|
||||
Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a
|
||||
diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding,
|
||||
GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art
|
||||
performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model
|
||||
outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks
|
||||
relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or
|
||||
superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document
|
||||
understanding and STEM reasoning, further underscoring its strong capabilities. Code, models and more information
|
||||
are released at https://github.com/THUDM/GLM-4.1V-Thinking.*
|
||||
|
||||
## Usage
|
||||
|
||||
The example below demonstrates how to generate text based on an image with [`Pipeline`] or the [`AutoModel`] class.
|
||||
|
||||
<hfoptions id="usage">
|
||||
|
||||
Reference in New Issue
Block a user