Adding Qwen3 and Qwen3MoE (#36878)

* Initial commit for Qwen3

* fix and add tests for qwen3 & qwen3_moe

* rename models for tests.

* fix

* fix

* fix and add docs.

* fix model name in docs.

* simplify modular and fix configuration issues

* Fix the red CI: ruff was updated

* revert ruff, version was wrong

* fix qwen3moe.

* fix

* make sure MOE can load

* fix copies

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Bo Zheng
2025-03-31 15:50:49 +08:00
committed by GitHub
parent 0d6a60fe55
commit 6acd5aecb3
26 changed files with 5650 additions and 3 deletions

@@ -603,6 +603,10 @@
title: Qwen2
- local: model_doc/qwen2_moe
title: Qwen2MoE
- local: model_doc/qwen3
title: Qwen3
- local: model_doc/qwen3_moe
title: Qwen3MoE
- local: model_doc/rag
title: RAG
- local: model_doc/realm

@@ -43,4 +43,3 @@ Transformers is designed for developers and machine learning engineers and resea
</a>
</div>
Join us on the Hugging Face [Hub](https://huggingface.co/), [Discord](https://discord.com/invite/JfAtkvEtRb), or [forum](https://discuss.huggingface.co/) to collaborate and build models, datasets, and applications together.

@@ -0,0 +1,59 @@
<!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Qwen3
## Overview
To be released with the official model launch.
### Model Details
To be released with the official model launch.
## Usage tips
To be released with the official model launch.
## Qwen3Config
[[autodoc]] Qwen3Config
## Qwen3Model
[[autodoc]] Qwen3Model
- forward
## Qwen3ForCausalLM
[[autodoc]] Qwen3ForCausalLM
- forward
## Qwen3ForSequenceClassification
[[autodoc]] Qwen3ForSequenceClassification
- forward
## Qwen3ForTokenClassification
[[autodoc]] Qwen3ForTokenClassification
- forward
## Qwen3ForQuestionAnswering
[[autodoc]] Qwen3ForQuestionAnswering
- forward

@@ -0,0 +1,58 @@
<!--Copyright 2024 The Qwen Team and The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# Qwen3MoE
## Overview
To be released with the official model launch.
### Model Details
To be released with the official model launch.
## Usage tips
To be released with the official model launch.
## Qwen3MoeConfig
[[autodoc]] Qwen3MoeConfig
## Qwen3MoeModel
[[autodoc]] Qwen3MoeModel
- forward
## Qwen3MoeForCausalLM
[[autodoc]] Qwen3MoeForCausalLM
- forward
## Qwen3MoeForSequenceClassification
[[autodoc]] Qwen3MoeForSequenceClassification
- forward
## Qwen3MoeForTokenClassification
[[autodoc]] Qwen3MoeForTokenClassification
- forward
## Qwen3MoeForQuestionAnswering
[[autodoc]] Qwen3MoeForQuestionAnswering
- forward