Update model card and link of blog post. (#29928)

* Update qwen2_moe.md

* update link of blogpost.

* fixup

---------

Co-authored-by: bozheng-hit <dsoul0621@gmail.com>
Bo Zheng
2024-03-31 00:49:03 +08:00
committed by GitHub
parent f6701bc664
commit 46d636818b
14 changed files with 15 additions and 15 deletions

@@ -25,9 +25,9 @@ Qwen2MoE is the new model series of large language models from the Qwen team. Pr
Qwen2MoE is a language model series including decoder language models of different model sizes. For each size, we release the base language model and the aligned chat model. Qwen2MoE has the following architectural choices:
- Qwen2MoE is based on the Transformer architecture with SwiGLU activation, attention QKV bias, group query attention, mixture of sliding window attention and full attention, etc. Additionally, we have an improved tokenizer adaptive to multiple natural languages and codes.
-- Qwen2MoE employs Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, `Qwen1.5-MoE-A2.7B` is upcycled from `Qwen-1.8B`. It has 14.3B parameters in total and 2.7B activated parameters during runtime, while it achieves comparable performance with `Qwen1.5-7B`, with only 20% of the training resources.
+- Qwen2MoE employs Mixture of Experts (MoE) architecture, where the models are upcycled from dense language models. For instance, `Qwen1.5-MoE-A2.7B` is upcycled from `Qwen-1.8B`. It has 14.3B parameters in total and 2.7B activated parameters during runtime, while it achieves comparable performance with `Qwen1.5-7B`, with only 25% of the training resources.
-For more details refer to the [release blog post](https://qwenlm.github.io/blog/qwen1.5/).
+For more details refer to the [release blog post](https://qwenlm.github.io/blog/qwen-moe/).
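
The parameter counts quoted in the model card (14.3B total, 2.7B activated) can be turned into an activated share with a line of arithmetic. The snippet below is only an illustrative sketch: the helper `activated_fraction` is a hypothetical name introduced here, not part of the model card or of any library, and the two figures are copied from the text above.

```python
def activated_fraction(total_params_b: float, activated_params_b: float) -> float:
    """Return the share of parameters activated at runtime (inputs in billions).

    Hypothetical helper for illustration only.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    return activated_params_b / total_params_b


# Figures quoted in the model card for Qwen1.5-MoE-A2.7B.
TOTAL_B = 14.3
ACTIVATED_B = 2.7

fraction = activated_fraction(TOTAL_B, ACTIVATED_B)
print(f"Activated share: {fraction:.1%}")  # prints "Activated share: 18.9%"
```

Under these figures, a bit under one fifth of the parameters are activated at runtime.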
## Usage tips