[Ernie 4.5] Add ernie text models (#39228)
* init
* copied from remote
* add proper structure and llama like structure
* fixup
* revert to state that works
* get closer to llama
* slow and steady
* some removal
* masks work
* it is indeed the rope implementation, how dafuq does it mesh with the cache now hmm
* nice
* getting closer
* closer to transformers style
* let's simplify this, batching works now
* simplified
* working version with modular
* it is indeed the rotation per weights, make it complete llama style
* cleanup conversion, next to look at -> tokenizer
* remove llama artefacts
* fix modeling tests (common ones)
* style
* integration test + first look into tokenization (will need more work, focussing on modeling other models first)
* style
* working moe version, based on remote
* lets keep it simple and go step by step - transformers annotations for modular and transformers style rope (complex view)
* more cleanup
* refactor namings and remove addition forXXX classes
* our moe won't cut it it seems, correction bias seems to be missing in remote code version
* tokenization change (remote)
* our moe version works when adding normalization :D
* cleanup moe
* nits
* cleanup modeling -> let's get to modular next
* style
* modular v1
* minor things + attempt at conversion (which doesn't work)
* no conversion follow glm, fixup modular and other nits
* modular cleanup
* fixes
* tests, tests, tests + some moe dtype forcing
* simplify modular, fix fatal fa2 bug, remaining tests
* fix import issue?
* some initial docs, fix bnb faulty behavior --> needs to fix some tests because of gate needing to be float
* fix sdpa test, load on init dtype only
* fixup post merge
* style
* fix doc links
* tokenization cleanup beginnings
* simplify tokenizer by a lot as its basically llama
* tokenizer is full llama with different defaults + extra special tokens
* sync og special tokens of ernie
* fix decoding with numbers (also in remote done what a timing), begin of tok tests
* align with remote and preserve special tokens, adjust tests to ernie legacy behavior, warning for questionable behavior (also in llama)
* nits
* docs
* my daily post merge it is
* check
* tokenization update with explanations and conversion script
* review on modular (til), revert some tokenizer things i did prior, remove mtp comment (low prio)
* post merge fixes
* fixup tokenization, llama fast is the way to go
* more fixups
* check
* import fixes
* correction bias following the paddle code
* fix
* fix TP plan, fix correction bias sharding during forward
* style
* whoops
* fix tied weights
* docs and last nit
* license
* flasky tests
* move repo id, update when merged on the hub
@@ -441,6 +441,10 @@
  title: Encoder Decoder Models
- local: model_doc/ernie
  title: ERNIE
- local: model_doc/ernie4_5
  title: Ernie4_5
- local: model_doc/ernie4_5_moe
  title: Ernie4_5_MoE
- local: model_doc/ernie_m
  title: ErnieM
- local: model_doc/esm

99
docs/source/en/model_doc/ernie4_5.md
Normal file
@@ -0,0 +1,99 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
    </div>
</div>

# Ernie 4.5

## Overview

The Ernie 4.5 model was released in the [Ernie 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) release by Baidu.
This family of models contains multiple architectures and model sizes. This model specifically targets the base text
model without mixture of experts (MoE), with 0.3B parameters in total. It uses the standard [Llama](./llama.md) architecture at its core.

Other models from the family can be found at [Ernie 4.5 MoE](./ernie4_5_moe.md).

<div class="flex justify-center">
    <img src="https://ernie.baidu.com/blog/posts/ernie4.5/overview.png"/>
</div>


## Usage Tips

### Generate text

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-0.3B-PT"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# prepare the model input
prompt = "Hey, are you conscious? Can you talk to me?"
messages = [
    {"role": "user", "content": prompt}
]
# render the chat template as a string (tokenization happens in the next step)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32,
)
# keep only the newly generated tokens by dropping the prompt prefix
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print(generate_text)
```
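
The slicing step in the snippet above relies on `generate` returning the prompt tokens followed by the new tokens for decoder-only models. A minimal sketch with plain lists, where the token ids are made-up placeholders rather than real tokenizer output:

```python
# Hypothetical token ids; real ids come from the tokenizer and model.generate().
prompt_ids = [101, 7592, 2088]
new_ids = [2023, 2003, 102]

# A decoder-only model returns the prompt followed by the continuation.
generated_ids = [prompt_ids + new_ids]  # batch of size 1

# Drop the prompt prefix so that only the continuation is decoded.
output_ids = generated_ids[0][len(prompt_ids):]
print(output_ids)
```

Without the slice, the decoded string would repeat the prompt before the answer.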

This model was contributed by [Anton Vlasjuk](https://huggingface.co/AntonV).
The original code can be found [here](https://github.com/PaddlePaddle/ERNIE).


## Ernie4_5Config

[[autodoc]] Ernie4_5Config

## Ernie4_5Model

[[autodoc]] Ernie4_5Model
    - forward

## Ernie4_5ForCausalLM

[[autodoc]] Ernie4_5ForCausalLM
    - forward

183
docs/source/en/model_doc/ernie4_5_moe.md
Normal file
@@ -0,0 +1,183 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

<div style="float: right;">
    <div class="flex flex-wrap space-x-1">
        <img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="FlashAttention" src="https://img.shields.io/badge/%E2%9A%A1%EF%B8%8E%20FlashAttention-eae0c8?style=flat">
        <img alt="SDPA" src="https://img.shields.io/badge/SDPA-DE3412?style=flat&logo=pytorch&logoColor=white">
        <img alt="Tensor parallelism" src="https://img.shields.io/badge/Tensor%20parallelism-06b6d4?style=flat&logoColor=white">
    </div>
</div>

# Ernie 4.5 MoE

## Overview

The Ernie 4.5 MoE model was released in the [Ernie 4.5 Model Family](https://ernie.baidu.com/blog/posts/ernie4.5/) release by Baidu.
This family of models contains multiple architectures and model sizes. This model specifically targets the base text
models with mixture of experts (MoE): one with 21B total and 3B active parameters, and another with 300B total and 47B active parameters.
It uses the standard [Llama](./llama.md) architecture at its core, combined with a specialized MoE based on [Mixtral](./mixtral.md) with additional shared
experts.
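
The idea of routed experts combined with shared experts can be sketched in plain Python. This is an illustrative toy under stated assumptions, not the library's implementation: the names `softmax` and `moe_forward`, the scalar "experts", and `top_k=2` are all hypothetical, and details of the real model (such as any router bias terms) are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, routed_experts, shared_experts, top_k=2):
    """Combine the top-k routed experts with always-active shared experts."""
    probs = softmax(router_logits)
    # Pick the k experts with the highest routing probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # Renormalize so that the selected routing weights sum to 1.
    norm = sum(probs[i] for i in top)
    routed = sum(probs[i] / norm * routed_experts[i](x) for i in top)
    # Shared experts process every token, regardless of the router decision.
    shared = sum(expert(x) for expert in shared_experts)
    return routed + shared


# Toy scalar experts (assumption): each one only scales its input.
routed_experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
shared_experts = [lambda x: 0.5 * x]

y = moe_forward(2.0, [0.0, 0.0, 5.0, 5.0], routed_experts, shared_experts)
print(y)
```

Only `top_k` routed experts run per token, which is why the number of active parameters is much smaller than the total.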

Other models from the family can be found at [Ernie 4.5](./ernie4_5.md).

<div class="flex justify-center">
    <img src="https://ernie.baidu.com/blog/posts/ernie4.5/overview.png"/>
</div>


## Usage Tips

### Generate text

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-PT"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# prepare the model input
prompt = "Hey, are you conscious? Can you talk to me?"
messages = [
    {"role": "user", "content": prompt}
]
# render the chat template as a string (tokenization happens in the next step)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32,
)
# keep only the newly generated tokens by dropping the prompt prefix
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print(generate_text)
```

### Distributed Generation with Tensor Parallelism

```python
# launch with a distributed launcher, e.g. `torchrun --nproc-per-node 4 script.py`
# (the process count and script name are placeholders)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-PT"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
# tp_plan shards the model across processes, so no device_map is passed here
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    tp_plan="auto",
)

# prepare the model input
prompt = "Hey, are you conscious? Can you talk to me?"
messages = [
    {"role": "user", "content": prompt}
]
# render the chat template as a string (tokenization happens in the next step)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32,
)
# keep only the newly generated tokens by dropping the prompt prefix
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print(generate_text)
```
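
For intuition about what tensor parallelism does to a single linear layer, here is a small sketch of column-wise sharding in plain Python. It is a conceptual illustration only and not how `tp_plan="auto"` is implemented; the helper names `matmul` and `split_columns` and the toy weights are assumptions.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def split_columns(w, world_size):
    """Give each rank a contiguous slice of the output columns."""
    step = len(w[0]) // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


x = [1.0, 2.0]
w = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
]

# Each "rank" multiplies the full input by its own column shard.
shards = split_columns(w, world_size=2)
partial_outputs = [matmul(x, shard) for shard in shards]

# Concatenating the partial outputs reproduces the unsharded result.
gathered = [value for part in partial_outputs for value in part]
full = matmul(x, w)
print(gathered == full)
```

Each rank stores only its slice of the weights, which is what lets a model that does not fit on one device be split across several.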

### Quantization with Bitsandbytes

```python
import torch
from transformers import BitsAndBytesConfig, AutoModelForCausalLM, AutoTokenizer

model_name = "baidu/ERNIE-4.5-21B-A3B-PT"

# load the tokenizer and the model (weights are quantized to 4-bit on load)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# prepare the model input
prompt = "Hey, are you conscious? Can you talk to me?"
messages = [
    {"role": "user", "content": prompt}
]
# render the chat template as a string (tokenization happens in the next step)
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], add_special_tokens=False, return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32,
)
# keep only the newly generated tokens by dropping the prompt prefix
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# decode the generated ids
generate_text = tokenizer.decode(output_ids, skip_special_tokens=True)
print(generate_text)
```
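
As a rough back-of-the-envelope check of why 4-bit loading helps, the weight memory of the 21B model can be estimated as below. This is only an approximation under stated assumptions: the helper `weight_memory_gb` is hypothetical, it counts weights only, and it ignores quantization metadata, activations, the KV cache, and any modules that stay in higher precision.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


total_params = 21e9  # total parameter count of the 21B MoE checkpoint

bf16_gb = weight_memory_gb(total_params, 16)
int4_gb = weight_memory_gb(total_params, 4)
print(f"bf16: ~{bf16_gb:.1f} GB, 4-bit: ~{int4_gb:.1f} GB")
```

Note that all experts must be resident in memory even though only about 3B parameters are active per token, so the total parameter count is the relevant figure here.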

This model was contributed by [Anton Vlasjuk](https://huggingface.co/AntonV).
The original code can be found [here](https://github.com/PaddlePaddle/ERNIE).


## Ernie4_5_MoEConfig

[[autodoc]] Ernie4_5_MoEConfig

## Ernie4_5_MoEModel

[[autodoc]] Ernie4_5_MoEModel
    - forward

## Ernie4_5_MoEForCausalLM

[[autodoc]] Ernie4_5_MoEForCausalLM
    - forward
    - generate