Update chat template docs to remove Blenderbot (#33254)
* Update docs to remove obsolete Blenderbot
* Remove another reference to Blenderbot
@@ -26,26 +26,7 @@ Much like tokenization, different models expect very different input formats for
|
|||||||
**chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
|
**chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations,
|
||||||
represented as lists of messages, into a single tokenizable string in the format that the model expects.
|
represented as lists of messages, into a single tokenizable string in the format that the model expects.
|
||||||
|
|
||||||
Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default
|
Let's make this concrete with a quick example using the `mistralai/Mistral-7B-Instruct-v0.1` model:
|
||||||
template, which mostly just adds whitespace between rounds of dialogue:
|
|
||||||
|
|
||||||
```python
|
|
||||||
>>> from transformers import AutoTokenizer
|
|
||||||
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
|
|
||||||
|
|
||||||
>>> chat = [
|
|
||||||
... {"role": "user", "content": "Hello, how are you?"},
|
|
||||||
... {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
|
|
||||||
... {"role": "user", "content": "I'd like to show off how chat templating works!"},
|
|
||||||
... ]
|
|
||||||
|
|
||||||
>>> tokenizer.apply_chat_template(chat, tokenize=False)
|
|
||||||
" Hello, how are you? I'm doing great. How can I help you today? I'd like to show off how chat templating works!</s>"
|
|
||||||
```
|
|
||||||
|
|
||||||
Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting,
|
|
||||||
that string will also be tokenized for us. To see a more complex template in action, though, let's use the
|
|
||||||
`mistralai/Mistral-7B-Instruct-v0.1` model.
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers import AutoTokenizer
|
>>> from transformers import AutoTokenizer
|
||||||
@@ -61,8 +42,26 @@ that string will also be tokenized for us. To see a more complex template in act
|
|||||||
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
|
"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
|
||||||
```
|
```
|
||||||
|
|
||||||
Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
|
Notice how the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of
|
||||||
user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
|
user messages (but not assistant messages!), and the entire chat is condensed into a single string.
|
||||||
|
If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us.
|
||||||
|
|
||||||
|
Now, try the same code, but swap in the `HuggingFaceH4/zephyr-7b-beta` model instead, and you should get:
|
||||||
|
|
||||||
|
```text
|
||||||
|
<|user|>
|
||||||
|
Hello, how are you?</s>
|
||||||
|
<|assistant|>
|
||||||
|
I'm doing great. How can I help you today?</s>
|
||||||
|
<|user|>
|
||||||
|
I'd like to show off how chat templating works!</s>
|
||||||
|
```
|
||||||
|
|
||||||
|
Both Zephyr and Mistral-Instruct were fine-tuned from the same base model, `Mistral-7B-v0.1`. However, they were trained
|
||||||
|
with totally different chat formats. Without chat templates, you would have to write manual formatting code for each
|
||||||
|
model, and it's very easy to make minor errors that hurt performance! Chat templates handle the details of formatting
|
||||||
|
for you, allowing you to write universal code that works for any model.
|
||||||
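To illustrate what "manual formatting code for each model" would involve, here is a minimal sketch of two hand-written formatters that reproduce the Mistral-Instruct and Zephyr strings shown above. The function names `format_mistral_like` and `format_zephyr_like` are hypothetical, and the logic is inferred only from the sample outputs in this document, not taken from the models' real templates.

```python
# Hypothetical hand-written formatters, inferred from the sample outputs above.
# They are NOT the real templates; they only show how easily the formats diverge.

def format_mistral_like(messages, bos="<s>", eos="</s>"):
    """Wrap user turns in [INST] ... [/INST]; end assistant turns with EOS and a space."""
    out = bos
    for message in messages:
        if message["role"] == "user":
            out += "[INST] " + message["content"] + " [/INST]"
        elif message["role"] == "assistant":
            out += message["content"] + eos + " "
        else:
            raise ValueError("unsupported role: " + message["role"])
    return out


def format_zephyr_like(messages, eos="</s>"):
    """Put a <|role|> header on its own line, then the content followed by EOS."""
    return "".join(
        "<|" + message["role"] + "|>\n" + message["content"] + eos + "\n"
        for message in messages
    )


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

mistral_text = format_mistral_like(chat)
zephyr_text = format_zephyr_like(chat)
```

A single misplaced space or newline in either function would silently change the model input, which is the kind of error `apply_chat_template` is meant to prevent.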
|
|
||||||
|
|
||||||
## How do I use chat templates?
|
## How do I use chat templates?
|
||||||
|
|
||||||
@@ -71,7 +70,7 @@ and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_te
|
|||||||
you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
|
you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
|
||||||
to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts).
|
to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts).
|
||||||
|
|
||||||
Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model:
|
Here's an example of preparing input for `model.generate()`, using `Zephyr` again:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
from transformers import AutoModelForCausalLM, AutoTokenizer
|
from transformers import AutoModelForCausalLM, AutoTokenizer
|
||||||
@@ -160,7 +159,7 @@ messages = [
|
|||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example:
|
Here's what this will look like without a generation prompt, for a model that uses standard "ChatML" formatting:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
|
tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
|
||||||
@@ -193,7 +192,7 @@ message. Remember, chat models are still just language models - they're trained
|
|||||||
special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're
|
special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're
|
||||||
supposed to be doing.
|
supposed to be doing.
|
||||||
|
|
||||||
Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
|
Not all models require generation prompts. Some models, like LLaMA, don't have any
|
||||||
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
|
special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
|
||||||
effect that `add_generation_prompt` has will depend on the template being used.
|
effect that `add_generation_prompt` has will depend on the template being used.
|
||||||
|
|
||||||
@@ -630,32 +629,17 @@ model_input = tokenizer.apply_chat_template(
|
|||||||
## Advanced: How do chat templates work?
|
## Advanced: How do chat templates work?
|
||||||
|
|
||||||
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
|
The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
|
||||||
default template for that model class is used instead. Let's take a look at the template for `BlenderBot`:
|
default template for that model class is used instead. Let's take a look at a `Zephyr` chat template, though note this
|
||||||
|
one is a little simplified from the actual one!
|
||||||
```python
|
|
||||||
|
|
||||||
>>> from transformers import AutoTokenizer
|
|
||||||
>>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
|
|
||||||
|
|
||||||
>>> tokenizer.chat_template
|
|
||||||
"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ ' ' }}{% endif %}{% endfor %}{{ eos_token }}"
|
|
||||||
```
|
|
||||||
|
|
||||||
That's kind of intimidating. Let's clean it up a little to make it more readable. In the process, though, we also make
|
|
||||||
sure that the newlines and indentation we add don't end up being included in the template output - see the tip on
|
|
||||||
[trimming whitespace](#trimming-whitespace) below!
|
|
||||||
|
|
||||||
```
|
```
|
||||||
{%- for message in messages %}
|
{%- for message in messages %}
|
||||||
    {%- if message['role'] == 'user' %}
|
    {{- '<|' + message['role'] + '|>\n' }}
|
||||||
        {{- ' ' }}
|
    {{- message['content'] + eos_token }}
|
||||||
    {%- endif %}
|
|
||||||
    {{- message['content'] }}
|
|
||||||
    {%- if not loop.last %}
|
|
||||||
        {{- ' ' }}
|
|
||||||
    {%- endif %}
|
|
||||||
{%- endfor %}
|
{%- endfor %}
|
||||||
{{- eos_token }}
|
{%- if add_generation_prompt %}
|
||||||
|
    {{- '<|assistant|>\n' }}
|
||||||
|
{%- endif %}
|
||||||
```
|
```
|
||||||
|
|
||||||
If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/).
|
If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/).
|
||||||
@@ -663,25 +647,23 @@ Jinja is a templating language that allows you to write simple code that generat
|
|||||||
syntax resembles Python. In pure Python, this template would look something like this:
|
syntax resembles Python. In pure Python, this template would look something like this:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
for idx, message in enumerate(messages):
|
for message in messages:
|
||||||
    if message['role'] == 'user':
|
    print(f'<|{message["role"]}|>')
|
||||||
        print(' ')
|
    print(message['content'] + eos_token)
|
||||||
    print(message['content'])
|
if add_generation_prompt:
|
||||||
    if not idx == len(messages) - 1: # Check for the last message in the conversation
|
    print('<|assistant|>')
|
||||||
        print(' ')
|
|
||||||
print(eos_token)
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Effectively, the template does three things:
|
Effectively, the template does three things:
|
||||||
1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing.
|
1. For each message, print the role enclosed in `<|` and `|>`, like `<|user|>` or `<|assistant|>`.
|
||||||
2. Add the message content
|
2. Next, print the content of the message, followed by the end-of-sequence token.
|
||||||
3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token.
|
3. Finally, if `add_generation_prompt` is set, print the assistant token, so that the model knows to start generating
|
||||||
|
an assistant response.
|
||||||
|
|
||||||
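The three steps of the simplified Zephyr-style template can also be sketched as a function that returns the formatted string instead of printing it, which avoids the extra newlines that `print` would add. The name `render_simplified_zephyr` and the default `eos_token="</s>"` are assumptions made for illustration. Like the simplified template, this sketch omits details of the real Zephyr template.

```python
def render_simplified_zephyr(messages, eos_token="</s>", add_generation_prompt=False):
    """Mirror the simplified template: role header, content plus EOS, optional prompt."""
    parts = []
    for message in messages:
        # Step 1: the role wrapped in <| and |>, followed by a newline.
        parts.append("<|" + message["role"] + "|>\n")
        # Step 2: the message content followed by the end-of-sequence token.
        parts.append(message["content"] + eos_token)
    if add_generation_prompt:
        # Step 3: open an assistant turn so the model knows to start a reply.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Hi there!"}]

without_prompt = render_simplified_zephyr(messages)
with_prompt = render_simplified_zephyr(messages, add_generation_prompt=True)
```

The two results differ only by the trailing `<|assistant|>` header, which is the whole effect of `add_generation_prompt` in this simplified template.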
This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which
|
This is a pretty simple template, but Jinja gives you a lot of flexibility to do more complex things! Let's see a Jinja
|
||||||
are a common way to give the model directives about how it should behave in the subsequent conversation.
|
template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes
|
||||||
But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs
|
handling for default system messages and slightly different system message handling in general - don't use this one
|
||||||
similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system
|
in your actual code!)
|
||||||
messages and slightly different system message handling in general - don't use this one in your actual code!)
|
|
||||||
|
|
||||||
```
|
```
|
||||||
{%- for message in messages %}
|
{%- for message in messages %}
|
||||||
@@ -695,8 +677,8 @@ messages and slightly different system message handling in general - don't use t
|
|||||||
{%- endfor %}
|
{%- endfor %}
|
||||||
```
|
```
|
||||||
|
|
||||||
Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based
|
Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens like
|
||||||
on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly
|
`[INST]` and `[/INST]` based on the role of each message. User, assistant and system messages are clearly
|
||||||
distinguishable to the model because of the tokens they're wrapped in.
|
distinguishable to the model because of the tokens they're wrapped in.
|
||||||
|
|
||||||
## Advanced: Adding and editing chat templates
|
## Advanced: Adding and editing chat templates
|
||||||
|
|||||||