Update chat template docs to remove Blenderbot (#33254)

* Update docs to remove obsolete Blenderbot
* Remove another reference to Blenderbot
2024-09-03 12:18:04 +01:00
parent edeca4387c
commit 0d86727354
1 changed files with 46 additions and 64 deletions
--- a/docs/source/en/chat_templating.md
+++ b/docs/source/en/chat_templating.md
@@ -26,26 +26,7 @@ Much like tokenization, different models expect very different input formats for
 **chat templates** as a feature. Chat templates are part of the tokenizer. They specify how to convert conversations, 
 represented as lists of messages, into a single tokenizable string in the format that the model expects. 
-Let's make this concrete with a quick example using the `BlenderBot` model. BlenderBot has an extremely simple default 
-template, which mostly just adds whitespace between rounds of dialogue:
-```python
->>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
->>> chat = [
-...    {"role": "user", "content": "Hello, how are you?"},
-...    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
-...    {"role": "user", "content": "I'd like to show off how chat templating works!"},
-... ]
->>> tokenizer.apply_chat_template(chat, tokenize=False)
-" Hello, how are you?  I'm doing great. How can I help you today?   I'd like to show off how chat templating works!</s>"
-```
-Notice how the entire chat is condensed into a single string. If we use `tokenize=True`, which is the default setting,
-that string will also be tokenized for us. To see a more complex template in action, though, let's use the 
-`mistralai/Mistral-7B-Instruct-v0.1` model.
+Let's make this concrete with a quick example using the `mistralai/Mistral-7B-Instruct-v0.1` model:
 ```python
 >>> from transformers import AutoTokenizer
@@ -61,8 +42,26 @@ that string will also be tokenized for us. To see a more complex template in act
 "<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]"
 ```
-Note that this time, the tokenizer has added the control tokens [INST] and [/INST] to indicate the start and end of 
-user messages (but not assistant messages!). Mistral-instruct was trained with these tokens, but BlenderBot was not.
+Notice how the tokenizer has added the control tokens `[INST]` and `[/INST]` to indicate the start and end of 
+user messages (but not assistant messages!), and the entire chat is condensed into a single string. 
+If we use `tokenize=True`, which is the default setting, that string will also be tokenized for us.
+Now, try the same code, but swap in the `HuggingFaceH4/zephyr-7b-beta` model instead, and you should get:
+```text
+<|user|>
+Hello, how are you?</s>
+<|assistant|>
+I'm doing great. How can I help you today?</s>
+<|user|>
+I'd like to show off how chat templating works!</s>
+```
+Both Zephyr and Mistral-Instruct were fine-tuned from the same base model, `Mistral-7B-v0.1`. However, they were trained
+with totally different chat formats. Without chat templates, you would have to write manual formatting code for each
+model, and it's very easy to make minor errors that hurt performance! Chat templates handle the details of formatting 
+for you, allowing you to write universal code that works for any model.
 ## How do I use chat templates?
@@ -71,7 +70,7 @@ and `content` keys, and then pass it to the [`~PreTrainedTokenizer.apply_chat_te
 you'll get output that's ready to go! When using chat templates as input for model generation, it's also a good idea
 to use `add_generation_prompt=True` to add a [generation prompt](#what-are-generation-prompts). 
-Here's an example of preparing input for `model.generate()`, using the `Zephyr` assistant model:
+Here's an example of preparing input for `model.generate()`, using `Zephyr` again:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -160,7 +159,7 @@ messages = [
 ]
 ```
-Here's what this will look like without a generation prompt, using the ChatML template we saw in the Zephyr example:
+Here's what this will look like without a generation prompt, for a model that uses standard "ChatML" formatting:
 ```python
 tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
@@ -193,7 +192,7 @@ message. Remember, chat models are still just language models - they're trained
 special kind of text to them! You need to guide them with appropriate control tokens, so they know what they're 
 supposed to be doing.
-Not all models require generation prompts. Some models, like BlenderBot and LLaMA, don't have any
+Not all models require generation prompts. Some models, like LLaMA, don't have any
 special tokens before bot responses. In these cases, the `add_generation_prompt` argument will have no effect. The exact
 effect that `add_generation_prompt` has will depend on the template being used.
@@ -630,32 +629,17 @@ model_input = tokenizer.apply_chat_template(
 ## Advanced: How do chat templates work?
 The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
-default template for that model class is used instead. Let's take a look at the template for `BlenderBot`:
-
-```python
->>> from transformers import AutoTokenizer
->>> tokenizer = AutoTokenizer.from_pretrained("facebook/blenderbot-400M-distill")
->>> tokenizer.chat_template
-"{% for message in messages %}{% if message['role'] == 'user' %}{{ ' ' }}{% endif %}{{ message['content'] }}{% if not loop.last %}{{ '  ' }}{% endif %}{% endfor %}{{ eos_token }}"
-```
-That's kind of intimidating. Let's clean it up a little to make it more readable. In the process, though, we also make
-sure that the newlines and indentation we add don't end up being included in the template output - see the tip on
-[trimming whitespace](#trimming-whitespace) below!
+default template for that model class is used instead. Let's take a look at a `Zephyr` chat template, though note that
+this one is slightly simplified from the actual one!
 ```
 {%- for message in messages %}
-    {%- if message['role'] == 'user' %}
-        {{- ' ' }}
-    {%- endif %}
-    {{- message['content'] }}
-    {%- if not loop.last %}
-        {{- '  ' }}
-    {%- endif %}
+    {{- '<|' + message['role'] + '|>\n' }}
+    {{- message['content'] + eos_token }}
 {%- endfor %}
-{{- eos_token }}
+{%- if add_generation_prompt %}
+    {{- '<|assistant|>\n' }}
+{%- endif %}
 ```
 If you've never seen one of these before, this is a [Jinja template](https://jinja.palletsprojects.com/en/3.1.x/templates/).
@@ -663,25 +647,23 @@ Jinja is a templating language that allows you to write simple code that generat
 syntax resembles Python. In pure Python, this template would look something like this:
 ```python
-for idx, message in enumerate(messages):
-    if message['role'] == 'user':
-        print(' ')
-    print(message['content'])
-    if not idx == len(messages) - 1:  # Check for the last message in the conversation
-        print('  ')
-print(eos_token)
+for message in messages:
+    print(f'<|{message["role"]}|>')
+    print(message['content'] + eos_token)
+if add_generation_prompt:
+    print('<|assistant|>')
 ```
 Effectively, the template does three things:
-1. For each message, if the message is a user message, add a blank space before it, otherwise print nothing.
-2. Add the message content
-3. If the message is not the last message, add two spaces after it. After the final message, print the EOS token.
+1. For each message, print the role enclosed in `<|` and `|>`, like `<|user|>` or `<|assistant|>`.
+2. Next, print the content of the message, followed by the end-of-sequence token.
+3. Finally, if `add_generation_prompt` is set, print the assistant token, so that the model knows to start generating
+   an assistant response.
-This is a pretty simple template - it doesn't add any control tokens, and it doesn't support "system" messages, which 
-are a common way to give the model directives about how it should behave in the subsequent conversation.
-But Jinja gives you a lot of flexibility to do those things! Let's see a Jinja template that can format inputs
-similarly to the way LLaMA formats them (note that the real LLaMA template includes handling for default system
-messages and slightly different system message handling in general - don't use this one in your actual code!)
+This is a pretty simple template, but Jinja gives you a lot of flexibility to do more complex things! Let's see a Jinja
+template that can format inputs similarly to the way LLaMA formats them (note that the real LLaMA template includes 
+handling for default system messages and slightly different system message handling in general - don't use this one 
+in your actual code!)
 ```
 {%- for message in messages %}
@@ -695,8 +677,8 @@ messages and slightly different system message handling in general - don't use t
 {%- endfor %}
 ```
-Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens based
+Hopefully if you stare at this for a little bit you can see what this template is doing - it adds specific tokens like
-on the "role" of each message, which represents who sent it. User, assistant and system messages are clearly
+`[INST]` and `[/INST]` based on the role of each message. User, assistant and system messages are clearly
 distinguishable to the model because of the tokens they're wrapped in.
 ## Advanced: Adding and editing chat templates
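The following is a rough, stdlib-only Python sketch of what the simplified Zephyr-style template in this change renders to. It mirrors the template line for line. It is not the transformers implementation, because `apply_chat_template` renders the Jinja template itself. The function name `render_zephyr_style` is hypothetical. The default `eos_token="</s>"` is an assumption taken from the Zephyr output shown earlier in this diff. Unlike the `print`-based equivalent in the docs, the sketch builds and returns a single string, which is closer to what `apply_chat_template(..., tokenize=False)` returns.

```python
def render_zephyr_style(messages, eos_token="</s>", add_generation_prompt=False):
    """Hypothetical helper mirroring the simplified Zephyr-style Jinja template.

    Assumption: eos_token defaults to "</s>", as in the Zephyr example output.
    """
    parts = []
    for message in messages:
        # Mirrors {{- '<|' + message['role'] + '|>\n' }}
        parts.append("<|" + message["role"] + "|>\n")
        # Mirrors {{- message['content'] + eos_token }}
        parts.append(message["content"] + eos_token)
    if add_generation_prompt:
        # Mirrors {{- '<|assistant|>\n' }}: cue the model to write a reply
        parts.append("<|assistant|>\n")
    return "".join(parts)


chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(render_zephyr_style(chat, add_generation_prompt=True))
```

Every tag in the simplified template uses `{{-` or `{%-` whitespace trimming, so no newline is emitted after each `</s>`. This is one visible way the simplified template differs from the real Zephyr output shown earlier.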