Add explicit example for RAG chat templating (#33503)

* Add explicit example for RAG chat templating

* Add Tip box and reformulate

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
2024-09-18 00:08:05 +09:00
parent ac5a0556f1
commit 763548427d
1 changed file with 55 additions and 12 deletions
--- a/docs/source/en/chat_templating.md
+++ b/docs/source/en/chat_templating.md
@@ -616,22 +616,65 @@ than the JSON schemas used for tools, no helper functions are necessary.
 Here's an example of a RAG template in action:
 ```python
-document1 = {
-    "title": "The Moon: Our Age-Old Foe",
-    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall..."
-}
-document2 = {
-    "title": "The Sun: Our Age-Old Friend",
-    "contents": "Although often underappreciated, the sun provides several notable benefits..."
-}
-model_input = tokenizer.apply_chat_template(
-    messages,
-    documents=[document1, document2]
-)
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load the model and tokenizer
+model_id = "CohereForAI/c4ai-command-r-v01-4bit"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
+device = model.device # Get the device the model is loaded on
+# Define conversation input
+conversation = [
+    {"role": "user", "content": "What has Man always dreamed of?"}
+]
+# Define documents for retrieval-based generation
+documents = [
+    {
+        "title": "The Moon: Our Age-Old Foe",
+        "text": "Man has always dreamed of destroying the moon. In this essay, I shall..."
+    },
+    {
+        "title": "The Sun: Our Age-Old Friend",
+        "text": "Although often underappreciated, the sun provides several notable benefits..."
+    }
+]
+# Tokenize conversation and documents using a RAG template, returning PyTorch tensors.
+input_ids = tokenizer.apply_chat_template(
+    conversation=conversation,
+    documents=documents,
+    chat_template="rag",
+    tokenize=True,
+    add_generation_prompt=True,
+    return_tensors="pt").to(device)
+# Generate a response
+gen_tokens = model.generate(
+    input_ids,
+    max_new_tokens=100,
+    do_sample=True,
+    temperature=0.3,
+    )
+# Decode and print the generated text along with generation prompt
+gen_text = tokenizer.decode(gen_tokens[0])
+print(gen_text)
 ```
+<Tip>
+The `documents` input for retrieval-augmented generation is not widely supported, and many models have chat templates which simply ignore this input.
+To verify if a model supports the `documents` input, you can read its model card, or `print(tokenizer.chat_template)` to see if the `documents` key is used anywhere.
+One model class that does support it, though, is Cohere's [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-08-2024) and [Command-R+](https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024), through their `rag` chat template. You can see additional examples of grounded generation using this feature in their model cards.
+</Tip>
 ## Advanced: How do chat templates work?
 The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the