Enhance documentation to explain chat-based few-shot prompting (#37828)

* Enhance documentation to explain chat-based few-shot prompting

  Updates the documentation on few-shot prompting to illustrate how to structure examples using the chat-based format for instruction-tuned models.

* Update docs/source/en/tasks/prompting.md

  Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/prompting.md

  Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* fix typos

---------

Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
2025-04-30 20:00:10 +02:00
parent 36e2e33bbe
commit c3aeaa8060
1 changed files with 41 additions and 11 deletions
--- a/docs/source/en/tasks/prompting.md
+++ b/docs/source/en/tasks/prompting.md
@@ -78,32 +78,62 @@ Crafting a good prompt alone, also known as zero-shot prompting, may not be enou

 This section covers a few prompting techniques.

-### Few-shot
+### Few-shot prompting

-Few-shot prompting improves accuracy and performance by including specific examples of what a model should generate given an input. The explicit examples give the model a better understanding of the task and the output format you're looking for. Try experimenting with different numbers of examples (2, 4, 8, etc.) to see how it affects performance.
+Few-shot prompting improves accuracy and performance by including specific examples of what a model should generate given an input. The explicit examples give the model a better understanding of the task and the output format you're looking for. Try experimenting with different numbers of examples (2, 4, 8, etc.) to see how the number affects performance. The example below provides the model with 1 example (1-shot) of the output format (a date in MM/DD/YYYY format) it should return.

-The example below provides the model with 1 example (1-shot) of the output format (a date in MM/DD/YYYY format) it should return.
-
-```py
+```python
 from transformers import pipeline
 import torch

 pipeline = pipeline(model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
 prompt = """Text: The first human went into space and orbited the Earth on April 12, 1961.
 Date: 04/12/1961
-Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon. 
+Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon.
 Date:"""

 outputs = pipeline(prompt, max_new_tokens=12, do_sample=True, top_k=10)
 for output in outputs:
    print(f"Result: {output['generated_text']}")
-Result: Text: The first human went into space and orbited the Earth on April 12, 1961.
-Date: 04/12/1961
-Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon. 
-Date: 09/28/1960
+# Result: Text: The first human went into space and orbited the Earth on April 12, 1961.
+# Date: 04/12/1961
+# Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon.
+# Date: 09/28/1960
 ```
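
To experiment with different numbers of examples, the same Text/Date prompt can be assembled from a list of example pairs. The block below is a minimal sketch rather than part of the patched documentation: `build_few_shot_prompt`, its parameters, and the second example sentence are hypothetical names and data introduced only for illustration.

```python
def build_few_shot_prompt(examples, query, num_shots):
    """Build a Text/Date prompt from the first `num_shots` example pairs.

    Hypothetical helper for illustration; it is not a Transformers API.
    """
    lines = []
    for text, date in examples[:num_shots]:
        lines.append(f"Text: {text}")
        lines.append(f"Date: {date}")
    # The final entry is left open so the model completes the date.
    lines.append(f"Text: {query}")
    lines.append("Date:")
    return "\n".join(lines)


# Illustrative example pairs; the second one is an assumption added for this sketch.
examples = [
    ("The first human went into space and orbited the Earth on April 12, 1961.", "04/12/1961"),
    ("Apollo 11 landed on the Moon on July 20, 1969.", "07/20/1969"),
]
query = (
    "The first-ever televised presidential debate in the United States took place on "
    "September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon."
)

one_shot_prompt = build_few_shot_prompt(examples, query, num_shots=1)
two_shot_prompt = build_few_shot_prompt(examples, query, num_shots=2)
print(two_shot_prompt)
```

Each resulting string can be passed to the pipeline in place of the hand-written `prompt`, which makes it easier to compare outputs across shot counts.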

-The downside of few-shot prompting is that you need to create lengthier prompts which increases computation and latency. There is also a limit to prompt lengths. Finally, a model can learn unintended patterns from your examples and it doesn't work well on complex reasoning tasks.
+The downside of few-shot prompting is that it requires lengthier prompts, which increase computation and latency. Prompt length is also limited by the model's context window. Finally, a model can learn unintended patterns from your examples, and few-shot prompting may not work well on complex reasoning tasks.
+
+To improve few-shot prompting for modern instruction-tuned LLMs, use the model's own [chat template](../conversations). These models are trained on datasets with turn-based conversations between a "user" and an "assistant". Structuring your prompt to match this conversational format can improve performance.
+
+Structure your prompt as a turn-based conversation and use the [`apply_chat_template`] method to format it. With `tokenize=False`, the method returns the formatted prompt as a string rather than token ids.
+
+```python
+from transformers import pipeline
+import torch
+
+pipeline = pipeline(model="mistralai/Mistral-7B-Instruct-v0.1", torch_dtype=torch.bfloat16, device_map="auto")
+
+messages = [
+    {"role": "user", "content": "Text: The first human went into space and orbited the Earth on April 12, 1961."},
+    {"role": "assistant", "content": "Date: 04/12/1961"},
+    {"role": "user", "content": "Text: The first-ever televised presidential debate in the United States took place on September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon."}
+]
+
+prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+outputs = pipeline(prompt, max_new_tokens=12, do_sample=True, top_k=10)
+
+for output in outputs:
+    print(f"Result: {output['generated_text']}")
+```
+
+
+While the basic few-shot prompting approach embeds examples within a single text string, the chat template format offers the following benefits.
+
+- The model may better understand the task because it can more easily recognize the pattern and the expected roles of user input and assistant output.
+- The model may produce the desired output format more consistently because the prompt is structured like the conversations it saw during training.
+
+Always consult a specific instruction-tuned model's documentation to learn more about the format of its chat template so that you can structure your few-shot prompts accordingly.
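
When there are several examples, the alternating user and assistant turns can be generated from example pairs instead of being written by hand. This is a minimal sketch, assuming the same Text/Date task; `build_few_shot_messages` is a hypothetical helper, not a Transformers function.

```python
def build_few_shot_messages(examples, query):
    """Turn (text, date) pairs into alternating user/assistant chat turns.

    Hypothetical helper for illustration; it is not a Transformers API.
    """
    messages = []
    for text, date in examples:
        messages.append({"role": "user", "content": f"Text: {text}"})
        messages.append({"role": "assistant", "content": f"Date: {date}"})
    # The last turn is the real query, which the model should answer.
    messages.append({"role": "user", "content": f"Text: {query}"})
    return messages


examples = [
    ("The first human went into space and orbited the Earth on April 12, 1961.", "04/12/1961"),
]
query = (
    "The first-ever televised presidential debate in the United States took place on "
    "September 28, 1960, between presidential candidates John F. Kennedy and Richard Nixon."
)

messages = build_few_shot_messages(examples, query)
for message in messages:
    print(message["role"], "->", message["content"])
```

The resulting `messages` list has the same shape as the one passed to `apply_chat_template` in the patch. Some chat templates expect user and assistant turns to alternate strictly, which this construction preserves.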

 ### Chain-of-thought