From 3a8ec8c467bc7416f957f004f76cdc6e0c62fdf3 Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Wed, 26 Mar 2025 10:11:34 -0700
Subject: [PATCH] [docs] Attention mask image (#36970)

Add attention mask visualization images to the Gemma 3, Llama, Llama 2, and PaliGemma model docs.
---
 docs/source/en/model_doc/gemma3.md    | 4 ++++
 docs/source/en/model_doc/llama.md     | 4 ++++
 docs/source/en/model_doc/llama2.md    | 4 ++++
 docs/source/en/model_doc/paligemma.md | 4 ++++
 4 files changed, 16 insertions(+)
diff --git a/docs/source/en/model_doc/gemma3.md b/docs/source/en/model_doc/gemma3.md
index 4c7d978d3b..72c0c5d76a 100644
--- a/docs/source/en/model_doc/gemma3.md
+++ b/docs/source/en/model_doc/gemma3.md
@@ -164,6 +164,10 @@ visualizer = AttentionMaskVisualizer("google/gemma-3-4b-it")
visualizer("What is shown in this image?")
```
+
+

+
+
## Notes
- Use [`Gemma3ForConditionalGeneration`] for image-and-text and image-only inputs.
diff --git a/docs/source/en/model_doc/llama.md b/docs/source/en/model_doc/llama.md
index 8869d8d4e8..7dc0660896 100644
--- a/docs/source/en/model_doc/llama.md
+++ b/docs/source/en/model_doc/llama.md
@@ -116,6 +116,10 @@ visualizer = AttentionMaskVisualizer("huggyllama/llama-7b")
visualizer("Plants create energy through a process known as")
```
+
+

+
+
## Notes
- The tokenizer is a byte-pair encoding model based on [SentencePiece](https://github.com/google/sentencepiece). During decoding, if the first token is the start of the word (for example, "Banana"), the tokenizer doesn't prepend the prefix space to the string.
diff --git a/docs/source/en/model_doc/llama2.md b/docs/source/en/model_doc/llama2.md
index 4df0375f99..ec981890b2 100644
--- a/docs/source/en/model_doc/llama2.md
+++ b/docs/source/en/model_doc/llama2.md
@@ -116,6 +116,10 @@ visualizer = AttentionMaskVisualizer("meta-llama/Llama-2-7b-hf")
visualizer("Plants create energy through a process known as")
```
+
+

+
+
## Notes
- Setting `config.pretraining_tp` to a value besides `1` activates a more accurate but slower computation of the linear layers. This matches the original logits better.
diff --git a/docs/source/en/model_doc/paligemma.md b/docs/source/en/model_doc/paligemma.md
index a1b4b6e1d4..fa119a5f83 100644
--- a/docs/source/en/model_doc/paligemma.md
+++ b/docs/source/en/model_doc/paligemma.md
@@ -125,6 +125,10 @@ visualizer = AttentionMaskVisualizer("google/paligemma2-3b-mix-224")
visualizer("What is in this image?")
```
+
+

+
+
## Notes
- PaliGemma is not a conversational model and works best when fine-tuned for specific downstream tasks such as image captioning, visual question answering (VQA), object detection, and document understanding.