@@ -164,6 +164,10 @@ visualizer = AttentionMaskVisualizer("google/gemma-3-4b-it")
 visualizer("<img>What is shown in this image?")
 ```
 
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/gemma-3-attn-mask.png"/>
+</div>
+
 ## Notes
 
 - Use [`Gemma3ForConditionalGeneration`] for image-and-text and image-only inputs.
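The attention-mask figure added in this hunk is easier to read with a model of the mask in mind. Below is a minimal pure-Python sketch. It assumes that text tokens attend causally while tokens belonging to the same image attend to each other bidirectionally, which is our understanding of Gemma 3's masking. `token_types`, `image_block_ids`, and `causal_mask_with_image_blocks` are hypothetical names, and sliding-window layers are ignored.

```python
# Hypothetical sketch: a causal mask where tokens of the same image attend
# to each other bidirectionally. Assumption: token_types[i] == 1 marks an
# image token and 0 a text token; consecutive image tokens form one image.


def image_block_ids(token_types):
    """Give each run of consecutive image tokens an id; text tokens get -1."""
    ids, current, prev = [], -1, 0
    for t in token_types:
        if t == 1 and prev != 1:
            current += 1  # a new image block starts here
        ids.append(current if t == 1 else -1)
        prev = t
    return ids


def causal_mask_with_image_blocks(token_types):
    """Return mask[q][k] == True if query position q may attend to key position k."""
    blocks = image_block_ids(token_types)
    n = len(token_types)
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q
            # Image tokens may also look ahead, but only within their own image.
            same_image = blocks[q] != -1 and blocks[q] == blocks[k]
            row.append(causal or same_image)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Two text tokens, a three-token image, then two more text tokens.
    for row in causal_mask_with_image_blocks([0, 0, 1, 1, 1, 0, 0]):
        print("".join("#" if allowed else "." for allowed in row))
```

The printed pattern is lower-triangular except for a filled 3×3 square over the image positions, where image tokens can see later tokens of the same image.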
@@ -116,6 +116,10 @@ visualizer = AttentionMaskVisualizer("huggyllama/llama-7b")
 visualizer("Plants create energy through a process known as")
 ```
 
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llama-attn-mask.png"/>
+</div>
+
 ## Notes
 
 - The tokenizer is a byte-pair encoding model based on [SentencePiece](https://github.com/google/sentencepiece). During decoding, if the first token is the start of the word (for example, "Banana"), the tokenizer doesn't prepend the prefix space to the string.
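The decoding rule in the tokenizer note can be sketched in a few lines. This is a hypothetical illustration, not the Llama tokenizer's actual code. It assumes pieces mark word starts with SentencePiece's `▁` (U+2581) metasymbol, and that decoding turns each `▁` into a space but drops the one that would otherwise lead the string.

```python
# Hypothetical sketch of SentencePiece-style decoding. Assumption: each piece
# that starts a word begins with the "▁" (U+2581) metasymbol.

SPIECE_UNDERLINE = "\u2581"


def decode_pieces(pieces):
    """Join pieces, turning each word-boundary marker into a space."""
    text = "".join(pieces).replace(SPIECE_UNDERLINE, " ")
    # If the first piece starts a word, its marker would become a leading
    # space; drop it instead of prepending it to the decoded string.
    if pieces and pieces[0].startswith(SPIECE_UNDERLINE):
        text = text[1:]
    return text


if __name__ == "__main__":
    print(repr(decode_pieces(["\u2581Ban", "ana", "\u2581is", "\u2581yellow"])))
```

Here the first piece starts the word "Banana", so the result is `'Banana is yellow'` with no leading space.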
@@ -116,6 +116,10 @@ visualizer = AttentionMaskVisualizer("meta-llama/Llama-2-7b-hf")
 visualizer("Plants create energy through a process known as")
 ```
 
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/llama-2-attn-mask.png"/>
+</div>
+
 ## Notes
 
 - Setting `config.pretraining_tp` to a value besides `1` activates a more accurate but slower computation of the linear layers. This matches the original logits better.
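One way to picture what `pretraining_tp` changes: a linear layer is computed in `tp` slices instead of as one matrix product. The pure-Python sketch below rests on assumptions: weights have shape `(out, in)`, `tp` divides the sliced dimension evenly, and the two helpers mirror our understanding of the patterns in the Llama MLP. Those patterns are concatenating slices over the output dimension and summing partial results over the input dimension. The helper names are hypothetical. Both forms agree mathematically; the input-dimension split changes the floating-point summation order, which is presumably why it can track the original tensor-parallel logits more closely.

```python
# Hypothetical sketch of a sliced linear layer, y = W x, with W of shape (out, in).


def linear(x, w):
    """y[o] = sum_i x[i] * w[o][i] for a single input vector x."""
    return [sum(xi * wi for xi, wi in zip(x, row)) for row in w]


def split_output_dim(x, w, tp):
    """Slice W by output rows, apply each slice, and concatenate the outputs."""
    step = len(w) // tp
    out = []
    for s in range(tp):
        out.extend(linear(x, w[s * step:(s + 1) * step]))
    return out


def split_input_dim(x, w, tp):
    """Slice x and W along the input dim, apply each slice, and sum the partial outputs."""
    step = len(x) // tp
    partials = []
    for s in range(tp):
        xs = x[s * step:(s + 1) * step]
        ws = [row[s * step:(s + 1) * step] for row in w]
        partials.append(linear(xs, ws))
    # Element-wise sum over the tp partial results.
    return [sum(vals) for vals in zip(*partials)]


if __name__ == "__main__":
    x = [0.5, -1.0, 2.0, 3.0]
    w = [[1.0, 2.0, 3.0, 4.0], [0.1, 0.2, 0.3, 0.4]]
    print(linear(x, w), split_output_dim(x, w, 2), split_input_dim(x, w, 2))
```

The output-dimension split reproduces each row's sum exactly; the input-dimension split regroups the additions, so results can differ in the last bits.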
@@ -125,6 +125,10 @@ visualizer = AttentionMaskVisualizer("google/paligemma2-3b-mix-224")
 visualizer("<img> What is in this image?")
 ```
 
+<div class="flex justify-center">
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/paligemma2-attn-mask.png"/>
+</div>
+
 ## Notes
 
 - PaliGemma is not a conversational model and works best when fine-tuned for specific downstream tasks such as image captioning, visual question answering (VQA), object detection, and document understanding.
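The PaliGemma figure added in this hunk can be compared with a minimal prefix-LM mask. The sketch assumes, as described for PaliGemma, that the image tokens and prompt prefix attend to each other fully while the generated suffix is causal. `prefix_lm_mask` is a hypothetical helper, not a library function.

```python
# Hypothetical sketch of a prefix-LM attention mask.


def prefix_lm_mask(num_prefix, num_suffix):
    """Return mask[q][k] == True if query position q may attend to key position k.

    Positions [0, num_prefix) are the prefix (image tokens + prompt);
    the remaining num_suffix positions are generated text.
    """
    n = num_prefix + num_suffix
    # Every query sees the whole prefix; keys past the prefix are visible only causally.
    return [[k < num_prefix or k <= q for k in range(n)] for q in range(n)]


if __name__ == "__main__":
    for row in prefix_lm_mask(num_prefix=4, num_suffix=3):
        print("".join("#" if allowed else "." for allowed in row))
```

Unlike a purely causal mask, the top-left prefix block is fully filled, while prefix positions still cannot see the suffix.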