Fixes to chameleon docs (#32078)

* Fixes

* Let's not use auto
Merve Noyan
2024-07-19 14:50:34 +03:00
committed by GitHub
parent 566b0f1fbf
commit 4bd8f12972
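
The second bullet of the commit message ("Let's not use auto") corresponds to the `device_map="auto"` to `device_map="cuda"` swaps in the hunks of this commit. Hard-coding `"cuda"` assumes a GPU is present. As a minimal sketch that is not part of the commit, a reader on a machine that may lack a GPU could choose the value at runtime. `cuda_is_available` and `pick_device_map` are hypothetical helper names introduced here for illustration only.

```python
def cuda_is_available() -> bool:
    """Report whether PyTorch can see a CUDA device (False if torch is not installed)."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device_map(cuda_available: bool) -> str:
    """Hypothetical helper: return the device_map string to pass to from_pretrained."""
    return "cuda" if cuda_available else "cpu"


device_map = pick_device_map(cuda_is_available())
print(device_map)

# Assumed usage, mirroring the docs touched by this commit
# (requires access to the gated checkpoint):
# model = ChameleonForConditionalGeneration.from_pretrained(
#     "facebook/chameleon-7b", torch_dtype=torch.float16, device_map=device_map
# )
```

On a CPU-only machine half precision may be slow or unsupported for some operations, so the `torch.float16` setting used in the docs might need to change there as well.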


@@ -34,13 +34,13 @@ being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs
 generation, all in a single model. It also matches or exceeds the performance of much larger models,
 including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal
 generation evaluation, where either the prompt or outputs contain mixed sequences of both images and
-text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents*
+text. Chameleon marks a significant step forward in unified modeling of full multimodal documents*
 <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/chameleon_arch.png"
 alt="drawing" width="600"/>
-<small> Chameleon incorporates a vector quantizer module to transform images into discrete tokens. That also enables image geenration using an auto-regressive transformer. Taken from the <a href="https://arxiv.org/abs/2405.09818v1">original paper.</a> </small>
+<small> Chameleon incorporates a vector quantizer module to transform images into discrete tokens. That also enables image generation using an auto-regressive transformer. Taken from the <a href="https://arxiv.org/abs/2405.09818v1">original paper.</a> </small>
 This model was contributed by [joaogante](https://huggingface.co/joaogante) and [RaushanTurganbay](https://huggingface.co/RaushanTurganbay).
 The original code can be found [here](https://github.com/facebookresearch/chameleon).
@@ -61,6 +61,7 @@ The original code can be found [here](https://github.com/facebookresearch/chamel
 ### Single image inference
+Chameleon is a gated model so make sure to have access and login to Hugging Face Hub using a token.
 Here's how to load the model and perform inference in half-precision (`torch.float16`):
 ```python
@@ -70,7 +71,7 @@ from PIL import Image
 import requests
 processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
-model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="cuda")
 # prepare image and text prompt
 url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
@@ -95,7 +96,8 @@ from PIL import Image
 import requests
 processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
-model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", torch_dtype=torch.float16, device_map="cuda")
 # Get three different images
 url = "https://www.ilankelman.org/stopsigns/australia.jpg"
@@ -138,7 +140,7 @@ quantization_config = BitsAndBytesConfig(
     bnb_4bit_compute_dtype=torch.float16,
 )
-model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", quantization_config=quantization_config, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("facebook/chameleon-7b", quantization_config=quantization_config, device_map="cuda")
 ```
 ### Use Flash-Attention 2 and SDPA to further speed-up generation
@@ -148,6 +150,7 @@ The models supports both, Flash-Attention 2 and PyTorch's [`torch.nn.functional.
 ```python
 from transformers import ChameleonForConditionalGeneration
+model_id = "facebook/chameleon-7b"
 model = ChameleonForConditionalGeneration.from_pretrained(
     model_id,
     torch_dtype=torch.float16,