Chameleon: minor fixes after shipping (#32037)

* fix merging

* make chameleon conditional
Raushan Turganbay
2024-07-18 16:54:07 +05:00
committed by GitHub
parent 765732e92c
commit 673d30b826
7 changed files with 38 additions and 31 deletions

@@ -64,13 +64,13 @@ The original code can be found [here](https://github.com/facebookresearch/chamel
Here's how to load the model and perform inference in half-precision (`torch.float16`):
```python
-from transformers import ChameleonProcessor, ChameleonForCausalLM
+from transformers import ChameleonProcessor, ChameleonForConditionalGeneration
import torch
from PIL import Image
import requests
processor = ChameleonProcessor.from_pretrained("meta-chameleon")
-model = ChameleonForCausalLM.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
# prepare image and text prompt
url = "https://bjiujitsu.com/wp-content/uploads/2021/01/jiu_jitsu_belt_white_1.jpg"
@@ -89,13 +89,13 @@ print(processor.decode(output[0], skip_special_tokens=True))
Chameleon can perform inference with multiple images as input, where images either belong to the same prompt or different prompts (in batched inference). Here is how you can do it:
```python
-from transformers import ChameleonProcessor, ChameleonForCausalLM
+from transformers import ChameleonProcessor, ChameleonForConditionalGeneration
import torch
from PIL import Image
import requests
processor = ChameleonProcessor.from_pretrained("meta-chameleon")
-model = ChameleonForCausalLM.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", torch_dtype=torch.float16, device_map="auto")
# Get three different images
url = "https://www.ilankelman.org/stopsigns/australia.jpg"
@@ -129,7 +129,7 @@ processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokeniza
The model can be loaded in 8 or 4 bits, greatly reducing memory requirements while maintaining the performance of the original model. First install bitsandbytes (`pip install bitsandbytes`) and make sure you have access to a CUDA-compatible GPU. Then replace the model-loading code in the snippet above with:
```python
-from transformers import ChameleonForCausalLM, BitsAndBytesConfig
+from transformers import ChameleonForConditionalGeneration, BitsAndBytesConfig
# specify how to quantize the model
quantization_config = BitsAndBytesConfig(
@@ -138,7 +138,7 @@ quantization_config = BitsAndBytesConfig(
bnb_4bit_compute_dtype=torch.float16,
)
-model = ChameleonForCausalLM.from_pretrained("meta-chameleon", quantization_config=quantization_config, device_map="auto")
+model = ChameleonForConditionalGeneration.from_pretrained("meta-chameleon", quantization_config=quantization_config, device_map="auto")
```
### Use Flash-Attention 2 and SDPA to further speed up generation
@@ -146,9 +146,9 @@ model = ChameleonForCausalLM.from_pretrained("meta-chameleon", quantization_conf
The model supports both Flash-Attention 2 and PyTorch's [`torch.nn.functional.scaled_dot_product_attention`](https://pytorch.org/docs/master/generated/torch.nn.functional.scaled_dot_product_attention.html), either of which can be enabled for optimization. SDPA is the default option when you load the model. If you want to switch to Flash-Attention 2, first make sure to install flash-attn; refer to the [original repository](https://github.com/Dao-AILab/flash-attention) for installation instructions. Then change the snippet above as follows:
```python
-from transformers import ChameleonForCausalLM
+from transformers import ChameleonForConditionalGeneration
-model = ChameleonForCausalLM.from_pretrained(
+model = ChameleonForConditionalGeneration.from_pretrained(
model_id,
torch_dtype=torch.float16,
low_cpu_mem_usage=True,
@@ -183,7 +183,7 @@ model = ChameleonForCausalLM.from_pretrained(
[[autodoc]] ChameleonModel
- forward
-## ChameleonForCausalLM
+## ChameleonForConditionalGeneration
-[[autodoc]] ChameleonForCausalLM
+[[autodoc]] ChameleonForConditionalGeneration
- forward
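
A note on the Flash-Attention 2 / SDPA hunk above: the documented snippet is cut off by the hunk boundary, so here is a minimal sketch of how the attention backend could be chosen at load time. It assumes the backend is selected through the `attn_implementation` argument of `from_pretrained` (with values `"sdpa"` and `"flash_attention_2"`). The helper names `pick_attn_implementation` and `build_load_kwargs` are hypothetical and not part of Transformers. Checking that `flash_attn` is importable only shows the package is installed; it does not prove the GPU or dtype is supported.

```python
import importlib.util


def pick_attn_implementation(prefer_flash: bool = True) -> str:
    """Return "flash_attention_2" if the flash-attn package is importable, else "sdpa".

    Hypothetical helper: it only checks that the package is installed.
    """
    if prefer_flash and importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "sdpa"


def build_load_kwargs(prefer_flash: bool = True) -> dict:
    """Collect keyword arguments to forward to `from_pretrained` (hypothetical helper)."""
    return {
        "low_cpu_mem_usage": True,
        "attn_implementation": pick_attn_implementation(prefer_flash),
    }


# Usage (assumes transformers, torch, and access to the checkpoint `model_id`):
#
#   import torch
#   from transformers import ChameleonForConditionalGeneration
#
#   model = ChameleonForConditionalGeneration.from_pretrained(
#       model_id,
#       torch_dtype=torch.float16,
#       **build_load_kwargs(),
#   )
```

Falling back to `"sdpa"` mirrors the statement above that SDPA is the default, so the sketch leaves default behavior unchanged when flash-attn is absent.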