Llava: add default chat templates (#31691)
* add default chat templates
* Update src/transformers/models/llava/processing_llava.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/processing_llava_next.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* more clear docstring and docs
* Update docs/source/en/model_doc/llava.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* add tests
* remove default templates (see #31733)
* load chat template from another file
* Update docs/source/en/model_doc/llava_next.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* revert some changes in docs
* forgot vipllava
* chat template file is not temporary hack
* warn if loading from processor
* not that file
* similarly modify `save_pretrained`
* Update tests/models/llava_next/test_processor_llava_next.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/vipllava/test_processor_vipllava.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/vipllava.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/processing_utils.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/llava_next.md
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Committed by GitHub; parent 271fd8e60d; commit b873234cb6
@@ -40,7 +40,42 @@ The original code can be found [here](https://github.com/haotian-liu/LLaVA/tree/
- Note that the model has not been explicitly trained to process multiple images in the same prompt. Although this is technically possible, you may experience inaccurate results.
- For better results, we recommend users to prompt the model with the correct prompt format. Below is a list of prompt formats accepted by each llava checkpoint:
- For better results, we recommend using the processor's `apply_chat_template()` method to format your prompt correctly. To do so, you need to construct a conversation history; passing in a plain string will not format your prompt. Each message in the conversation history for chat templates is a dictionary with the keys "role" and "content". The "content" should be a list of dictionaries, one per "text" or "image" entry, as follows:
```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")

# Each message is a dict with a "role" and a list of typed "content" entries
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "What’s shown in this image?"},
        ],
    },
    {
        "role": "assistant",
        "content": [{"type": "text", "text": "This image shows a red stop sign."}],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe the image in more details."},
        ],
    },
]

text_prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)

# Note that the template simply formats your prompt, you still have to tokenize it and obtain pixel values for your images
print(text_prompt)
# "USER: <image>\nWhat’s shown in this image? ASSISTANT: This image shows a red stop sign.</s>USER: Describe the image in more details. ASSISTANT:"
```
- If you want to construct a chat prompt yourself, below is a list of prompt formats accepted by each llava checkpoint:
[llava-interleave models](https://huggingface.co/collections/llava-hf/llava-interleave-668e19a97da0036aad4a2f19) require the following format:
```bash
@@ -64,6 +99,7 @@ For multiple turns conversation:
"USER: <image>\n<prompt1> ASSISTANT: <answer1></s>USER: <prompt2> ASSISTANT: <answer2></s>USER: <prompt3> ASSISTANT:"
```
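
If you do assemble the prompt by hand, the multi-turn format above can be sketched in plain Python. This is only an illustration: `build_multiturn_prompt` is a hypothetical helper, not part of `transformers`, and it assumes the `USER:` / `ASSISTANT:` layout shown above and the same message structure used with `apply_chat_template()`. Other checkpoints may expect a different layout.

```python
def build_multiturn_prompt(conversation, add_generation_prompt=True):
    """Hypothetical helper: render messages into the multi-turn format shown above."""
    parts = []
    for message in conversation:
        content = message["content"]
        # One "<image>\n" placeholder per image entry, placed before the text
        image_tokens = "<image>\n" * sum(1 for item in content if item["type"] == "image")
        text = " ".join(item["text"] for item in content if item["type"] == "text")
        if message["role"] == "user":
            parts.append("USER: " + image_tokens + text + " ")
        elif message["role"] == "assistant":
            # Completed assistant turns end with the end-of-sequence marker
            parts.append("ASSISTANT: " + text + "</s>")
        else:
            raise ValueError(f"Unsupported role: {message['role']!r}")
    if add_generation_prompt:
        # Leave the final assistant turn open for the model to complete
        parts.append("ASSISTANT:")
    return "".join(parts)


conversation = [
    {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "prompt1"}]},
    {"role": "assistant", "content": [{"type": "text", "text": "answer1"}]},
    {"role": "user", "content": [{"type": "text", "text": "prompt2"}]},
]
prompt = build_multiturn_prompt(conversation)
print(repr(prompt))
```

In practice the processor's chat template is still the safer choice, since it stays in sync with the checkpoint.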
### Using Flash Attention 2
Flash Attention 2 is an even faster, optimized version of the previous optimization. For details, please refer to the [Flash Attention 2 section of the performance docs](https://huggingface.co/docs/transformers/perf_infer_gpu_one).