From befbbf2f98492e2164f185708e62c06fd30f75d1 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Tue, 19 Nov 2024 18:49:10 +0100
Subject: [PATCH] Added image-text-to-text pipeline to task guide (#34783)

* Added image-text-to-text pipeline to task guide

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/tasks/image_text_to_text.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Merge codeblocks

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/en/tasks/image_text_to_text.md | 40 ++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/docs/source/en/tasks/image_text_to_text.md b/docs/source/en/tasks/image_text_to_text.md
index 261abf9472..041efb06c5 100644
--- a/docs/source/en/tasks/image_text_to_text.md
+++ b/docs/source/en/tasks/image_text_to_text.md
@@ -120,6 +120,46 @@ print(generated_texts)
 ## ['User: What do we see in this image? \nAssistant: In this image we can see two cats on the nets. \nUser: And how about this image? \nAssistant: In this image we can see flowers, plants and insect.']
 ```
 
+## Pipeline
+
+The fastest way to get started is to use the [`Pipeline`] API. Specify the `"image-text-to-text"` task and the model you want to use.
+
+```python
+from transformers import pipeline
+pipe = pipeline("image-text-to-text", model="llava-hf/llava-interleave-qwen-0.5b-hf")
+```
+
+The example below uses chat templates to format the text inputs.
+
+```python
+messages = [
+    {
+        "role": "user",
+        "content": [
+            {
+                "type": "image",
+                "image": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg",
+            },
+            {"type": "text", "text": "Describe this image."},
+        ],
+    },
+    {
+        "role": "assistant",
+        "content": [
+            {"type": "text", "text": "There's a pink flower"},
+        ],
+    },
+]
+```
+
+Pass the chat template formatted text and image to [`Pipeline`] and set `return_full_text=False` to remove the input from the generated output.
+
+```python
+outputs = pipe(text=messages, max_new_tokens=20, return_full_text=False)
+outputs[0]["generated_text"]
+# with a yellow center in the foreground. The flower is surrounded by red and white flowers with green stems
+```
+
 ## Streaming
 
 We can use [text streaming](./generation_strategies#streaming) for a better generation experience. Transformers supports streaming with the [`TextStreamer`] or [`TextIteratorStreamer`] classes. We will use the [`TextIteratorStreamer`] with IDEFICS-8B.