fix multi-image case for llava-onevision (#38084)
* _get_padding_size module * do not patchify images when processing multi image * modify llava onevision image processor fast * tensor to list of tensors * backward compat * reuse pad_to_square in llave & some clarification * add to doc * fix: consider no image cases (text only or video) * add integration test * style & repo_consistency
This commit is contained in:
@@ -147,7 +147,7 @@ print(processor.decode(output[0], skip_special_tokens=True))
|
||||
|
||||
### Multi image inference
|
||||
|
||||
LLaVa-OneVision can perform inference with multiple images as input, where images either belong to the same prompt or different prompts (in batched inference). For that you have to use checkpoints with an "ov" suffix. Here is how you can do it:
|
||||
LLaVa-OneVision can perform inference with multiple images as input, where images either belong to the same prompt or different prompts (in batched inference). For that you have to use checkpoints with an "ov" suffix. For multi-image cases, we recommend using a **nested list of images** as input. Otherwise, every image will be patchified and consume a lot of memory. Here is how you can do it:
|
||||
|
||||
```python
|
||||
import requests
|
||||
|
||||
Reference in New Issue
Block a user