fix multi-image case for llava-onevision (#38084)

* _get_padding_size module

* do not patchify images when processing multi image

* modify llava onevision image processor fast

* tensor to list of tensors

* backward compat

* reuse pad_to_square in llava & some clarification

* add to doc

* fix: consider no image cases (text only or video)

* add integration test

* style & repo_consistency
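The commit messages mention a `_get_padding_size` helper and reusing `pad_to_square`. The sketch below is a rough illustration of padding a channels-first image to a square. It is not the transformers implementation. It assumes centered padding with a constant fill value, and the function names `get_padding_size_sketch` and `pad_to_square_sketch` are hypothetical.

```python
import torch
import torch.nn.functional as F


def get_padding_size_sketch(height: int, width: int) -> tuple[int, int, int, int]:
    """Hypothetical helper: (left, right, top, bottom) padding that centers
    an image of the given size inside a square canvas of side max(H, W)."""
    side = max(height, width)
    pad_h = side - height
    pad_w = side - width
    top = pad_h // 2
    left = pad_w // 2
    # Any odd leftover pixel goes to the bottom/right edge.
    return left, pad_w - left, top, pad_h - top


def pad_to_square_sketch(image: torch.Tensor, fill: float = 0.0) -> torch.Tensor:
    """Hypothetical: pad a (C, H, W) tensor to (C, S, S) with S = max(H, W)."""
    _, height, width = image.shape
    left, right, top, bottom = get_padding_size_sketch(height, width)
    # F.pad pads the last dim first: (left, right) for W, then (top, bottom) for H.
    return F.pad(image, (left, right, top, bottom), mode="constant", value=fill)


image = torch.ones(3, 224, 160)
squared = pad_to_square_sketch(image)
print(tuple(squared.shape))  # (3, 224, 224)
```

Centering keeps the original content in the middle of the square canvas. A real processor may instead pad one side only or fill with a per-channel background color.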
youngrok cha
2025-05-21 18:50:46 +09:00
committed by GitHub
parent a21f11fca2
commit 101b3fa4ea
13 changed files with 620 additions and 93 deletions


@@ -233,7 +233,7 @@ class ImageProcessingTestMixin:
 avg_time = sum(sorted(all_times[:3])) / 3.0
 return avg_time
-dummy_images = torch.randint(0, 255, (4, 3, 224, 224), dtype=torch.uint8)
+dummy_images = [torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8) for _ in range(4)]
 image_processor_slow = self.image_processing_class(**self.image_processor_dict)
 image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict)
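In this hunk, the dummy input for the slow-vs-fast image processor comparison changes from one batched uint8 tensor of shape (4, 3, 224, 224) to a list of four (3, 224, 224) tensors. This matches the "tensor to list of tensors" item above. The snippet below is only an illustration of the two input forms and how to convert between them with `torch.unbind` and `torch.stack`. It is not part of the test.

```python
import torch

# Before: one batched tensor, shape (batch, channels, height, width).
# Note: randint's upper bound is exclusive, so values fall in [0, 254].
batched = torch.randint(0, 255, (4, 3, 224, 224), dtype=torch.uint8)

# After: a list of per-image tensors, each (channels, height, width).
as_list = [torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8) for _ in range(4)]

# Converting between the two forms.
unbatched = list(batched.unbind(0))  # 4 tensors, each (3, 224, 224)
restacked = torch.stack(as_list)     # back to shape (4, 3, 224, 224)

print(len(as_list), tuple(as_list[0].shape))  # 4 (3, 224, 224)
print(tuple(restacked.shape))                 # (4, 3, 224, 224)
```

With the list form, each entry is a separate image rather than a slice of one batch tensor.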