fix multi-image case for llava-onevision (#38084)

* _get_padding_size module
* do not patchify images when processing multi image
* modify llava onevision image processor fast
* tensor to list of tensors
* backward compat
* reuse pad_to_square in llava & some clarification
* add to doc
* fix: consider no image cases (text only or video)
* add integration test
* style & repo_consistency
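
For readers unfamiliar with the pad-to-square step named in the message above, here is a minimal pure-Python sketch of the general idea. The body of `pad_to_square` is not shown in this excerpt, so `pad_to_square_sketch`, its `fill` parameter, and the centered placement are illustrative assumptions rather than the library's implementation.

```python
from typing import List

# Illustrative type: a single-channel image as rows of pixel values.
Image = List[List[int]]


def pad_to_square_sketch(image: Image, fill: int = 0) -> Image:
    """Hypothetical helper: center `image` on a square canvas of side max(H, W).

    Assumption: padding is split evenly, with any odd pixel going to the
    bottom/right. The real processor may choose differently.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    side = max(height, width)
    top = (side - height) // 2
    left = (side - width) // 2

    # Start from a canvas filled with the background value.
    canvas = [[fill] * side for _ in range(side)]
    # Copy each source row into its centered position.
    for r, row in enumerate(image):
        canvas[top + r][left:left + width] = row
    return canvas


wide = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2 x 4 image
square = pad_to_square_sketch(wide)  # 4 x 4, one padded row above and below
```

An image that is already square passes through unchanged, since both offsets are zero.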
2025-05-21 18:50:46 +09:00
parent a21f11fca2
commit 101b3fa4ea
13 changed files with 620 additions and 93 deletions
--- a/tests/test_image_processing_common.py
+++ b/tests/test_image_processing_common.py
@@ -233,7 +233,7 @@ class ImageProcessingTestMixin:
            avg_time = sum(sorted(all_times[:3])) / 3.0
            return avg_time

-        dummy_images = torch.randint(0, 255, (4, 3, 224, 224), dtype=torch.uint8)
+        dummy_images = [torch.randint(0, 255, (3, 224, 224), dtype=torch.uint8) for _ in range(4)]
        image_processor_slow = self.image_processing_class(**self.image_processor_dict)
        image_processor_fast = self.fast_image_processing_class(**self.image_processor_dict)