Uniformize model processors (#31368)

* add initial design for uniform processors + align model

* add uniform processors for altclip + chinese_clip

* add uniform processors for blip + blip2

* fix mutable default 👀

* add configuration test

* handle structured kwargs w defaults + add test

* protect torch-specific test

* fix style

* fix

* rebase

* update processor to generic kwargs + test

* fix style

* add sensible kwargs merge

* update test

* fix assertEqual

* move kwargs merging to processing common

* rework kwargs for type hinting

* just get Unpack from extensions

* run-slow[align]

* handle kwargs passed as nested dict

* add from_pretrained test for nested kwargs handling

* [run-slow]align

* update documentation + imports

* update audio inputs

* protect audio types, silly

* try removing imports

* make things simpler

* simplerer

* move out kwargs test to common mixin

* [run-slow]align

* skip tests for old processors

* [run-slow]align, clip

* !$#@!! protect imports, darn it

* [run-slow]align, clip

* [run-slow]align, clip

* update common processor testing

* add altclip

* add chinese_clip

* add pad_size

* [run-slow]align, clip, chinese_clip, altclip

* remove duplicated tests

* fix

* add blip, blip2, bridgetower

Added tests for bridgetower which override common. Also modified common
tests to force center cropping if existing

* fix

* update doc

* improve documentation for default values

* add model_max_length testing

This parameter depends on tokenizers received.

* Raise if kwargs are specified in two places

* fix

* removed copied from

* match defaults

* force padding

* fix tokenizer test

* clean defaults

* move tests to common

* add missing import

* fix

* adapt bridgetower tests to shortest edge

* uniformize donut processor + tests

* add wav2vec2

* extend common testing to audio processors

* add testing + bert version

* propagate common kwargs to different modalities

* BC order of arguments

* check py version

* revert kwargs merging

* add draft overlap test

* update

* fix blip2 and wav2vec due to updates

* fix copies

* ensure overlapping kwargs do not disappear

* replace .pop by .get to handle duplicated kwargs

* fix copies

* fix missing import

* add clearly wav2vec2_bert to uniformized models

* fix copies

* increase number of features

* fix style

* [run-slow] blip, blip2, bridgetower, donut, wav2vec2, wav2vec2_bert

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* fix concatenation

* [run-slow] blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert

* Update tests/test_processing_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* 🧹

* address comments

* clean up + tests

* [run-slow] instructblip, blip, blip_2, bridgetower, donut, wav2vec2, wav2vec2_bert
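
Illustration of the kwargs handling described in the items above (structured kwargs with defaults, kwargs passed as a nested dict, common kwargs propagated to every modality, raising when a kwarg is specified in two places). This is only a minimal sketch under stated assumptions, not the code added by this commit: `MODALITY_KEYS`, `COMMON_KEYS` and `merge_processor_kwargs` are hypothetical names, and the key sets are an illustrative subset.

```python
# Hypothetical sketch: none of these names come from the transformers codebase.
from copy import deepcopy

# Which flat keyword arguments belong to which modality (illustrative subset).
MODALITY_KEYS = {
    "text_kwargs": {"padding", "max_length", "truncation"},
    "images_kwargs": {"size", "crop_size", "do_resize"},
}
# Keys that are propagated to every modality.
COMMON_KEYS = {"return_tensors"}


def merge_processor_kwargs(defaults, **kwargs):
    """Route flat or nested kwargs into per-modality dicts on top of defaults."""
    merged = deepcopy(defaults)  # never mutate the caller's defaults
    for group in MODALITY_KEYS:
        merged.setdefault(group, {})

    seen = set()  # (group, key) pairs already supplied by the caller

    # 1. Nested form: processor(..., text_kwargs={"padding": "longest"})
    #    .get (not .pop) leaves the caller's kwargs untouched.
    for group in MODALITY_KEYS:
        for key, value in kwargs.get(group, {}).items():
            merged[group][key] = value
            seen.add((group, key))

    # 2. Flat form: processor(..., padding="longest")
    for key, value in kwargs.items():
        if key in MODALITY_KEYS:
            continue  # nested dicts were handled above
        if key in COMMON_KEYS:
            for group in MODALITY_KEYS:
                merged[group][key] = value
            continue
        # Keys that match no modality are ignored in this sketch.
        for group, allowed in MODALITY_KEYS.items():
            if key in allowed:
                if (group, key) in seen:
                    raise ValueError(f"'{key}' was passed both flat and inside '{group}'")
                merged[group][key] = value
                seen.add((group, key))
    return merged


defaults = {"text_kwargs": {"padding": False}, "images_kwargs": {}}
merged = merge_processor_kwargs(
    defaults,
    padding="longest",
    size={"height": 214, "width": 214},
    return_tensors="pt",
)
print(merged)
```

In this sketch, caller-supplied values override the defaults, and passing the same key both flat and nested raises a `ValueError` instead of silently picking one of the two values.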

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Pablo Montalvo
2024-10-02 10:41:08 +02:00
committed by GitHub
parent 2292be6c1b
commit 50290cf7a0
18 changed files with 769 additions and 273 deletions


@@ -14,16 +14,32 @@
 # limitations under the License.


+import tempfile
 import unittest

-from transformers import DonutProcessor
+from transformers import DonutImageProcessor, DonutProcessor, XLMRobertaTokenizerFast
+from transformers.testing_utils import (
+    require_torch,
+    require_vision,
+)
+
+from ...test_processing_common import ProcessorTesterMixin


-class DonutProcessorTest(unittest.TestCase):
+class DonutProcessorTest(ProcessorTesterMixin, unittest.TestCase):
     from_pretrained_id = "naver-clova-ix/donut-base"
+    processor_class = DonutProcessor

     def setUp(self):
         self.processor = DonutProcessor.from_pretrained(self.from_pretrained_id)
+        self.tmpdirname = tempfile.mkdtemp()
+
+        image_processor = DonutImageProcessor()
+        tokenizer = XLMRobertaTokenizerFast.from_pretrained(self.from_pretrained_id)
+
+        processor = DonutProcessor(image_processor, tokenizer)
+
+        processor.save_pretrained(self.tmpdirname)

     def test_token2json(self):
         expected_json = {
@@ -49,3 +65,30 @@ class DonutProcessorTest(unittest.TestCase):
         actual_json = self.processor.token2json(sequence)

         self.assertDictEqual(actual_json, expected_json)
+
+    @require_torch
+    @require_vision
+    def test_unstructured_kwargs_batched(self):
+        if "image_processor" not in self.processor_class.attributes:
+            self.skipTest(f"image_processor attribute not present in {self.processor_class}")
+        image_processor = self.get_component("image_processor")
+        tokenizer = self.get_component("tokenizer")
+        if not tokenizer.pad_token:
+            tokenizer.pad_token = "[TEST_PAD]"
+        processor = self.processor_class(tokenizer=tokenizer, image_processor=image_processor)
+        self.skip_processor_without_typed_kwargs(processor)
+
+        input_str = ["lower newer", "upper older longer string"]
+        image_input = self.prepare_image_inputs() * 2
+        inputs = processor(
+            text=input_str,
+            images=image_input,
+            return_tensors="pt",
+            crop_size={"height": 214, "width": 214},
+            size={"height": 214, "width": 214},
+            padding="longest",
+            max_length=76,
+        )
+        self.assertEqual(inputs["pixel_values"].shape[2], 214)
+
+        self.assertEqual(len(inputs["input_ids"][0]), 7)