Make pipeline able to load processor (#32514)

* Refactor get_test_pipeline * Fixup * Fixing tests * Add processor loading in tests * Restructure processors loading * Add processor to the pipeline * Move model loading on tom of the test * Update `get_test_pipeline` * Fixup * Add class-based flags for loading processors * Change `is_pipeline_test_to_skip` signature * Skip t5 failing test for slow tokenizer * Fixup * Fix copies for T5 * Fix typo * Add try/except for tokenizer loading (kosmos-2 case) * Fixup * Llama not fails for long generation * Revert processor pass in text-generation test * Fix docs * Switch back to json file for image processors and feature extractors * Add processor type check * Remove except for tokenizers * Fix docstring * Fix empty lists for tests * Fixup * Fix load check * Ensure we have non-empty test cases * Update src/transformers/pipelines/__init__.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Update src/transformers/pipelines/base.py Co-authored-by: Lysandre Debut <hi@lysand.re> * Rework comment * Better docs, add note about pipeline components * Change warning to error raise * Fixup * Refine pipeline docs --------- Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-10-09 16:46:11 +01:00
parent 4fb28703ad
commit 48461c0fe2
91 changed files with 1312 additions and 241 deletions
--- a/tests/pipelines/test_pipelines_automatic_speech_recognition.py
+++ b/tests/pipelines/test_pipelines_automatic_speech_recognition.py
@@ -67,14 +67,27 @@ class AutomaticSpeechRecognitionPipelineTests(unittest.TestCase):
        + (MODEL_FOR_CTC_MAPPING.items() if MODEL_FOR_CTC_MAPPING else [])
    )

-    def get_test_pipeline(self, model, tokenizer, processor, torch_dtype="float32"):
+    def get_test_pipeline(
+        self,
+        model,
+        tokenizer=None,
+        image_processor=None,
+        feature_extractor=None,
+        processor=None,
+        torch_dtype="float32",
+    ):
        if tokenizer is None:
            # Side effect of no Fast Tokenizer class for these model, so skipping
            # But the slow tokenizer test should still run as they're quite small
            self.skipTest(reason="No tokenizer available")

        speech_recognizer = AutomaticSpeechRecognitionPipeline(
-            model=model, tokenizer=tokenizer, feature_extractor=processor, torch_dtype=torch_dtype
+            model=model,
+            tokenizer=tokenizer,
+            feature_extractor=feature_extractor,
+            image_processor=image_processor,
+            processor=processor,
+            torch_dtype=torch_dtype,
        )

        # test with a raw waveform