Fixing tests for Perceiver (#14745)

- Do not run image-classification pipeline (_CHECKPOINT_FOR_DOC uses the checkpoint for langage, which cannot load a FeatureExtractor so current logic fails). - Add a safeguard to not run tests when `tokenizer_class` or `feature_extractor_class` **are** defined, but cannot be loaded This happens for Perceiver for the "FastTokenizer" (which doesn't exist so None) and FeatureExtractor (which does exist but cannot be loaded because the checkpoint doesn't define one which is reasonable for the said checkpoint) - Added `get_vocab` function to `PerceiverTokenizer` since it is used by `fill-mask` pipeline when the argument `targets` is used to narrow a subset of possible values. Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2021-12-13 14:13:39 +01:00
parent 4c99e553c1
commit 3d66146afc
4 changed files with 27 additions and 3 deletions
--- a/tests/test_pipelines_common.py
+++ b/tests/test_pipelines_common.py
@@ -169,6 +169,11 @@ class PipelineTestCaseMeta(type):
                else:
                    tokenizer = None
                feature_extractor = get_tiny_feature_extractor_from_checkpoint(checkpoint, tiny_config)
+
+                if tokenizer is None and feature_extractor is None:
+                    self.skipTest(
+                        f"Ignoring {ModelClass}, cannot create a tokenizer or feature_extractor (PerceiverConfig with no FastTokenizer ?)"
+                    )
                pipeline, examples = self.get_test_pipeline(model, tokenizer, feature_extractor)
                if pipeline is None:
                    # The test can disable itself, but it should be very marginal
@@ -213,6 +218,7 @@ class PipelineTestCaseMeta(type):
                        if not tokenizer_classes:
                            # We need to test even if there are no tokenizers.
                            tokenizer_classes = [None]
+
                        for tokenizer_class in tokenizer_classes:
                            if tokenizer_class is not None:
                                tokenizer_name = tokenizer_class.__name__