Add SigLIP (#26522)

* Add first draft * Use appropriate gelu function * More improvements * More improvements * More improvements * Convert checkpoint * More improvements * Improve docs, remove print statements * More improvements * Add link * remove unused masking function * begin tokenizer * do_lower_case * debug * set split_special_tokens=True * Remove script * Fix style * Fix rebase * Use same design as CLIP * Add fast tokenizer * Add SiglipTokenizer to init, remove extra_ids * Improve conversion script * Use smaller inputs in conversion script * Update conversion script * More improvements * Add processor to conversion script * Add tests * Remove print statements * Add tokenizer tests * Fix more tests * More improvements related to weight initialization * More improvements * Make more tests pass * More improvements * More improvements * Add copied from * Add canonicalize_text * Enable fast tokenizer tests * More improvements * Fix most slow tokenizer tests * Address comments * Fix style * Remove script * Address some comments * Add copied from to tests * Add more copied from * Add more copied from * Add more copied from * Remove is_flax_available * More updates * Address comment * Remove SiglipTokenizerFast for now * Add caching * Remove umt5 test * Add canonicalize_text inside _tokenize, thanks Arthur * Fix image processor tests * Skip tests which are not applicable * Skip test_initialization * More improvements * Compare pixel values * Fix doc tests, add integration test * Add do_normalize * Remove causal mask and leverage ignore copy * Fix attention_mask * Fix remaining tests * Fix dummies * Rename temperature and bias * Address comments * Add copied from to tokenizer tests * Add SiglipVisionModel to auto mapping * Add copied from to image processor tests * Improve doc * Remove SiglipVisionModel from index * Address comments * Improve docs * Simplify config * Add first draft * Make it like mistral * More improvements * Fix attention_mask * Fix output_attentions * Add note in docs * Convert multilingual model * Convert large checkpoint * Convert more checkpoints * Add pipeline support, correct image_mean and image_std * Use padding=max_length by default * Make processor like llava * Add code snippet * Convert more checkpoints * Set keep_punctuation_string=None as in OpenCLIP * Set normalized=False for special tokens * Fix doc test * Update integration test * Add figure * Update organization * Happy new year * Use AutoModel everywhere --------- Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00
parent 73c88012b7
commit 3b742ea84c
35 changed files with 4254 additions and 4 deletions
--- a/tests/pipelines/test_pipelines_zero_shot_image_classification.py
+++ b/tests/pipelines/test_pipelines_zero_shot_image_classification.py
@@ -241,3 +241,37 @@ class ZeroShotImageClassificationPipelineTests(unittest.TestCase):
            ]
            * 5,
        )
+
+    @slow
+    @require_torch
+    def test_siglip_model_pt(self):
+        image_classifier = pipeline(
+            task="zero-shot-image-classification",
+            model="google/siglip-base-patch16-224",
+        )
+        # This is an image of 2 cats with remotes and no planes
+        image = Image.open("./tests/fixtures/tests_samples/COCO/000000039769.png")
+        output = image_classifier(image, candidate_labels=["2 cats", "a plane", "a remote"])
+
+        self.assertEqual(
+            nested_simplify(output),
+            [
+                {"score": 0.198, "label": "2 cats"},
+                {"score": 0.0, "label": "a remote"},
+                {"score": 0.0, "label": "a plane"},
+            ],
+        )
+
+        output = image_classifier([image] * 5, candidate_labels=["2 cats", "a plane", "a remote"], batch_size=2)
+
+        self.assertEqual(
+            nested_simplify(output),
+            [
+                [
+                    {"score": 0.198, "label": "2 cats"},
+                    {"score": 0.0, "label": "a remote"},
+                    {"score": 0.0, "label": "a plane"},
+                ]
+            ]
+            * 5,
+        )