Add ColQwen2 to 🤗 transformers (#35778)

* feat: add colqwen2 (wip) * tests: fix test_attention_outputs * tests: reduce hidden size to accelerate tests * tests: fix `test_attention_outputs` 🥳 * fix: fix wrong parent class for `ColQwen2ForRetrievalOutput` * fix: minor typing and style changes * chore: run `make style` * feat: remove redundant `max_num_visual_tokens` attribute in `ColQwen2Processor` * tests: tweak comments * style: apply ruff formatter * feat: move default values for `visual_prompt_prefix` and `query_prefix` * docs: update ColQwen2 model card * docs: tweak model cards * docs: add required example config checkpoint * tests: update expected scores in integration test * docs: tweak quickstart snippets * fix: address PR comments * tests: fix colqwen2 tests + tweak comment in colpali test * tests: unskip useful tests * fix: fix bug when `visual_prompt_prefix` or `query_prefix` is an empty string * fix: fix ColPali outputs when `return_dict == False` * fix: fix issue with PaliGemma output not being a dict * docs: set default dtype to bfloat16 in quickstart snippets * fix: fix error when `return_dict=False` in ColPali and ColQwen2 * tests: fix special tokens not being replaced in input_ids * style: fix lint * fix: `ColQwen2Processor`'s `padding_side` is now set from `processor_config.json` * fix: remove unused `padding_side` in ColQwen2 model * docs: update ColQwen2's model doc * fix: fix harcoded vlm backbone class in ColQwen2Config * fix: remove `padding_side` from ColQwen2Processor as should fed from kwargs * docs: fix typo in model docstring * docs: add illuin mention in model docs * fix: let `padding_size` be handled by `tokenizer_config.json` * docs: add colpali reference url in colqwen2's model doc * docs: add Hf mention in model docs * docs: add late interaction mention in model docs * docs: tweak colqwen2 model doc * docs: update reference checkpoint for ColPali to v1.3 * docs: simplify quickstart snippets * docs: remove redundant `.eval()` * refactor: use `can_return_tuple` decorator for ColPali and ColQwen2 * docs: fix copyright date * docs: add missing copyright in tests * fix: raise error when `initializer_range` is not in config * docs: remove redundant `.eval()` in colpali doc * fix: fix `get_text_config` now that Qwen2VL has a proper `text_config` attribute See https://github.com/huggingface/transformers/pull/37268 for details about changes in Qwen2VL's config. * fix: add missing `initializer_range` attribute in `ColQwen2Config` * fix: use `get_text_config` in `resize_token_embeddings` * update colwen2 with auto_docstring * docs: fix wrong copyright year * chore: remove `raise` as `initializer_range` has a default value in `ColQwen2Config` * refactor: merge `inner_forward` into `forward` * Refactor colqwen2 after refactoring of qwen2VL, use modular for modeling code * protect torch import in modular to protect in processing * protect torch import in modular to protect in processing * tests: fix hf model path in ColQwen2 integration test * docs: clarify `attn_implementation` and add comments * docs: add fallback snippet for using offline PIL dummy images * docs: temporarily revert attn_implementation to `None` while sdpa is not fixed * docs: tweaks in colpali/colqwen2 quick start snippets * fix: add missing flags to enable SDPA/Flex Attention in ColQwen2 model * fix: add missing changes in modular file * fix modeling tests --------- Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
2025-06-02 14:58:01 +02:00
parent beaed8ce01
commit c72ba69441
23 changed files with 2289 additions and 95 deletions
--- a/tests/models/colpali/test_modeling_colpali.py
+++ b/tests/models/colpali/test_modeling_colpali.py
@@ -168,7 +168,6 @@ class ColPaliForRetrievalModelTester:
            "input_ids": input_ids,
            "attention_mask": attention_mask,
            "labels": input_ids,
-            "token_type_ids": torch.zeros_like(input_ids),
        }
        return config, inputs_dict

@@ -333,7 +332,7 @@ class ColPaliModelIntegrationTest(unittest.TestCase):
        scores = self.processor.score_retrieval(
            query_embeddings=query_embeddings,
            passage_embeddings=image_embeddings,
-        )  # (len(qs), len(ps))
+        )  # (num_queries, num_passages)

        assert scores.ndim == 2, f"Expected 2D tensor, got {scores.ndim}"
        assert scores.shape == (len(ds), len(ds)), f"Expected shape {(len(ds), len(ds))}, got {scores.shape}"
--- a/tests/models/colpali/test_processing_colpali.py
+++ b/tests/models/colpali/test_processing_colpali.py
@@ -1,3 +1,18 @@
+# Copyright 2024 HuggingFace Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Testing suite for the ColPali processor."""
+
 import shutil
 import tempfile
 import unittest
@@ -89,7 +104,7 @@ class ColPaliProcessorTest(ProcessorTesterMixin, unittest.TestCase):
        self.assertIsInstance(batch_feature["input_ids"], torch.Tensor)
        self.assertEqual(batch_feature["input_ids"].shape[0], len(queries))

-        # The following tests are overwritten as ColPaliProcessor can only take one of images or text as input at a time
+    # The following tests override the parent tests because ColPaliProcessor can only take one of images or text as input at a time.

    def test_tokenizer_defaults_preserved_by_kwargs(self):
        if "image_processor" not in self.processor_class.attributes: