Add MM Grounding DINO (#37925)

* first commit Added modular implementation for MM Grounding DINO from starting point created by add-new-model-like. Added conversion script from mmdetection to huggingface. TODO: Some tests are failing so that needs to be fixed. * fixed a bug with modular definition of MMGroundingDinoForObjectDetection where box and class heads were not correctly assigned to inner model * cleaned up a hack in the conversion script * Fixed the expected values in integration tests Cross att masking and cpu-gpu consistency tests are still failing however. * changes for make style and quality * add documentation * clean up contrastive embedding * add mm grounding dino to loss mapping * add model link to config docstring * hack fix for mm grounding dino consistency tests * add special cases for unused config attr check * add all models and update docs * update model doc to the new style * Use super_kwargs for modular config * Move init to the _init_weights function * Add copied from for tests * fixup * update typehints * Fix-copies for tests * fix-copies * Fix init test * fix snippets in docs * fix consistency * fix consistency * update conversion script * fix nits in readme and remove old comments from conversion script * add license * remove unused config args * remove unnecessary if/else in model init * fix quality * Update references * fix test * fixup --------- Co-authored-by: qubvel <qubvel@gmail.com>
2025-08-01 16:43:23 +02:00
parent 50145474b7
commit 3951d4ad5d
17 changed files with 4884 additions and 1 deletions
--- a/tests/models/grounding_dino/test_modeling_grounding_dino.py
+++ b/tests/models/grounding_dino/test_modeling_grounding_dino.py
@@ -818,7 +818,9 @@ class GroundingDinoModelIntegrationTests(unittest.TestCase):
        prompt = ". ".join(id2label.values()) + "."

        text_inputs = tokenizer([prompt, prompt], return_tensors="pt")
-        image_inputs = image_processor(images=ds["image"], annotations=ds["annotations"], return_tensors="pt")
+        image_inputs = image_processor(
+            images=list(ds["image"]), annotations=list(ds["annotations"]), return_tensors="pt"
+        )

        # Passing auxiliary_loss=True to compare with the expected loss
        model = GroundingDinoForObjectDetection.from_pretrained(