[VLMs] use only xxx_token_id for multimodal tokens (#37573)

* use only `xxx_token_id` for multimodal tokens
* update modeling files as well
* fixup
* why fixup doesn't fix modular docstring first?
* janus, need to update configs in the hub still
* last fixup
2025-04-18 17:03:39 +02:00
parent 4afd3f4820
commit 2ba6b92a6f
63 changed files with 279 additions and 141 deletions
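
The test hunk in this commit drops the `*_token_index` spellings from the list of config keys it scans, so only the `*_token_id` names remain. A minimal sketch of that lookup is below. The helper name `collect_multimodal_token_ids`, the `SimpleNamespace` stand-in config, and the id values are hypothetical. The key list is only the part visible in this excerpt, and what the test does with the collected ids is not shown here.

```python
from types import SimpleNamespace

# Key names visible in the diff after this change; the "_index" spellings are gone.
MULTIMODAL_TOKEN_KEYS = [
    "image_token_id",
    "video_token_id",
    "vision_start_token_id",
    "audio_start_token_id",
    "audio_end_token_id",
    "vision_end_token_id",
]


def collect_multimodal_token_ids(config):
    """Return the ids of every multimodal special token set on `config` (hypothetical helper)."""
    token_ids = []
    for key in MULTIMODAL_TOKEN_KEYS:
        token_id = getattr(config, key, None)
        if token_id is not None:  # skip keys this config does not define
            token_ids.append(token_id)
    return token_ids


# Hypothetical config standing in for a real model config; the id values are made up.
example_config = SimpleNamespace(image_token_id=32000, video_token_id=32001)
multimodal_ids = collect_multimodal_token_ids(example_config)
```

With a single naming scheme, a config that still exposes only `image_token_index` would be skipped by this lookup, which may be why the commit also updates the modeling files and mentions hub configs that still need updating.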
--- a/tests/generation/test_utils.py
+++ b/tests/generation/test_utils.py
@@ -224,12 +224,9 @@ class GenerationTesterMixin:
        # to crash. On pretrained models this isn't a risk, as they are trained to not generate these tokens.
        if config is not None:
            for key in [
-                "image_token_index",
                "image_token_id",
-                "video_token_index",
                "video_token_id",
                "vision_start_token_id",
-                "audio_token_index",
                "audio_start_token_id",
                "audio_end_token_id",
                "vision_end_token_id",