[VLMs] use only xxx_token_id for multimodal tokens (#37573)

* use only `xxx_token_id` for multimodal tokens

* update modeling files as well

* fixup

* why fixup doesn't fix modular docstring first?

* janus, need to update configs in the hub still

* last fixup
Raushan Turganbay
2025-04-18 17:03:39 +02:00
committed by GitHub
parent 4afd3f4820
commit 2ba6b92a6f
63 changed files with 279 additions and 141 deletions
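
For readers adapting downstream code to this rename, the following is a minimal sketch, not code from this commit. It assumes a config may expose either the new `xxx_token_id` attribute or the older `xxx_token_index` one (for example, a hub config that has not been updated yet). The helper name `resolve_token_id` and the sample value 32000 are hypothetical and used only for illustration.

```python
from types import SimpleNamespace


def resolve_token_id(config, modality):
    """Hypothetical helper: prefer `<modality>_token_id`, fall back to the
    legacy `<modality>_token_index`, and return None if neither is set."""
    new_name = f"{modality}_token_id"
    old_name = f"{modality}_token_index"
    value = getattr(config, new_name, None)
    if value is None:
        # Legacy attribute name, kept only as a fallback in this sketch.
        value = getattr(config, old_name, None)
    return value


# Stand-in config objects; real model configs would be used in practice.
new_style = SimpleNamespace(image_token_id=32000)
old_style = SimpleNamespace(image_token_index=32000)
no_image = SimpleNamespace(video_token_id=7)

print(resolve_token_id(new_style, "image"))
print(resolve_token_id(old_style, "image"))
print(resolve_token_id(no_image, "image"))
```

Reading the new name first means configs that carry both attributes resolve to the value this commit standardizes on.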


@@ -224,12 +224,9 @@ class GenerationTesterMixin:
         # to crash. On pretrained models this isn't a risk, as they are trained to not generate these tokens.
         if config is not None:
             for key in [
-                "image_token_index",
                 "image_token_id",
-                "video_token_index",
                 "video_token_id",
                 "vision_start_token_id",
-                "audio_token_index",
                 "audio_start_token_id",
                 "audio_end_token_id",
                 "vision_end_token_id",