Add Phi4 multimodal (#36939)

* raw start

* update

* update

* add to imports

* update

* up

* simplify configs

* clean configs

* style

* typos

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* fix

* up

* up

* up

* Update convert_phi4_multimodal_weights_to_hf.py

* Update convert_phi4_multimodal_weights_to_hf.py

* up

* up

* up

* Update feature_extraction_phi4_multimodal.py

* up

* up

* up

* up

* up

* simplify configs

* typo

* cut code

* typo

* typo

* typo

* re

* typo

* up

* up

* up

* add tests

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* up

* Update test_modeling_phi4_multimodal.py

* doc

* fix

* up

* up

* up

* up

* up

* up

* simplify

* up

* simplify

* config docstrings

* cleanup

* clean

* typo

* typo

* fix

* Update phi4_multimodal.md

* fix

* fix

* Update test_modeling_phi4_multimodal.py

* update

* simplify reshapes and permutes

* up

* simplify special tokens

* simplify processor a lot

* Update processing_phi4_multimodal.py

* Update processing_phi4_multimodal.py

* switch to fast processor

* image processor

* Update image_processing_phi4_multimodal_fast.py

* add lora extraction to converter

* Update convert_phi4_multimodal_weights_to_hf.py

* Update __init__.py

* add AudioInput type in audio_utils

* rewrite feature_extraction: support torch batched FFT

* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values

* test update

* not mono channel warning update

* remove auto maps from processor

* kwargs dispatch in processor

* simplify kwargs dispatch

* simplify merging

* remove default sampling rate

* style

* Update test_modeling_phi4_multimodal.py

* update doc

* doc

* torch only feature extractor

* make fake tokens adjustable

* Update feature_extraction_phi4_multimodal.py

* fix

* Update processing_phi4_multimodal.py

* simplify mask

* last touch

* fix copies

* style

* Update audio_utils.py

* style

* Update feature_extraction_phi4_multimodal.py

* Update __init__.py

* docstrings

* copies

* fix all checks

* back to fix-copies

* trigger CIs

* Update feature_extraction_phi4_multimodal.py

* improve tests with multimodal inputs

* trigger CIs

---------

Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>

Cyril Vallez authored 2025-03-25 09:55:21 +01:00, committed by GitHub

parent 47e5432805

commit 4303d88c09

24 changed files with 6380 additions and 1 deletion

									
utils/check_docstrings.py (1 addition)
@@ -524,6 +524,7 @@ OBJECTS_TO_IGNORE = [
     "TimeSeriesTransformerConfig",
     "TokenClassificationPipeline",
     "TrOCRConfig",
+    "Phi4MultimodalProcessor",
     "TrainerState",
     "TrainingArguments",
     "TrajectoryTransformerConfig",
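The hunk above adds `Phi4MultimodalProcessor` to an ignore list. As a rough illustration of how such a list is typically consumed, here is a minimal sketch. It is not the actual logic of `utils/check_docstrings.py`: the function `objects_to_check` and the name `ExampleModelConfig` are hypothetical and exist only for this example.

```python
# Hypothetical sketch: NOT the real implementation in utils/check_docstrings.py.
# It only illustrates filtering candidate names against an ignore list.

# Shortened copy of the list from the hunk above.
OBJECTS_TO_IGNORE = [
    "TrOCRConfig",
    "Phi4MultimodalProcessor",  # entry added by this commit
    "TrainerState",
]


def objects_to_check(public_names, ignore=OBJECTS_TO_IGNORE):
    """Return the names a docstring checker would still inspect, in input order."""
    ignored = set(ignore)  # set lookup keeps the filter linear in len(public_names)
    return [name for name in public_names if name not in ignored]


if __name__ == "__main__":
    # "ExampleModelConfig" is a made-up name used only for illustration.
    names = ["Phi4MultimodalProcessor", "ExampleModelConfig", "TrainerState"]
    print(objects_to_check(names))
```

Under this assumption, listing a class name skips the docstring check for that class while every other public object is still checked.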

									
utils/check_repo.py (2 additions)
@@ -89,6 +89,8 @@ PRIVATE_MODELS = [
     "SmolVLMVisionTransformer",
     "AriaTextForCausalLM",
     "AriaTextModel",
+    "Phi4MultimodalAudioModel",
+    "Phi4MultimodalVisionModel",
 ]

 # Update this list for models that are not tested with a comment explaining the reason it should not be.
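The hunk above registers the audio and vision sub-models in `PRIVATE_MODELS`. The sketch below assumes that the list marks model classes that are intentionally left out of the public exports, so that a repository check does not flag them. The helper `missing_public_exports` and the `Example*` names are hypothetical and are not taken from `utils/check_repo.py`.

```python
# Hypothetical sketch: NOT the real implementation in utils/check_repo.py.
# Assumption: PRIVATE_MODELS lists classes that need not be publicly exported.

# Shortened copy of the list from the hunk above.
PRIVATE_MODELS = [
    "AriaTextModel",
    "Phi4MultimodalAudioModel",   # added by this commit
    "Phi4MultimodalVisionModel",  # added by this commit
]


def missing_public_exports(defined_models, exported_names, private=PRIVATE_MODELS):
    """Return, sorted, the defined models that are neither exported nor private."""
    exported = set(exported_names)
    private_set = set(private)
    return sorted(
        name
        for name in defined_models
        if name not in exported and name not in private_set
    )


if __name__ == "__main__":
    # The "Example*" names are made up for illustration.
    defined = [
        "Phi4MultimodalAudioModel",
        "Phi4MultimodalVisionModel",
        "ExampleModel",
        "OtherExampleModel",
    ]
    print(missing_public_exports(defined, exported_names=["ExampleModel"]))
```

With the two new entries in place, only classes that are missing from the exports without being declared private would be reported.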
