Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* not mono channel warning update
* remove auto maps from processor
* kwargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00