Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* not mono channel warning update
* remove auto maps from processor
* kwargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00