Cyril Vallez
4303d88c09
Add Phi4 multimodal (#36939)
* raw start
* update
* update
* add to imports
* update
* up
* simplify configs
* clean configs
* style
* typos
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* fix
* up
* up
* up
* Update convert_phi4_multimodal_weights_to_hf.py
* Update convert_phi4_multimodal_weights_to_hf.py
* up
* up
* up
* Update feature_extraction_phi4_multimodal.py
* up
* up
* up
* up
* up
* simplify configs
* typo
* cut code
* typo
* typo
* typo
* re
* typo
* up
* up
* up
* add tests
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* up
* Update test_modeling_phi4_multimodal.py
* doc
* fix
* up
* up
* up
* up
* up
* up
* simplify
* up
* simplify
* config docstrings
* cleanup
* clean
* typo
* typo
* fix
* Update phi4_multimodal.md
* fix
* fix
* Update test_modeling_phi4_multimodal.py
* update
* simplify reshapes and permutes
* up
* simplify special tokens
* simplify processor a lot
* Update processing_phi4_multimodal.py
* Update processing_phi4_multimodal.py
* switch to fast processor
* image processor
* Update image_processing_phi4_multimodal_fast.py
* add lora extraction to converter
* Update convert_phi4_multimodal_weights_to_hf.py
* Update __init__.py
* add AudioInput type in audio_utils
* rewrite feature_extraction: support torch batched FFT
* input_audio_embeds -> audio_input_features, input_image_embeds -> image_pixel_values
* test update
* not mono channel warning update
* remove auto maps from processor
* kargs dispatch in processor
* simplify kwargs dispatch
* simplify merging
* remove default sampling rate
* style
* Update test_modeling_phi4_multimodal.py
* update doc
* doc
* torch only feature extractor
* make fake tokens adjustable
* Update feature_extraction_phi4_multimodal.py
* fix
* Update processing_phi4_multimodal.py
* simplify mask
* last touch
* fix copies
* style
* Update audio_utils.py
* style
* Update feature_extraction_phi4_multimodal.py
* Update __init__.py
* docstrings
* copies
* fix all checks
* back to fix-copies
* trigger CIs
* Update feature_extraction_phi4_multimodal.py
* improve tests with multimodal inputs
* trigger CIs
---------
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
2025-03-25 09:55:21 +01:00
..
2025-03-11 13:47:38 +00:00
2025-03-24 14:08:29 +00:00
2025-03-25 09:55:21 +01:00
2025-03-24 14:08:29 +00:00
2025-03-11 13:47:38 +00:00
2024-11-04 09:40:30 -08:00
2025-03-24 14:08:29 +00:00
2025-03-19 18:29:40 +00:00
2025-03-14 09:31:18 -07:00
2025-03-11 13:47:38 +00:00
2025-03-24 14:08:29 +00:00
2024-12-17 09:32:00 -08:00
2023-11-08 08:35:20 -05:00
2025-03-19 18:29:40 +00:00
2024-04-08 14:21:16 +01:00