Add voxtral (#39429)

* draft

* draft update (conversion working)

* mend

* draft update

* draft update: working generate

* refactor

* VoxtralProcessor draft

* processor update

* update convert_tekken_tokenizer

* refactor processor

* update convert

* make style

* better handle prefil

* make style

* add tests

* add mistral_common audio loading

* processor update

* revert changes

* audio utils update

* add audio to apply chat template mistral update

* voxtral processor update

* fix

* udpate converstion script

* make mistral tokenier from pretrain work from local dir

* fix udpates

* add integration tests

* add batched version

* processor docstring

* make style

* revert convert_tekken_tokenizer changes

* revert processing_qwen2.5 changes

* add multi-turn test

* processor improvements

* address review changes

* Update src/transformers/tokenization_mistral_common.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* update audio utils

* nits

* integration test update

* correct _support

* update tests

* test update

* update integration tests

* fix

* fix

* fix

* add test_apply_chat_template_with_audio

* add model doc

* model doc

* nit

* doc uptade

* nit

* processor improvement

* ensure default is 3B

* nits

* make

* make

* convert modular

* update checkpoint

* fix test

* make

* make

* autos

* make

* make

* nit

* nit

* nit

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
eustlb
2025-07-18 02:02:04 +02:00
committed by GitHub
parent 73869f2e81
commit 967045082f
17 changed files with 2806 additions and 7 deletions

File diff suppressed because one or more lines are too long