[Whisper] Add large-v3 version support (#27336)

* Enable large-v3 downloading and update language list

* Fix type annotation

* make fixup

* Export Whisper feature extractor

* Fix error after extractor loading

* Do not use pre-computed mel filters

* Save the full preprocessor properly

* Update docs

* Remove comment

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add alignment heads consistent with each Whisper version

* Remove alignment heads calculation

* Save fast tokenizer format as well

* Fix slow to fast conversion

* Fix bos/eos/pad token IDs in the model config

* Add decoder_start_token_id to config

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Dmitrii Mukhutdinov
2023-11-21 00:36:48 +08:00
committed by GitHub
parent 93f2de858b
commit 87e217d065
3 changed files with 100 additions and 29 deletions

@@ -34,13 +34,13 @@ The original code can be found [here](https://github.com/openai/whisper).
 - Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.
-- To convert the tokenizer, we recommend using the following:
+- To convert the model and the processor, we recommend using the following:
 ```bash
-python src/transformers/models/whisper/convert_openai_to_hf.py --checkpoint_path "" --pytorch_dump_folder_path "Arthur/whisper-3" --convert_tokenizer True --whisper_version 3 --multilingual True
+python src/transformers/models/whisper/convert_openai_to_hf.py --checkpoint_path "" --pytorch_dump_folder_path "Arthur/whisper-3" --convert_preprocessor True
 ```
-Here the `whisper_version` will set the number of languages to `100` to account for `cantonese` which was added in `whisper-large-v3`.
+The script will automatically determine all necessary parameters from the OpenAI checkpoint. A `tiktoken` library needs to be installed
+to perform the conversion of the OpenAI tokenizer to the `tokenizers` version.
 ## Inference
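
As an illustration of the updated docs, the new conversion command can be assembled and pre-checked from Python. This is only a minimal sketch: the helper names `tiktoken_available` and `build_convert_command` and the example paths are hypothetical and not part of this commit; only the script path and the three flags come from the diff above.

```python
import importlib.util
import shlex

# Script path as written in the updated docs (relative to a transformers checkout).
CONVERT_SCRIPT = "src/transformers/models/whisper/convert_openai_to_hf.py"


def tiktoken_available() -> bool:
    """Return True if the `tiktoken` package can be found in this environment."""
    return importlib.util.find_spec("tiktoken") is not None


def build_convert_command(checkpoint_path: str, dump_folder: str) -> list[str]:
    """Assemble the argv for the conversion script, using only flags shown in the docs."""
    return [
        "python",
        CONVERT_SCRIPT,
        "--checkpoint_path", checkpoint_path,
        "--pytorch_dump_folder_path", dump_folder,
        "--convert_preprocessor", "True",
    ]


if __name__ == "__main__":
    if not tiktoken_available():
        print("tiktoken is not installed; the tokenizer conversion needs it")
    # Both paths below are placeholders chosen for this sketch.
    cmd = build_convert_command("path/to/large-v3.pt", "whisper-large-v3-hf")
    print(shlex.join(cmd))
```

The sketch only prints the command rather than running it, so it can be inspected before anything is downloaded or written. The old `--convert_tokenizer`, `--whisper_version` and `--multilingual` flags are deliberately absent, matching the lines removed in this hunk.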