[Whisper] Add large-v3 version support (#27336)

* Enable large-v3 downloading and update language list * Fix type annotation * make fixup * Export Whisper feature extractor * Fix error after extractor loading * Do not use pre-computed mel filters * Save the full preprocessor properly * Update docs * Remove comment Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Add alignment heads consistent with each Whisper version * Remove alignment heads calculation * Save fast tokenizer format as well * Fix slow to fast conversion * Fix bos/eos/pad token IDs in the model config * Add decoder_start_token_id to config --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-21 00:36:48 +08:00
parent 93f2de858b
commit 87e217d065
3 changed files with 100 additions and 29 deletions
--- a/docs/source/en/model_doc/whisper.md
+++ b/docs/source/en/model_doc/whisper.md
@@ -34,13 +34,13 @@ The original code can be found [here](https://github.com/openai/whisper).
 - Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted ID's back into text.

- To convert the tokenizer, we recommend using the following:
+- To convert the model and the processor, we recommend using the following:

 ```bash
-python src/transformers/models/whisper/convert_openai_to_hf.py --checkpoint_path "" --pytorch_dump_folder_path "Arthur/whisper-3" --convert_tokenizer True --whisper_version 3 --multilingual True
+python src/transformers/models/whisper/convert_openai_to_hf.py --checkpoint_path "" --pytorch_dump_folder_path "Arthur/whisper-3" --convert_preprocessor True
 ```
-Here the `whisper_version` will set the number of languages to `100` to account for `cantonese` which was added in `whisper-large-v3`.
-
+The script will automatically determine all necessary parameters from the OpenAI checkpoint. A `tiktoken` library needs to be installed
+to perform the conversion of the OpenAI tokenizer to the `tokenizers` version.

 ## Inference