Add WhisperTokenizerFast (#21222)
* Add WhisperTokenizerFast * Fixup * Up * Up * Improve tests * Update src/transformers/models/whisper/tokenization_whisper_fast.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Keep stride in whisper pipelien test * Remove unknown token special case * Reduce vocabulary size in tests * Fix vocab size assertion * Sync copied changes from WhisperTokenizer * Skip pipeline tests * Update assertion * Remove Whisper tokenizer dependency on sentencepiece * Format --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
@@ -45,6 +45,15 @@ The original code can be found [here](https://github.com/openai/whisper).
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## WhisperTokenizerFast
|
||||
|
||||
[[autodoc]] WhisperTokenizerFast
|
||||
- set_prefix_tokens
|
||||
- build_inputs_with_special_tokens
|
||||
- get_special_tokens_mask
|
||||
- create_token_type_ids_from_sequences
|
||||
- save_vocabulary
|
||||
|
||||
## WhisperFeatureExtractor
|
||||
|
||||
[[autodoc]] WhisperFeatureExtractor
|
||||
|
||||
Reference in New Issue
Block a user