Add WhisperTokenizerFast (#21222)
* Add WhisperTokenizerFast
* Fixup
* Up
* Up
* Improve tests
* Update src/transformers/models/whisper/tokenization_whisper_fast.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Keep stride in whisper pipeline test
* Remove unknown token special case
* Reduce vocabulary size in tests
* Fix vocab size assertion
* Sync copied changes from WhisperTokenizer
* Skip pipeline tests
* Update assertion
* Remove Whisper tokenizer dependency on sentencepiece
* Format

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
@@ -406,7 +406,7 @@ Flax), PyTorch, and/or TensorFlow.
 | Wav2Vec2 | ✅ | ❌ | ✅ | ✅ | ✅ |
 | Wav2Vec2-Conformer | ❌ | ❌ | ✅ | ❌ | ❌ |
 | WavLM | ❌ | ❌ | ✅ | ❌ | ❌ |
-| Whisper | ✅ | ❌ | ✅ | ✅ | ✅ |
+| Whisper | ✅ | ✅ | ✅ | ✅ | ✅ |
 | X-CLIP | ❌ | ❌ | ✅ | ❌ | ❌ |
 | X-MOD | ❌ | ❌ | ✅ | ❌ | ❌ |
 | XGLM | ✅ | ✅ | ✅ | ✅ | ✅ |
@@ -45,6 +45,15 @@ The original code can be found [here](https://github.com/openai/whisper).
     - create_token_type_ids_from_sequences
     - save_vocabulary
 
+## WhisperTokenizerFast
+
+[[autodoc]] WhisperTokenizerFast
+    - set_prefix_tokens
+    - build_inputs_with_special_tokens
+    - get_special_tokens_mask
+    - create_token_type_ids_from_sequences
+    - save_vocabulary
+
 ## WhisperFeatureExtractor
 
 [[autodoc]] WhisperFeatureExtractor