[Whisper Tokenizer] Make more user-friendly (#19921)

* [Whisper Tokenizer] Make more user-friendly

* use property

* make indexing rigorous

* small clean-up

* tests

* skip seq2seq tests

* remove multilingual arg

* reorder args

* collapse to one function

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* option to override attributes

Co-authored-by: ArthurZucker <arthur@huggingface.co>

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* make comment more clear

Co-authored-by: sgugger <sylvain@huggingface.co>

* don't add special tokens in get_decoder_prompt_ids

* add test for set_prefix_tokens

Co-authored-by: ArthurZucker <arthur@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: sgugger <sylvain@huggingface.co>
This commit is contained in:
Sanchit Gandhi
2022-11-03 14:22:40 +00:00
committed by GitHub
parent 790ff2544a
commit 06d488061f
4 changed files with 277 additions and 59 deletions

View File

@@ -39,6 +39,7 @@ The original code can be found [here](https://github.com/openai/whisper).
## WhisperTokenizer
[[autodoc]] WhisperTokenizer
- set_prefix_tokens
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences