[Speech2Text2] Enable tokenizers (#14390)

* [Speech2Text2] Enable tokenizers
* minor fix
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-11-15 16:34:11 +01:00
parent 267867e851
commit 4ce74edf51
3 changed files with 171 additions and 100 deletions
--- a/docs/source/model_doc/speech_to_text_2.rst
+++ b/docs/source/model_doc/speech_to_text_2.rst
@@ -36,7 +36,7 @@ Tips:
 - Speech2Text2 achieves state-of-the-art results on the CoVoST Speech Translation dataset. For more information, see
  the `official models <https://huggingface.co/models?other=speech2text2>`__ .
 - Speech2Text2 is always used within the :doc:`SpeechEncoderDecoder <speechencoderdecoder>` framework.
- Speech2Text2's tokenizer currently only supports inference, but not training.
+- Speech2Text2's tokenizer is based on `fastBPE <https://github.com/glample/fastBPE>`__.

 Inference
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~