Wav2Vec2 meets phonemes (#14353)

* up
* add tokenizer
* improve more
* finish tokenizer
* finish
* adapt speech recognition script
* adapt convert
* more fixes
* more fixes
* update phonemizer wav2vec2
* better naming
* fix more tests
* more fixes swedish
* correct tests
* finish
* improve script
* remove file
* up
* lets get those 100 model architectures until the end of the month
* make fix-copies
* correct more
* correct script
* more fixes
* more fixes
* add to docs
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace assert
* fix copies
* fix docs
* new try docs
* boom boom
* update
* add phonemizer to audio tests
* make fix-copies
* up
* upload models
* some changes
* Update tests/test_tokenization_wav2vec2_phoneme.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* more fixes
* remove @

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-17 19:56:44 +01:00
parent 77d6c826d8
commit c4a96cecbc
26 changed files with 1296 additions and 151 deletions
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -284,6 +284,8 @@
      title: VisualBERT
    - local: model_doc/wav2vec2
      title: Wav2Vec2
+    - local: model_doc/wav2vec2_phoneme
+      title: Wav2Vec2Phoneme
    - local: model_doc/wavlm
      title: WavLM
    - local: model_doc/xlm
@@ -296,6 +298,8 @@
      title: XLNet
    - local: model_doc/xlsr_wav2vec2
      title: XLSR-Wav2Vec2
+    - local: model_doc/xls_r
+      title: XLS-R
    title: Models
  - sections:
    - local: internal/modeling_utils