Wav2Vec2 meets phonemes (#14353)

* up
* add tokenizer
* improve more
* finish tokenizer
* finish
* adapt speech recognition script
* adapt convert
* more fixes
* more fixes
* update phonemizer wav2vec2
* better naming
* fix more tests
* more fixes swedish
* correct tests
* finish
* improve script
* remove file
* up
* lets get those 100 model architectures until the end of the month
* make fix-copies
* correct more
* correct script
* more fixes
* more fixes
* add to docs
* Apply suggestions from code review
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* replace assert
* fix copies
* fix docs
* new try docs
* boom boom
* update
* add phonemizer to audio tests
* make fix-copies
* up
* upload models
* some changes
* Update tests/test_tokenization_wav2vec2_phoneme.py
  Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
* more fixes
* remove @

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-17 19:56:44 +01:00
parent 77d6c826d8
commit c4a96cecbc
26 changed files with 1296 additions and 151 deletions
--- a/src/transformers/testing_utils.py
+++ b/src/transformers/testing_utils.py
@@ -38,6 +38,7 @@ from .file_utils import (
     is_librosa_available,
     is_onnx_available,
     is_pandas_available,
+    is_phonemizer_available,
     is_pyctcdecode_available,
     is_pytesseract_available,
     is_pytorch_quantization_available,
@@ -590,6 +591,16 @@ def require_deepspeed(test_case):
         return test_case


+def require_phonemizer(test_case):
+    """
+    Decorator marking a test that requires phonemizer
+    """
+    if not is_phonemizer_available():
+        return unittest.skip("test requires phonemizer")(test_case)
+    else:
+        return test_case
+
+
 def require_pyctcdecode(test_case):
     """
     Decorator marking a test that requires pyctcdecode