Wav2Vec2 meets phonemes (#14353)

* up

* add tokenizer

* improve more

* finish tokenizer

* finish

* adapt speech recognition script

* adapt convert

* more fixes

* more fixes

* update phonemizer wav2vec2

* better naming

* fix more tests

* more fixes swedish

* correct tests

* finish

* improve script

* remove file

* up

* lets get those 100 model architectures until the end of the month

* make fix-copies

* correct more

* correct script

* more fixes

* more fixes

* add to docs

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* replace assert

* fix copies

* fix docs

* new try docs

* boom boom

* update

* add phonemizer to audio tests

* make fix-copies

* up

* upload models

* some changes

* Update tests/test_tokenization_wav2vec2_phoneme.py

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* more fixes

* remove @

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
This commit is contained in:
Patrick von Platen
2021-12-17 19:56:44 +01:00
committed by GitHub
parent 77d6c826d8
commit c4a96cecbc
26 changed files with 1296 additions and 151 deletions
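
Note on the testing-utilities hunk in this commit: it adds a `require_phonemizer` decorator that skips a test when the optional `phonemizer` package is missing. The following is a minimal, self-contained sketch of that pattern, not the library's own code. `is_phonemizer_available` is re-implemented here as a stand-in that only checks whether the package can be found, and the test class name is hypothetical.

```python
import importlib.util
import unittest


def is_phonemizer_available():
    # Assumption: stand-in for the library helper; it only checks that the
    # package can be located, not that a phonemizer backend is usable.
    return importlib.util.find_spec("phonemizer") is not None


def require_phonemizer(test_case):
    """Decorator marking a test that requires phonemizer."""
    if not is_phonemizer_available():
        # unittest.skip returns a decorator; apply it to the test right away.
        return unittest.skip("test requires phonemizer")(test_case)
    return test_case


class PhonemeTokenizerSketchTest(unittest.TestCase):  # hypothetical test class
    @require_phonemizer
    def test_needs_phonemizer(self):
        import phonemizer  # only reached when the package is installed

        self.assertIsNotNone(phonemizer)
```

Because the availability check runs when the decorator is applied, the skip decision is made at import time of the test module, and the test body never executes on machines without the package.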

@@ -38,6 +38,7 @@ from .file_utils import (
     is_librosa_available,
     is_onnx_available,
     is_pandas_available,
+    is_phonemizer_available,
     is_pyctcdecode_available,
     is_pytesseract_available,
     is_pytorch_quantization_available,
@@ -590,6 +591,16 @@ def require_deepspeed(test_case):
         return test_case
 
 
+def require_phonemizer(test_case):
+    """
+    Decorator marking a test that requires phonemizer
+    """
+    if not is_phonemizer_available():
+        return unittest.skip("test requires phonemizer")(test_case)
+    else:
+        return test_case
+
+
 def require_pyctcdecode(test_case):
     """
     Decorator marking a test that requires pyctcdecode