[Wav2Vec2] PyCTCDecode Integration to support language model boosted decoding (#14339)

* up
* up
* up
* make it cleaner
* correct
* make styhahalal
* add more tests
* finish
* small fix
* make style
* up
* tryout to solve cicrle ci
* up
* fix more tests
* fix more tests
* apply sylvains suggestions
* fix import
* correct docs
* add pyctcdecode only to speech tests
* fix more tests
* add tf, flax and pt tests
* add pt
* fix last tests
* fix more tests
* Apply suggestions from code review
* change lines
* Apply suggestions from code review

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>

* correct tests
* correct tests
* add doc string

Co-authored-by: Anton Lozhkov <aglozhkov@gmail.com>
2021-12-08 12:07:54 +01:00
parent 2e12d90b9e
commit 961732c276
16 changed files with 831 additions and 19 deletions
--- a/setup.py
+++ b/setup.py
@@ -51,7 +51,7 @@ To create the package for pypi.
   pip install -i https://testpypi.python.org/pypi transformers

   Check you can run the following commands:
-   python -c "from transformers import pipeline; classifier = pipeline('text-classification'); print(classifier('What a nice release'))" 
+   python -c "from transformers import pipeline; classifier = pipeline('text-classification'); print(classifier('What a nice release'))"
   python -c "from transformers import *"

 9. Upload the final version to actual pypi:
@@ -59,7 +59,7 @@ To create the package for pypi.

 10. Copy the release notes from RELEASE.md to the tag in github once everything is looking hunky-dory.

-11. Run `make post-release` (or, for a patch release, `make post-patch`). If you were on a branch for the release, 
+11. Run `make post-release` (or, for a patch release, `make post-patch`). If you were on a branch for the release,
    you need to go back to master before executing this.
 """

@@ -159,6 +159,7 @@ _deps = [
    "tokenizers>=0.10.1,<0.11",
    "torch>=1.0",
    "torchaudio",
+    "pyctcdecode>=0.2.0",
    "tqdm>=4.27",
    "unidic>=1.0.2",
    "unidic_lite>=1.0.7",
@@ -262,7 +263,7 @@ extras["sigopt"] = deps_list("sigopt")
 extras["integrations"] = extras["optuna"] + extras["ray"] + extras["sigopt"]

 extras["serving"] = deps_list("pydantic", "uvicorn", "fastapi", "starlette")
-extras["audio"] = deps_list("librosa")
+extras["audio"] = deps_list("librosa", "pyctcdecode")
 extras["speech"] = deps_list("torchaudio") + extras["audio"]  # `pip install ".[speech]"` is deprecated and `pip install ".[torch-speech]"` should be used instead
 extras["torch-speech"] = deps_list("torchaudio") + extras["audio"]
 extras["tf-speech"] = extras["audio"]
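
Note on the setup.py hunks above: the new pin is declared once in `_deps` and then referenced by bare name from the `audio` extra, so every extra built on top of `audio` picks it up. The sketch below illustrates that resolution in isolation. It is not the real setup.py: the `_deps` list is cut down to three entries, the unpinned `"librosa"` entry is an assumption, and the `deps` mapping and `deps_list` body are simplified stand-ins written for this example.

```python
import re

# Abbreviated stand-in for the `_deps` list (the real list is much longer).
# The "pyctcdecode>=0.2.0" pin is the one added by this commit; the unpinned
# "librosa" entry is an assumption made for illustration.
_deps = [
    "librosa",
    "pyctcdecode>=0.2.0",
    "torchaudio",
]

# Map each bare package name to its full requirement string.
# Simplified stand-in: the name is everything before the first version operator.
deps = {}
for spec in _deps:
    name = re.match(r"^[^!=<>~ ]+", spec).group(0)
    deps[name] = spec


def deps_list(*pkgs):
    """Return the pinned requirement strings for the given package names."""
    return [deps[pkg] for pkg in pkgs]


# Same composition as the extras lines in the diff.
extras = {}
extras["audio"] = deps_list("librosa", "pyctcdecode")
extras["speech"] = deps_list("torchaudio") + extras["audio"]
extras["torch-speech"] = deps_list("torchaudio") + extras["audio"]
extras["tf-speech"] = extras["audio"]

print(extras["torch-speech"])
```

Under these assumptions, installing any of the speech extras would also pull in the pinned pyctcdecode requirement, because each of them is composed from `extras["audio"]`.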