[MMS] Update docs with HF TTS implementation (#25907)

* [MMS] Update docs with HF TTS implementation

* Update docs/source/en/model_doc/mms.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add uromanise to docs

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
This commit is contained in:
Sanchit Gandhi
2023-09-01 16:50:59 +01:00
committed by GitHub
parent b439129e74
commit 1fa2d89a9b
2 changed files with 189 additions and 1 deletions

View File

@@ -98,6 +98,54 @@ print(tokenizer.is_uroman)
If required, you should apply the uroman package to your text inputs **prior** to passing them to the `VitsTokenizer`,
since currently the tokenizer does not support performing the pre-processing itself.
To do this, first clone the uroman repository to your local machine and set the bash variable `UROMAN` to the local path:
```bash
git clone https://github.com/isi-nlp/uroman.git
cd uroman
export UROMAN=$(pwd)
```
You can then pre-process the text input using the following code snippet. You can either rely on using the bash variable
`UROMAN` to point to the uroman repository, or you can pass the uroman directory as an argument to the `uromaize` function:
```python
import torch
from transformers import VitsTokenizer, VitsModel, set_seed
import os
import subprocess
tokenizer = VitsTokenizer.from_pretrained("facebook/mms-tts-kor")
model = VitsModel.from_pretrained("facebook/mms-tts-kor")
def uromanize(input_string, uroman_path):
"""Convert non-Roman strings to Roman using the `uroman` perl package."""
script_path = os.path.join(uroman_path, "bin", "uroman.pl")
command = ["perl", script_path]
process = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# Execute the perl command
stdout, stderr = process.communicate(input=input_string.encode())
if process.returncode != 0:
raise ValueError(f"Error {process.returncode}: {stderr.decode()}")
# Return the output as a string and skip the new-line character at the end
return stdout.decode()[:-1]
text = "이봐 무슨 일이야"
uromaized_text = uromanize(text, uroman_path=os.environ["UROMAN"])
inputs = tokenizer(text=uromaized_text, return_tensors="pt")
set_seed(555) # make deterministic
with torch.no_grad():
outputs = model(inputs["input_ids"])
waveform = outputs.waveform[0]
```
## VitsConfig
[[autodoc]] VitsConfig