[Docs] Improve docs for MMS loading of other languages (#24292)
* Improve docs
* Apply suggestions from code review
* upload readme
* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Committed by GitHub
Parent: e6122c3f40
Commit: 604a21b1e6
@@ -44,11 +44,51 @@ MMS's architecture is based on the Wav2Vec2 model, so one can refer to [Wav2Vec2
The original code can be found [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).

## Loading

By default, MMS loads adapter weights for English. If you want to load adapter weights of another language,
make sure to specify `target_lang=<your-chosen-target-lang>` as well as `ignore_mismatched_sizes=True`.
The `ignore_mismatched_sizes=True` keyword has to be passed to allow the language model head to be resized according
to the vocabulary of the specified language.
Similarly, the processor should be loaded with the same target language:

```py
from transformers import Wav2Vec2ForCTC, AutoProcessor

model_id = "facebook/mms-1b-all"
target_lang = "fra"

processor = AutoProcessor.from_pretrained(model_id, target_lang=target_lang)
model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang=target_lang, ignore_mismatched_sizes=True)
```

<Tip>

You can safely ignore a warning such as:

```text
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/mms-1b-all and are newly initialized because the shapes did not match:
- lm_head.bias: found shape torch.Size([154]) in the checkpoint and torch.Size([314]) in the model instantiated
- lm_head.weight: found shape torch.Size([154, 1280]) in the checkpoint and torch.Size([314, 1280]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
```

</Tip>
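
One way to convince yourself that such a warning is the expected one is to check that every weight it lists belongs to `lm_head`, the layer that the text above says is resized to the target language's vocabulary. The snippet below is a minimal sketch using only the standard library: it parses the warning text quoted above with a regular expression. The helper name `mismatched_weights` is hypothetical, and the pattern assumes the message format shown in the example, which may differ between library versions.

```py
import re
from typing import Dict, Tuple

Shape = Tuple[int, ...]

# Warning text copied from the example above; the exact wording is an assumption
# and may differ between library versions.
WARNING = """\
Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/mms-1b-all and are newly initialized because the shapes did not match:
- lm_head.bias: found shape torch.Size([154]) in the checkpoint and torch.Size([314]) in the model instantiated
- lm_head.weight: found shape torch.Size([154, 1280]) in the checkpoint and torch.Size([314, 1280]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
"""

# Matches one "- <name>: found shape ... in the checkpoint and ... in the model instantiated" line.
SHAPE_LINE = re.compile(
    r"^- (?P<name>[\w.]+): found shape torch\.Size\(\[(?P<ckpt>[\d, ]+)\]\) in the checkpoint "
    r"and torch\.Size\(\[(?P<model>[\d, ]+)\]\) in the model instantiated$"
)


def _parse_shape(text: str) -> Shape:
    """Turn a string such as "154, 1280" into the tuple (154, 1280)."""
    return tuple(int(dim) for dim in text.split(","))


def mismatched_weights(warning: str) -> Dict[str, Tuple[Shape, Shape]]:
    """Hypothetical helper: map each mismatched weight name to (checkpoint shape, model shape)."""
    found = {}
    for line in warning.splitlines():
        match = SHAPE_LINE.match(line.strip())
        if match:
            found[match["name"]] = (_parse_shape(match["ckpt"]), _parse_shape(match["model"]))
    return found


mismatches = mismatched_weights(WARNING)

# The warning is the expected one only if nothing outside the language model head was reinitialized.
only_lm_head = all(name.startswith("lm_head.") for name in mismatches)

print(mismatches)
print("Only lm_head affected:", only_lm_head)
```

In the example warning, the first dimension of both tensors changes from 154 to 314 while the hidden size of 1280 stays the same, which is consistent with a change in vocabulary size only.
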

If you want to use the ASR pipeline, you can load your chosen target language as follows:

```py
from transformers import pipeline

model_id = "facebook/mms-1b-all"
target_lang = "fra"

# Reuse `target_lang` so the language is defined in a single place.
pipe = pipeline(model=model_id, model_kwargs={"target_lang": target_lang, "ignore_mismatched_sizes": True})
```
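
Since the model and the pipeline take the same pair of arguments, it can help to build them in one place so the two never drift apart. The following is a minimal sketch, not part of the Transformers API: `mms_model_kwargs` and `load_mms_pipeline` are hypothetical helper names introduced here for illustration, and the only library call used is the `pipeline(model=..., model_kwargs=...)` form shown above.

```py
from typing import Any, Dict


def mms_model_kwargs(target_lang: str) -> Dict[str, Any]:
    """Hypothetical helper: build the keyword arguments needed to load a non-default language."""
    if not target_lang:
        raise ValueError("target_lang must be a non-empty language code such as 'fra'")
    return {"target_lang": target_lang, "ignore_mismatched_sizes": True}


def load_mms_pipeline(model_id: str, target_lang: str):
    """Hypothetical helper: create an ASR pipeline for the given target language."""
    # Imported lazily so that defining this helper does not require transformers to be installed.
    from transformers import pipeline

    return pipeline(model=model_id, model_kwargs=mms_model_kwargs(target_lang))


# Building the arguments is cheap and needs no download.
kwargs = mms_model_kwargs("fra")
print(kwargs)
```

Calling `load_mms_pipeline("facebook/mms-1b-all", "fra")` would then mirror the pipeline call above. It downloads the checkpoint, so it is left out of the sketch.
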

## Inference

-By default MMS loads adapter weights for English, but those can be easily switched out for another language.
-Let's look at an example.
+Next, let's look at how we can run inference with MMS and change adapter layers after having called [`~PreTrainedModel.from_pretrained`].

First, we load audio data in different languages using the [Datasets](https://github.com/huggingface/datasets) library.

```py