From 604a21b1e68267df29e4910f425c92c336973f5d Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Thu, 15 Jun 2023 14:29:32 +0200
Subject: [PATCH] [Docs] Improve docs for MMS loading of other languages (#24292)

* Improve docs

* Apply suggestions from code review

* upload readme

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

---------

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
---
 docs/source/en/model_doc/mms.mdx | 46 +++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

diff --git a/docs/source/en/model_doc/mms.mdx b/docs/source/en/model_doc/mms.mdx
index bd32617370..32cffbdfcb 100644
--- a/docs/source/en/model_doc/mms.mdx
+++ b/docs/source/en/model_doc/mms.mdx
@@ -44,11 +44,51 @@ MMS's architecture is based on the Wav2Vec2 model, so one can refer to [Wav2Vec2
 
 The original code can be found [here](https://github.com/facebookresearch/fairseq/tree/main/examples/mms).
 
+## Loading
+
+By default, MMS loads adapter weights for English. To load adapter weights for another language,
+specify `target_lang=` as well as `ignore_mismatched_sizes=True`.
+The `ignore_mismatched_sizes=True` keyword has to be passed to allow the language model head to be resized according
+to the vocabulary of the specified language.
+Similarly, the processor should be loaded with the same target language:
+
+```py
+from transformers import Wav2Vec2ForCTC, AutoProcessor
+
+model_id = "facebook/mms-1b-all"
+target_lang = "fra"
+
+processor = AutoProcessor.from_pretrained(model_id, target_lang=target_lang)
+model = Wav2Vec2ForCTC.from_pretrained(model_id, target_lang=target_lang, ignore_mismatched_sizes=True)
+```
+
+<Tip>
+
+You can safely ignore a warning such as:
+
+```text
+Some weights of Wav2Vec2ForCTC were not initialized from the model checkpoint at facebook/mms-1b-all and are newly initialized because the shapes did not match:
+- lm_head.bias: found shape torch.Size([154]) in the checkpoint and torch.Size([314]) in the model instantiated
+- lm_head.weight: found shape torch.Size([154, 1280]) in the checkpoint and torch.Size([314, 1280]) in the model instantiated
+You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
+```
+
+</Tip>
+
+If you want to use the ASR pipeline, you can load your chosen target language as follows:
+
+```py
+from transformers import pipeline
+
+model_id = "facebook/mms-1b-all"
+target_lang = "fra"
+
+pipe = pipeline(model=model_id, model_kwargs={"target_lang": target_lang, "ignore_mismatched_sizes": True})
+```
+
 ## Inference
 
-By default MMS loads adapter weights for English, but those can be easily switched out for another language.
-Let's look at an example.
-
+Next, let's look at how to run inference with MMS and change adapter layers after having called [`~PreTrainedModel.from_pretrained`].
 First, we load audio data in different languages using the [Datasets](https://github.com/huggingface/datasets).
 
 ```py