From 04976a32dc555667afa994e8f918cbee88d84a4f Mon Sep 17 00:00:00 2001
From: Ayaka Mikazuki
Date: Mon, 20 Sep 2021 19:53:31 +0800
Subject: [PATCH] Fix mT5 documentation (#13639)

* Fix MT5 documentation

The abstract is incomplete

* MT5 -> mT5
---
 docs/source/model_doc/mt5.rst | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/source/model_doc/mt5.rst b/docs/source/model_doc/mt5.rst
index 6d752502d3..64713086fd 100644
--- a/docs/source/model_doc/mt5.rst
+++ b/docs/source/model_doc/mt5.rst
@@ -10,7 +10,7 @@
 an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
 specific language governing permissions and limitations under the License.
 
-MT5
+mT5
 -----------------------------------------------------------------------------------------------------------------------
 
 Overview
@@ -24,9 +24,11 @@ The abstract from the paper is the following:
 
 *The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain
 state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a
-multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We describe
+multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail
 the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
-benchmarks. All of the code and model checkpoints*
+benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a
+generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model
+checkpoints used in this work are publicly available.*
 
 Note: mT5 was only pre-trained on `mC4 `__ excluding any supervised training.
 Therefore, this model has to be fine-tuned before it is useable on a downstream task, unlike the original T5 model.