From c9564f53433fb637cbd8eec0d902e80e30f91814 Mon Sep 17 00:00:00 2001
From: Suraj Patil <surajp815@gmail.com>
Date: Mon, 17 Aug 2020 22:00:26 +0530
Subject: [PATCH] [Doc] add more MBart and other doc (#6490)

* add mbart example

* add Pegasus and MBart in readme

* typo

* add MBart in Pretrained models

* add pre-proc doc

* add DPR in readme

* fix indent

* doc fix
---
 README.md                          |  9 +++++--
 docs/source/index.rst              |  2 +-
 docs/source/model_doc/mbart.rst    | 39 ++++++++++++++++++++++++++++++
 docs/source/pretrained_models.rst  |  9 ++++---
 src/transformers/modeling_mbart.py | 12 ++++++++-
 5 files changed, 64 insertions(+), 7 deletions(-)
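
Note on the pre-processing format documented in docs/source/model_doc/mbart.rst by this patch: the source layout `X [eos, src_lang_code]`, the target layout `[tgt_lang_code] X [eos]`, and the one-position shift used in the supervised training example can be sketched in plain Python. This is only an illustration, not the tokenizer's implementation. The helper names, the pre-split string tokens, and the `</s>` spelling of eos are assumptions made for the sketch; the real tokenizer works on sentencepiece ids.

```python
from typing import List, Sequence, Tuple

EOS = "</s>"  # assumed spelling of the eos token, for illustration only


def format_source(tokens: Sequence[str], src_lang: str) -> List[str]:
    """Source layout described in the doc: X [eos, src_lang_code]."""
    return list(tokens) + [EOS, src_lang]


def format_target(tokens: Sequence[str], tgt_lang: str) -> List[str]:
    """Target layout described in the doc: [tgt_lang_code] X [eos]."""
    return [tgt_lang] + list(tokens) + [EOS]


def shift_for_training(target: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Single-sequence analogue of target_ids[:, :-1] and target_ids[:, 1:]."""
    target = list(target)
    decoder_input = target[:-1]  # starts with the language code, drops eos
    labels = target[1:]          # drops the language code, ends with eos
    return decoder_input, labels


# Hypothetical, already-split tokens standing in for tokenizer output.
source = format_source(["UN", "Chief"], "en_XX")
target = format_target(["Seful", "ONU"], "ro_RO")
decoder_input, labels = shift_for_training(target)
```

With this layout the decoder's first input is the target language code, which matches the generation advice in the same doc section to set `decoder_start_token_id` to the target language id.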

diff --git a/README.md b/README.md
index 38a3af573a..b1bd8691f8 100644
--- a/README.md
+++ b/README.md
@@ -167,8 +167,13 @@ At some point in the future, you'll be able to seamlessly move from pre-training
 19. **[Reformer](https://huggingface.co/transformers/model_doc/reformer.html)** (from Google Research) released with the paper [Reformer: The Efficient Transformer](https://arxiv.org/abs/2001.04451) by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
 20. **[MarianMT](https://huggingface.co/transformers/model_doc/marian.html)** Machine translation models trained using [OPUS](http://opus.nlpl.eu/) data by Jörg Tiedemann. The [Marian Framework](https://marian-nmt.github.io/) is being developed by the Microsoft Translator Team.
 21. **[Longformer](https://huggingface.co/transformers/model_doc/longformer.html)** (from AllenAI) released with the paper [Longformer: The Long-Document Transformer](https://arxiv.org/abs/2004.05150) by Iz Beltagy, Matthew E. Peters, Arman Cohan.
-22. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
-23. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedbacks before starting your PR.
+22. **[DPR](https://github.com/facebookresearch/DPR)** (from Facebook) released with the paper [Dense Passage Retrieval
+for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906) by Vladimir Karpukhin, Barlas Oğuz, Sewon
+Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
+23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
+24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
+25. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
+26. Want to contribute a new model? We have added a **detailed guide and templates** to guide you in the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
 
 These implementations have been tested on several datasets (see the example scripts) and should match the performances of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on the performances in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
 
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 8113903d1e..9d0ea1fc5b 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -126,7 +126,7 @@ conversion utilities for the following models:
     Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
 23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
     <https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
-24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper  `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov
+24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper  `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov,
     Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.  
 25. `Other community models <https://huggingface.co/models>`_, contributed by the `community
     <https://huggingface.co/users>`_.
diff --git a/docs/source/model_doc/mbart.rst b/docs/source/model_doc/mbart.rst
index 7305fce941..1cfc65e663 100644
--- a/docs/source/model_doc/mbart.rst
+++ b/docs/source/model_doc/mbart.rst
@@ -14,6 +14,45 @@ MBART is a sequence-to-sequence denoising auto-encoder pre-trained on large-scal
 The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
 
 
+Training
+~~~~~~~~~~~~~~~~~~~~~
+MBart is a multilingual encoder-decoder (seq-to-seq) model primarily intended for translation tasks.
+Because the model is multilingual, it expects sequences in a different format: a special language id token
+is added to both the source and the target text. The source text format is ``X [eos, src_lang_code]``,
+where ``X`` is the source text. The target text format is ``[tgt_lang_code] X [eos]``. ``bos`` is never used.
+``MBartTokenizer.prepare_seq2seq_batch`` handles this automatically and should be used to encode
+the sequences for seq-to-seq fine-tuning.
+
+- Supervised training
+
+::
+
+    example_english_phrase = "UN Chief Says There Is No Military Solution in Syria"
+    expected_translation_romanian = "Şeful ONU declară că nu există o soluţie militară în Siria"
+    batch = tokenizer.prepare_seq2seq_batch(example_english_phrase, src_lang="en_XX", tgt_lang="ro_RO", tgt_texts=expected_translation_romanian)
+    input_ids = batch["input_ids"]
+    target_ids = batch["decoder_input_ids"]
+    decoder_input_ids = target_ids[:, :-1].contiguous()
+    labels = target_ids[:, 1:].clone()
+    model(input_ids=input_ids, decoder_input_ids=decoder_input_ids, labels=labels)  # forward pass
+
+- Generation
+
+    When generating the target text, set ``decoder_start_token_id`` to the target language id.
+    The following example shows how to translate English to Romanian using the ``facebook/mbart-large-en-ro`` model.
+
+::
+
+    from transformers import MBartForConditionalGeneration, MBartTokenizer
+    model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
+    tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
+    article = "UN Chief Says There Is No Military Solution in Syria"
+    batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], src_lang="en_XX")
+    translated_tokens = model.generate(**batch, decoder_start_token_id=tokenizer.lang_code_to_id["ro_RO"])
+    translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
+    assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
+
+
 MBartConfig
 ~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/docs/source/pretrained_models.rst b/docs/source/pretrained_models.rst
index 794c93f8d2..44a6b721fa 100644
--- a/docs/source/pretrained_models.rst
+++ b/docs/source/pretrained_models.rst
@@ -331,9 +331,6 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 |                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 |                   | ``facebook/bart-large-cnn``                                | | 12-layer, 1024-hidden, 16-heads, 406M parameters       (same as base)                                                               |
 |                   |                                                            | | bart-large base architecture finetuned on cnn summarization task                                                                    |
-|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
-|                   | ``facebook/mbart-large-en-ro``                             | | 12-layer, 1024-hidden, 16-heads, 880M parameters                                                                                    |
-|                   |                                                            | | bart-large architecture pretrained on cc25 multilingual data , finetuned on WMT english romanian translation.                       |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | DialoGPT          | ``DialoGPT-small``                                         | | 12-layer, 768-hidden, 12-heads, 124M parameters                                                                                     |
 |                   |                                                            | | Trained on English text: 147M conversation-like exchanges extracted from Reddit.                                                    |
@@ -361,3 +358,9 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
 |                   | ``allenai/longformer-large-4096``                          | | 24-layer, 1024-hidden, 16-heads, ~435M parameters                                                                                   |
 |                   |                                                            | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096                                                    |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+| MBart             | ``facebook/mbart-large-cc25``                              | | 24-layer, 1024-hidden, 16-heads, 610M parameters                                                                                    |
+|                   |                                                            | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus                                                   |
+|                   +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
+|                   | ``facebook/mbart-large-en-ro``                             | | 24-layer, 1024-hidden, 16-heads, 610M parameters                                                                                    |
+|                   |                                                            | | mbart-large-cc25 model finetuned on WMT english romanian translation.                                                               |
++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
\ No newline at end of file
diff --git a/src/transformers/modeling_mbart.py b/src/transformers/modeling_mbart.py
index 60e47fe315..fe198c6430 100644
--- a/src/transformers/modeling_mbart.py
+++ b/src/transformers/modeling_mbart.py
@@ -30,9 +30,19 @@ MBART_START_DOCSTRING = r"""
     "The BART Model with a language modeling head. Can be used for machine translation.", MBART_START_DOCSTRING
 )
 class MBartForConditionalGeneration(BartForConditionalGeneration):
-    """
+    r"""
     This class overrides :class:`~transformers.BartForConditionalGeneration`. Please check the
     superclass for the appropriate documentation alongside usage examples.
+
+    Examples::
+        >>> from transformers import MBartForConditionalGeneration, MBartTokenizer
+        >>> model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-en-ro")
+        >>> tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-en-ro")
+        >>> article = "UN Chief Says There Is No Military Solution in Syria"
+        >>> batch = tokenizer.prepare_seq2seq_batch(src_texts=[article])
+        >>> translated_tokens = model.generate(**batch)
+        >>> translation = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]
+        >>> assert translation == "Şeful ONU declară că nu există o soluţie militară în Siria"
     """
 
     config_class = MBartConfig