[ported model] FSMT (FairSeq MachineTranslation) (#6940)

* ready for PR * cleanup * correct FSMT_PRETRAINED_MODEL_ARCHIVE_LIST * fix * perfectionism * revert change from another PR * odd, already committed this one * non-interactive upload workaround * backup the failed experiment * store langs in config * workaround for localizing model path * doc clean up as in https://github.com/huggingface/transformers/pull/6956 * style * back out debug mode * document: run_eval.py --num_beams 10 * remove unneeded constant * typo * re-use bart's Attention * re-use EncoderLayer, DecoderLayer from bart * refactor * send to cuda and fp16 * cleanup * revert (moved to another PR) * better error message * document run_eval --num_beams * solve the problem of tokenizer finding the right files when model is local * polish, remove hardcoded config * add a note that the file is autogenerated to avoid losing changes * prep for org change, remove unneeded code * switch to model4.pt, update scores * s/python/bash/ * missing init (but doesn't impact the finetuned model) * cleanup * major refactor (reuse-bart) * new model, new expected weights * cleanup * cleanup * full link * fix model type * merge porting notes * style * cleanup * have to create a DecoderConfig object to handle vocab_size properly * doc fix * add note (not a public class) * parametrize * - add bleu scores integration tests * skip test if sacrebleu is not installed * cache heavy models/tokenizers * some tweaks * remove tokens that aren't used * more purging * simplify code * switch to using decoder_start_token_id * add doc * Revert "major refactor (reuse-bart)" This reverts commit 226dad15ca6a9ef4e26178526e878e8fc5c85874. * decouple from bart * remove unused code #1 * remove unused code #2 * remove unused code #3 * update instructions * clean up * move bleu eval to examples * check import only once * move data+gen script into files * reuse via import * take less space * add prepare_seq2seq_batch (auto-tested) * cleanup * recode test to use json instead of yaml * ignore keys not needed * use the new -y in transformers-cli upload -y * [xlm tok] config dict: fix str into int to match definition (#7034) * [s2s] --eval_max_generate_length (#7018) * Fix CI with change of name of nlp (#7054) * nlp -> datasets * More nlp -> datasets * Woopsie * More nlp -> datasets * One last * extending to support allen_nlp wmt models - allow a specific checkpoint file to be passed - more arg settings - scripts for allen_nlp models * sync with changes * s/fsmt-wmt/wmt/ in model names * s/fsmt-wmt/wmt/ in model names (p2) * s/fsmt-wmt/wmt/ in model names (p3) * switch to a better checkpoint * typo * make non-optional args such - adjust tests where possible or skip when there is no other choice * consistency * style * adjust header * cards moved (model rename) * use best custom hparams * update info * remove old cards * cleanup * s/stas/facebook/ * update scores * s/allen_nlp/allenai/ * url maps aren't needed * typo * move all the doc / build /eval generators to their own scripts * cleanup * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Apply suggestions from code review Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * fix indent * duplicated line * style * use the correct add_start_docstrings * oops * resizing can't be done with the core approach, due to 2 dicts * check that the arg is a list * style * style Co-authored-by: Sam Shleifer <sshleifer@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-17 08:31:29 -07:00
parent 492bb6aa48
commit 1eeb206bef
19 changed files with 3534 additions and 4 deletions
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -3,9 +3,9 @@ Transformers

 State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

-🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose 
-architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural 
-Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between 
+🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose
+architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet...) for Natural Language Understanding (NLU) and Natural
+Language Generation (NLG) with over 32+ pretrained models in 100+ languages and deep interoperability between
 TensorFlow 2.0 and PyTorch.

 This is the documentation of our repository `transformers <https://github.com/huggingface/transformers>`_.
@@ -127,7 +127,7 @@ conversion utilities for the following models:
 23. `Pegasus <https://github.com/google-research/pegasus>`_ (from Google) released with the paper `PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
    <https://arxiv.org/abs/1912.08777>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
 24. `MBart <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`_ (from Facebook) released with the paper  `Multilingual Denoising Pre-training for Neural Machine Translation <https://arxiv.org/abs/2001.08210>`_ by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov,
-    Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.  
+    Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
 25. `LXMERT <https://github.com/airsplay/lxmert>`_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning
    Cross-Modality Encoder Representations from Transformers for Open-Domain Question
    Answering <https://arxiv.org/abs/1908.07490>`_ by Hao Tan and Mohit Bansal.
@@ -223,6 +223,7 @@ conversion utilities for the following models:
    model_doc/dpr
    model_doc/pegasus
    model_doc/mbart
+    model_doc/fsmt
    model_doc/funnel
    model_doc/lxmert
    model_doc/bertgeneration
--- a/docs/source/model_doc/fsmt.rst
+++ b/docs/source/model_doc/fsmt.rst
@@ -0,0 +1,49 @@
+FSMT
+----------------------------------------------------
+**DISCLAIMER:** If you see something strange,
+file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`__ and assign
+@stas00.
+
+Overview
+~~~~~~~~~~~~~~~~~~~~~
+
+FSMT (FairSeq MachineTranslation) models were introduced in "Facebook FAIR's WMT19 News Translation Task Submission" <this paper <https://arxiv.org/abs/1907.06616>__ by Nathan Ng, Kyra Yee, Alexei Baevski, Myle Ott, Michael Auli, Sergey Edunov.
+
+The abstract of the paper is the following:
+
+    This paper describes Facebook FAIR's submission to the WMT19 shared news translation task. We participate in two language pairs and four language directions, English <-> German and English <-> Russian. Following our submission from last year, our baseline systems are large BPE-based transformer models trained with the Fairseq sequence modeling toolkit which rely on sampled back-translations. This year we experiment with different bitext data filtering schemes, as well as with adding filtered back-translated data. We also ensemble and fine-tune our models on domain-specific data, then decode using noisy channel model reranking. Our submissions are ranked first in all four directions of the human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations. This system improves upon our WMT'18 submission by 4.5 BLEU points.
+
+The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
+
+Implementation Notes
+~~~~~~~~~~~~~~~~~~~~
+
+- FSMT uses source and target vocab pair, that aren't combined into one. It doesn't share embed tokens either. Its tokenizer is very similar to `XLMTokenizer` and the main model is derived from `BartModel`.
+
+
+FSMTForConditionalGeneration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FSMTForConditionalGeneration
+    :members: forward
+
+
+FSMTConfig
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FSMTConfig
+    :members:
+
+
+FSMTTokenizer
+~~~~~~~~~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FSMTTokenizer
+    :members:
+
+
+FSMTModel
+~~~~~~~~~~~~~
+
+.. autoclass:: transformers.FSMTModel
+    :members: forward