[XLNet] Fix mems behavior (#8567)

* fix mems in xlnet
* fix use_mems
* fix use_mem_len
* fix use mems
* clean docs
* fix tf typo
* make xlnet tf for generation work
* fix tf test
* refactor use cache
* add use cache for missing models
* correct use_cache in generate
* correct use cache in tf generate
* fix tf
* correct getattr typo
* make sylvain happy
* change in docs as well
* do not apply to cookie cutter statements
* fix tf test
* make pytorch model fully backward compatible
2020-11-25 22:54:59 +01:00
parent 369f1d77b4
commit 2a6fbe6a40
47 changed files with 259 additions and 134 deletions
--- a/docs/source/model_doc/mbart.rst
+++ b/docs/source/model_doc/mbart.rst
@@ -13,7 +13,7 @@ The MBart model was presented in `Multilingual Denoising Pre-training for Neural
 Ghazvininejad, Mike Lewis, Luke Zettlemoyer.

 According to the abstract, MBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual
-corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete
+corpora in many languages using the BART objective. mBART is one of the first methods for pretraining a complete
 sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
 on the encoder, decoder, or reconstructing parts of the text.