[XLNet] Fix mems behavior (#8567)
* fix mems in xlnet
* fix use_mems
* fix use_mem_len
* fix use mems
* clean docs
* fix tf typo
* make xlnet tf for generation work
* fix tf test
* refactor use cache
* add use cache for missing models
* correct use_cache in generate
* correct use cache in tf generate
* fix tf
* correct getattr typo
* make sylvain happy
* change in docs as well
* do not apply to cookie cutter statements
* fix tf test
* make pytorch model fully backward compatible
committed by GitHub
parent 369f1d77b4
commit 2a6fbe6a40
@@ -13,7 +13,7 @@ The MBart model was presented in `Multilingual Denoising Pre-training for Neural
 Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
 
 According to the abstract, MBART is a sequence-to-sequence denoising auto-encoder pretrained on large-scale monolingual
-corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete
+corpora in many languages using the BART objective. mBART is one of the first methods for pretraining a complete
 sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
 on the encoder, decoder, or reconstructing parts of the text.
 