From ffd675b42c59d7fa202e85e12defe9fc43913365 Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Tue, 20 Oct 2020 16:11:02 +0200
Subject: [PATCH] add summary (#7927)

---
 docs/source/model_summary.rst | 37 +++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/docs/source/model_summary.rst b/docs/source/model_summary.rst
index acfaf243e9..40524c97b5 100644
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
@@ -612,6 +612,43 @@ The `mbart-large-cc25 `_ check
 
 .. _multimodal-models:
 
+ProphetNet
+-----------------------------------------------------------------------------------------------------------------------
+
+.. raw:: html
+
+   Models
+
+   Doc
+
+`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
+
+ProphetNet introduces a novel *sequence-to-sequence* pre-training objective, called *future n-gram prediction*. In future n-gram prediction, at each time step the model predicts the next n tokens simultaneously based on the previous context tokens, instead of just the single next token. Future n-gram prediction explicitly encourages the model to plan for future tokens and prevents overfitting on strong local correlations.
+The model architecture is based on the original Transformer, but replaces the "standard" self-attention mechanism in the decoder with a main self-attention stream and n predicting self-attention streams (n-stream self-attention).
+
+The library provides a pre-trained version of this model for conditional generation and a fine-tuned version for summarization.
+
+XLM-ProphetNet
+-----------------------------------------------------------------------------------------------------------------------
+
+.. raw:: html
+
+   Models
+
+   Doc
+
+`ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training, `__ by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang, Ming Zhou.
+
+XLM-ProphetNet's model architecture and pre-training objective are the same as ProphetNet's, but XLM-ProphetNet was pre-trained on the cross-lingual dataset `XGLUE `__.
+
+The library provides a pre-trained version of this model for multilingual conditional generation and fine-tuned versions for headline generation and question generation.
+
 Multimodal models
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
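The future n-gram prediction objective described in this patch can be illustrated with a small, framework-free Python sketch. It is not the library's ProphetNet implementation: the names `future_ngram_targets`, `future_ngram_loss` and `IGNORE` are hypothetical, and the per-stream `weights` are an assumption modeled loosely on the per-stream loss weighting in the ProphetNet paper. For each decoding step t it builds the n tokens that the n predicting streams are trained on, then sums a weighted negative log-likelihood over all steps and streams, skipping positions that run past the end of the target sequence.

```python
import math
from typing import Dict, List, Optional

# Hypothetical marker for "no target": stream k has nothing to predict
# once t + k runs past the end of the target sequence.
IGNORE: Optional[int] = None


def future_ngram_targets(tokens: List[int], n: int) -> List[List[Optional[int]]]:
    """Return, for every decoding step t, the n tokens the model must predict.

    Row t holds [tokens[t], tokens[t + 1], ..., tokens[t + n - 1]]: predicting
    stream k (0-based) at step t is trained on tokens[t + k]. With n == 1 this
    reduces to the ordinary next-token labels of teacher-forced training.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [
        [tokens[t + k] if t + k < len(tokens) else IGNORE for k in range(n)]
        for t in range(len(tokens))
    ]


def future_ngram_loss(
    probs: List[List[Dict[int, float]]],
    targets: List[List[Optional[int]]],
    weights: List[float],
) -> float:
    """Weighted negative log-likelihood summed over all steps and streams.

    probs[t][k] maps a token id to the probability that stream k assigns to it
    at step t; weights[k] scales the loss of stream k. IGNORE targets are skipped.
    """
    total = 0.0
    for t, row in enumerate(targets):
        for k, y in enumerate(row):
            if y is IGNORE:
                continue
            total -= weights[k] * math.log(probs[t][k][y])
    return total


if __name__ == "__main__":
    targets = future_ngram_targets([10, 11, 12, 13], n=2)
    print(targets)  # [[10, 11], [11, 12], [12, 13], [13, None]]
```

With n = 1 the targets collapse to the usual next-token labels, so the sketch is a strict generalization of standard teacher-forced training; larger n adds one extra prediction target per step and per stream.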