From a4db4e303208a5b7b4ce1564301d77e9b74b01d9 Mon Sep 17 00:00:00 2001
From: Patrick von Platen
Date: Fri, 21 Aug 2020 16:22:10 +0200
Subject: [PATCH] [Docs model summaries] Add pegasus to docs (#6640)

* add pegasus to docs

* Update docs/source/model_summary.rst
---
 docs/source/model_summary.rst | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/docs/source/model_summary.rst b/docs/source/model_summary.rst
index 2c7cd4558f..79bc495865 100644
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
@@ -478,6 +478,31 @@ pretraining tasks, a composition of the following transformations are applied:
 The library provides a version of this model for conditional generation and sequence classification.
 
+Pegasus
+----------------------------------------------
+
+.. raw:: html
+
+   Models
+
+   Doc
+
+`PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization
+`_, Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
+
+Sequence-to-sequence model with the same encoder-decoder architecture as BART. Pegasus is pre-trained jointly on two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization-specific pre-training objective called Gap Sentence Generation (GSG).
+
+  * MLM: encoder input tokens are randomly replaced by a mask token and have to be predicted by the encoder (like in BERT).
+  * GSG: whole encoder input sentences are replaced by a second mask token and have to be generated by the decoder, which uses a causal mask to hide future words like a regular auto-regressive transformer decoder.
+
+In contrast to BART, Pegasus' pre-training task is intentionally similar to summarization: important sentences are masked and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
+
+The library provides a version of this model for conditional generation, which should be used for summarization.
+
+
 MarianMT
 ----------------------------------------------
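The Gap Sentence Generation objective described in this patch can be illustrated with a minimal Python sketch of how one (encoder input, decoder target) pair might be built. It rests on several assumptions that are not part of the patch or the transformers API: a naive regex sentence splitter, a simplified unigram-overlap F1 used as a stand-in for a ROUGE-style importance score, and a hypothetical `[MASK_SENT]` placeholder token. `make_gsg_example`, `gap_ratio` and the helper names are illustrative only. The sketch covers only sentence selection and masking, not the token-level MLM objective.

```python
import re
from collections import Counter

# Hypothetical placeholder for the sentence-level mask token; the real
# Pegasus vocabulary defines its own special tokens.
SENT_MASK = "[MASK_SENT]"


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(sentence: str) -> list[str]:
    """Lowercased word tokens with punctuation dropped."""
    return re.findall(r"\w+", sentence.lower())


def unigram_f1(candidate: list[str], reference: list[str]) -> float:
    """Simplified ROUGE-1-style F1: clipped unigram overlap of two token lists."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def make_gsg_example(text: str, gap_ratio: float = 0.3) -> tuple[str, str]:
    """Build a (source, target) pair in the spirit of Gap Sentence Generation."""
    sentences = split_sentences(text)
    tokens = [words(s) for s in sentences]

    # Score each sentence against the rest of the document as a proxy for importance.
    scores = []
    for i, sent in enumerate(tokens):
        rest = [tok for j, other in enumerate(tokens) if j != i for tok in other]
        scores.append(unigram_f1(sent, rest))

    # Pick the top-scoring sentences (ties go to the earlier sentence),
    # then restore document order.
    n_gap = min(len(sentences), max(1, round(gap_ratio * len(sentences))))
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    selected = sorted(ranked[:n_gap])
    chosen = set(selected)

    # Encoder input: each selected sentence is replaced by one mask token.
    source = " ".join(SENT_MASK if i in chosen else s for i, s in enumerate(sentences))
    # Decoder target: the masked sentences concatenated into one output sequence.
    target = " ".join(sentences[i] for i in selected)
    return source, target


source, target = make_gsg_example(
    "The cat sat on the mat. The cat. The mat. Dogs bark.", gap_ratio=0.25
)
print(source)  # [MASK_SENT] The cat. The mat. Dogs bark.
print(target)  # The cat sat on the mat.
```

The first sentence shares the most words with the rest of the toy document, so it is treated as the most "important" one, masked in the encoder input and moved to the decoder target, mirroring the extractive-summary analogy in the patch text.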