Cleanup documentation for BART, Marian, MBART and Pegasus (#7523)
* Cleanup documentation for BART, Marian, MBART and Pegasus * Cleanup documentation for BART, Marian, MBART and Pegasus
This commit is contained in:
@@ -1,30 +1,40 @@
|
||||
Pegasus
|
||||
-----------------------------------------------------------------------------------------------------------------------
|
||||
**DISCLAIMER:** If you see something strange,
|
||||
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__ and assign
|
||||
@sshleifer.
|
||||
|
||||
**DISCLAIMER:** If you see something strange, file a `Github Issue
|
||||
<https://github.com/huggingface/transformers/issues/new?assignees=sshleifer&labels=&template=bug-report.md&title>`__
|
||||
and assign @sshleifer.
|
||||
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The Pegasus model was proposed in `PEGASUS: Pre-training with Extracted Gap-sentences for
|
||||
Abstractive Summarization <https://arxiv.org/pdf/1912.08777.pdf>`_ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu on Dec 18, 2019.
|
||||
Abstractive Summarization <https://arxiv.org/pdf/1912.08777.pdf>`__ by Jingqing Zhang, Yao Zhao, Mohammad Saleh and
|
||||
Peter J. Liu on Dec 18, 2019.
|
||||
|
||||
According to the abstract,
|
||||
|
||||
- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an input document and are generated together as one output sequence from the remaining sentences, similar to an extractive summary.
|
||||
- Pegasus' pretraining task is intentionally similar to summarization: important sentences are removed/masked from an
|
||||
input document and are generated together as one output sequence from the remaining sentences, similar to an
|
||||
extractive summary.
|
||||
- Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
|
||||
|
||||
The Authors' code can be found `here <https://github.com/google-research/pegasus>`_.
|
||||
The Authors' code can be found `here <https://github.com/google-research/pegasus>`__.
|
||||
|
||||
|
||||
Checkpoints
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
All the `checkpoints <https://huggingface.co/models?search=pegasus>`_ are finetuned for summarization, besides ``pegasus-large``, whence the other checkpoints are finetuned.
|
||||
|
||||
All the `checkpoints <https://huggingface.co/models?search=pegasus>`__ are fine-tuned for summarization, besides
|
||||
`pegasus-large`, whence the other checkpoints are fine-tuned:
|
||||
|
||||
- Each checkpoint is 2.2 GB on disk and 568M parameters.
|
||||
- FP16 is not supported (help/ideas on this appreciated!).
|
||||
- Summarizing xsum in fp32 takes about 400ms/sample, with default parameters on a v100 GPU.
|
||||
- For XSUM, The paper reports rouge1,rouge2, rougeL of paper: 47.21/24.56/39.25. As of Aug 9, this port scores 46.91/24.34/39.1.
|
||||
- For XSUM, The paper reports rouge1,rouge2, rougeL of paper: 47.21/24.56/39.25. As of Aug 9, this port scores
|
||||
46.91/24.34/39.1.
|
||||
|
||||
The gap is likely because of different alpha/length_penalty implementations in beam search.
|
||||
|
||||
|
||||
@@ -32,14 +42,16 @@ Implementation Notes
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
- All models are transformer encoder-decoders with 16 layers in each component.
|
||||
- The implementation is completely inherited from ``BartForConditionalGeneration``
|
||||
- The implementation is completely inherited from :class:`~transformers.BartForConditionalGeneration`
|
||||
- Some key configuration differences:
|
||||
- static, sinusoidal position embeddings
|
||||
- no ``layernorm_embedding`` (``PegasusConfig.normalize_embedding=False``)
|
||||
- no :obj:`layernorm_embedding` (:obj`PegasusConfig.normalize_embedding=False`)
|
||||
- the model starts generating with pad_token_id (which has 0 token_embedding) as the prefix.
|
||||
- ``num_beams=8``
|
||||
- All pretrained pegasus checkpoints are the same besides three attributes: ``tokenizer.model_max_length`` (max input size), ``max_length`` (max num tokens to generate) and ``length_penalty``
|
||||
- Code to convert checkpoints trained in the author's `repo <https://github.com/google-research/pegasus>`_ can be found in ``convert_pegasus_tf_to_pytorch.py``
|
||||
- more beams are used (:obj:`num_beams=8`)
|
||||
- All pretrained pegasus checkpoints are the same besides three attributes: :obj:`tokenizer.model_max_length` (maximum
|
||||
input size), :obj:`max_length` (the maximum number of tokens to generate) and :obj:`length_penalty`.
|
||||
- The code to convert checkpoints trained in the author's `repo <https://github.com/google-research/pegasus>`_ can be
|
||||
found in ``convert_pegasus_tf_to_pytorch.py``.
|
||||
|
||||
|
||||
Usage Example
|
||||
@@ -62,48 +74,12 @@ Usage Example
|
||||
tgt_text = tokenizer.batch_decode(translated, skip_special_tokens=True)
|
||||
assert tgt_text[0] == "California's largest electricity provider has turned off power to hundreds of thousands of customers."
|
||||
|
||||
PegasusForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
This class inherits all functionality from ``BartForConditionalGeneration``, see that page for method signatures.
|
||||
Available models are listed at `Model List <https://huggingface.co/models?search=pegasus>`__
|
||||
|
||||
.. autoclass:: transformers.PegasusForConditionalGeneration
|
||||
:members:
|
||||
|
||||
|
||||
PegasusConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
This config fully inherits from ``BartConfig``, but pegasus uses different default values:
|
||||
Up to date parameter values can be seen in `S3 <https://s3.amazonaws.com/models.huggingface.co/bert/google/pegasus-xsum/config.json>`_.
|
||||
As of Aug 10, 2020, they are:
|
||||
|
||||
.. code-block:: python
|
||||
|
||||
dict(
|
||||
vocab_size=96103,
|
||||
max_position_embeddings=512,
|
||||
d_model=1024,
|
||||
encoder_ffn_dim=4096,
|
||||
decoder_ffn_dim=4096,
|
||||
encoder_attention_heads=16,
|
||||
decoder_attention_heads=16,
|
||||
encoder_layers=16,
|
||||
decoder_layers=16,
|
||||
dropout=0.1,
|
||||
attention_dropout=0.1,
|
||||
activation_dropout=0.1,
|
||||
pad_token_id=0,
|
||||
eos_token_id=1,
|
||||
is_encoder_decoder=True,
|
||||
normalize_before=True,
|
||||
scale_embedding=True,
|
||||
normalize_embedding=False,
|
||||
add_final_layer_norm=True,
|
||||
static_position_embeddings=True,
|
||||
num_beams=8,
|
||||
activation_function="relu",
|
||||
)
|
||||
.. autoclass:: transformers.PegasusConfig
|
||||
|
||||
|
||||
PegasusTokenizer
|
||||
@@ -114,4 +90,7 @@ warning: ``add_tokens`` does not work at the moment.
|
||||
:members: __call__, prepare_seq2seq_batch
|
||||
|
||||
|
||||
PegasusForConditionalGeneration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.PegasusForConditionalGeneration
|
||||
|
||||
Reference in New Issue
Block a user