From 3bd0007e8716540aa83298d3d3584d83e3ae3e08 Mon Sep 17 00:00:00 2001
From: Jordan Clive
Date: Fri, 4 Nov 2022 15:32:44 +0000
Subject: [PATCH] Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)

Co-authored-by: jordiclive
---
 docs/source/en/model_doc/bart.mdx             | 5 +++++
 docs/source/en/model_doc/big_bird.mdx         | 2 ++
 docs/source/en/model_doc/bigbird_pegasus.mdx  | 2 ++
 docs/source/en/model_doc/blenderbot-small.mdx | 5 +++++
 docs/source/en/model_doc/blenderbot.mdx       | 5 +++++
 docs/source/en/model_doc/led.mdx              | 2 ++
 docs/source/en/model_doc/prophetnet.mdx       | 5 +++++
 7 files changed, 26 insertions(+)

diff --git a/docs/source/en/model_doc/bart.mdx b/docs/source/en/model_doc/bart.mdx
index 6b29782444..04084e66a2 100644
--- a/docs/source/en/model_doc/bart.mdx
+++ b/docs/source/en/model_doc/bart.mdx
@@ -32,6 +32,11 @@ According to the abstract,
   state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
   of up to 6 ROUGE.
 
+Tips:
+
+- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).
 
diff --git a/docs/source/en/model_doc/big_bird.mdx b/docs/source/en/model_doc/big_bird.mdx
index 0e1e6ac53e..fa15d32cdb 100644
--- a/docs/source/en/model_doc/big_bird.mdx
+++ b/docs/source/en/model_doc/big_bird.mdx
@@ -46,6 +46,8 @@ Tips:
 - Sequence length must be divisible by block size.
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
 
 This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
 [here](https://github.com/google-research/bigbird).
diff --git a/docs/source/en/model_doc/bigbird_pegasus.mdx b/docs/source/en/model_doc/bigbird_pegasus.mdx
index 50ef4720e3..1ba4b71d73 100644
--- a/docs/source/en/model_doc/bigbird_pegasus.mdx
+++ b/docs/source/en/model_doc/bigbird_pegasus.mdx
@@ -47,6 +47,8 @@ Tips:
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**.
 - BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
 
 The original code can be found [here](https://github.com/google-research/bigbird).
 
diff --git a/docs/source/en/model_doc/blenderbot-small.mdx b/docs/source/en/model_doc/blenderbot-small.mdx
index 2b762838c4..c4b157cac1 100644
--- a/docs/source/en/model_doc/blenderbot-small.mdx
+++ b/docs/source/en/model_doc/blenderbot-small.mdx
@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
+Tips:
+
+- Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
 found [here](https://github.com/facebookresearch/ParlAI) .
 
diff --git a/docs/source/en/model_doc/blenderbot.mdx b/docs/source/en/model_doc/blenderbot.mdx
index 97cbd62e57..75706e13ec 100644
--- a/docs/source/en/model_doc/blenderbot.mdx
+++ b/docs/source/en/model_doc/blenderbot.mdx
@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
+Tips:
+
+- Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .
 
diff --git a/docs/source/en/model_doc/led.mdx b/docs/source/en/model_doc/led.mdx
index 63880d874f..6ecdf808e2 100644
--- a/docs/source/en/model_doc/led.mdx
+++ b/docs/source/en/model_doc/led.mdx
@@ -50,6 +50,8 @@ Tips:
   flag can be used to disable the caching mechanism to save memory.
 - A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
 - A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
+- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
 
 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
 
diff --git a/docs/source/en/model_doc/prophetnet.mdx b/docs/source/en/model_doc/prophetnet.mdx
index 951bbc5b96..14d0b3a924 100644
--- a/docs/source/en/model_doc/prophetnet.mdx
+++ b/docs/source/en/model_doc/prophetnet.mdx
@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga
 abstractive summarization and question generation tasks.
 Experimental results show that ProphetNet achieves new state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.*
 
+Tips:
+
+- ProphetNet is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 The Authors' code can be found [here](https://github.com/microsoft/ProphetNet).
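
The tip added in each file above can be illustrated with a small sketch. This is not code from the patch or from the library. It is a toy example that assumes slot `i` of the padded sequence always receives position embedding `i`, regardless of the attention mask. The `PAD` id, the token ids, and both helper functions are hypothetical names chosen for the illustration.

```python
PAD = 0  # hypothetical pad token id, chosen only for this illustration


def pad(ids, length, side="right"):
    """Pad a list of token ids to `length` on the given side."""
    filler = [PAD] * (length - len(ids))
    return ids + filler if side == "right" else filler + ids


def positions_of_real_tokens(padded):
    """Return the absolute position index of each non-pad token.

    Assumes slot i of the padded sequence always gets position
    embedding i, regardless of the attention mask.
    """
    return [i for i, tok in enumerate(padded) if tok != PAD]


short = [11, 12, 13]  # made-up token ids

right = pad(short, 6, side="right")
left = pad(short, 6, side="left")

# Right padding keeps the real tokens at positions 0, 1, 2.
print(right, positions_of_real_tokens(right))
# Left padding shifts the same tokens to positions 3, 4, 5.
print(left, positions_of_real_tokens(left))
```

Under that assumption, left padding feeds the same tokens through different position embeddings depending on how long the longest sequence in the batch is, which is why right padding is usually the safer default for these models. In Transformers the padding side is set on the tokenizer through its `padding_side` attribute.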