Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)

Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
2022-11-04 15:32:44 +00:00
parent 6e1c5786dc
commit 3bd0007e87
7 changed files with 26 additions and 0 deletions
--- a/docs/source/en/model_doc/bart.mdx
+++ b/docs/source/en/model_doc/bart.mdx
@@ -32,6 +32,11 @@ According to the abstract,
  state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
  of up to 6 ROUGE.

+Tips:
+
+- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).


--- a/docs/source/en/model_doc/big_bird.mdx
+++ b/docs/source/en/model_doc/big_bird.mdx
@@ -46,6 +46,8 @@ Tips:
 - Sequence length must be divisible by block size.
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.

 This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
 [here](https://github.com/google-research/bigbird).
--- a/docs/source/en/model_doc/bigbird_pegasus.mdx
+++ b/docs/source/en/model_doc/bigbird_pegasus.mdx
@@ -47,6 +47,8 @@ Tips:
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**.
 - BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.

 The original code can be found [here](https://github.com/google-research/bigbird).

--- a/docs/source/en/model_doc/blenderbot-small.mdx
+++ b/docs/source/en/model_doc/blenderbot-small.mdx
@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*

+Tips:
+
+- Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
 found [here](https://github.com/facebookresearch/ParlAI) .

--- a/docs/source/en/model_doc/blenderbot.mdx
+++ b/docs/source/en/model_doc/blenderbot.mdx
@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*

+Tips:
+
+- Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .


--- a/docs/source/en/model_doc/led.mdx
+++ b/docs/source/en/model_doc/led.mdx
@@ -50,6 +50,8 @@ Tips:
  flag can be used to disable the caching mechanism to save memory.
 - A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
 - A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
+- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.

 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).

--- a/docs/source/en/model_doc/prophetnet.mdx
+++ b/docs/source/en/model_doc/prophetnet.mdx
@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga
 abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new
 state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.*

+Tips:
+
+- ProphetNet is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+  the left.
+
 The Authors' code can be found [here](https://github.com/microsoft/ProphetNet).
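
The tip added to each of the seven pages rests on one observation: with absolute position embeddings, the position of a token is simply its column index in the padded batch. The sketch below illustrates that in plain Python and is not code from this commit. `PAD_ID`, `pad_batch`, and `first_token_positions` are hypothetical names introduced only for the illustration. It assumes positions are counted from column 0, which is how a learned absolute position table is typically indexed.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id, chosen only for this illustration


def pad_batch(
    seqs: List[List[int]], side: str = "right"
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad every sequence to the longest length in the batch.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask = [], []
    for s in seqs:
        n_pad = max_len - len(s)
        if side == "right":
            input_ids.append(s + [PAD_ID] * n_pad)
            attention_mask.append([1] * len(s) + [0] * n_pad)
        else:
            input_ids.append([PAD_ID] * n_pad + s)
            attention_mask.append([0] * n_pad + [1] * len(s))
    return input_ids, attention_mask


def first_token_positions(attention_mask: List[List[int]]) -> List[int]:
    """Column index (the absolute position) of each row's first real token."""
    return [row.index(1) for row in attention_mask]


batch = [[11, 12, 13, 14], [21, 22]]  # made-up token ids

right_ids, right_mask = pad_batch(batch, side="right")
left_ids, left_mask = pad_batch(batch, side="left")

# With right padding every sequence starts at position 0.
right_starts = first_token_positions(right_mask)
# With left padding the shorter sequence is shifted by its pad count.
left_starts = first_token_positions(left_mask)

print("right-padded starts:", right_starts)
print("left-padded starts: ", left_starts)
```

With right padding, both sequences start at position 0, so each real token gets the same position embedding it would get unpadded. With left padding, the shorter sequence starts at position 2. In Transformers, the padding side is controlled by the tokenizer's `padding_side` attribute.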