Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
This commit is contained in:
@@ -32,6 +32,11 @@ According to the abstract,
|
||||
state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
|
||||
of up to 6 ROUGE.
|
||||
|
||||
Tips:
|
||||
|
||||
- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).
|
||||
|
||||
|
||||
|
||||
@@ -46,6 +46,8 @@ Tips:
|
||||
- Sequence length must be divisible by block size.
|
||||
- Current implementation supports only **ITC**.
|
||||
- Current implementation doesn't support **num_random_blocks = 0**
|
||||
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
|
||||
[here](https://github.com/google-research/bigbird).
|
||||
|
||||
@@ -47,6 +47,8 @@ Tips:
|
||||
- Current implementation supports only **ITC**.
|
||||
- Current implementation doesn't support **num_random_blocks = 0**.
|
||||
- BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
|
||||
- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
The original code can be found [here](https://github.com/google-research/bigbird).
|
||||
|
||||
|
||||
@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
|
||||
found [here](https://github.com/facebookresearch/ParlAI) .
|
||||
|
||||
|
||||
@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior
|
||||
dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
|
||||
failure cases of our models.*
|
||||
|
||||
Tips:
|
||||
|
||||
- Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .
|
||||
|
||||
|
||||
|
||||
@@ -50,6 +50,8 @@ Tips:
|
||||
flag can be used to disable the caching mechanism to save memory.
|
||||
- A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
|
||||
- A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
|
||||
- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
|
||||
|
||||
|
||||
@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga
|
||||
abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new
|
||||
state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.*
|
||||
|
||||
Tips:
|
||||
|
||||
- ProphetNet is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
|
||||
The Authors' code can be found [here](https://github.com/microsoft/ProphetNet).
|
||||
|
||||
|
||||
|
||||
Reference in New Issue
Block a user