Update documentation on seq2seq models with absolute positional embeddings, to be in line with Tips section for BERT and GPT2 (#20068)
Co-authored-by: jordiclive <jordiclive19@imperial.ac.uk>
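
The tip this commit adds to each model page can be illustrated with a small, library-free sketch. It assumes that a token's absolute position is simply its column index in the padded batch, which is how a learned absolute position table is typically indexed. `PAD_ID`, `pad_batch`, and `positions_of_real_tokens` are hypothetical names introduced here for illustration; they are not part of the documented models or of any library.

```python
PAD_ID = 0  # hypothetical pad token id, chosen only for this sketch


def pad_batch(sequences, pad_id=PAD_ID, side="right"):
    """Pad variable-length token-id lists to a common length.

    Returns (input_ids, attention_mask) as nested lists, where the mask
    is 1 for real tokens and 0 for padding.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "right":
            input_ids.append(list(seq) + [pad_id] * n_pad)
            attention_mask.append([1] * len(seq) + [0] * n_pad)
        else:
            input_ids.append([pad_id] * n_pad + list(seq))
            attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


def positions_of_real_tokens(attention_mask):
    """Column index of every real token, per row.

    Assumption: the model looks up position embeddings by column index.
    """
    return [[i for i, m in enumerate(row) if m == 1] for row in attention_mask]


batch = [[11, 12, 13, 14], [21, 22]]
right_ids, right_mask = pad_batch(batch, side="right")
left_ids, left_mask = pad_batch(batch, side="left")

# Right padding: the short sequence still starts at position 0.
print(positions_of_real_tokens(right_mask))  # [[0, 1, 2, 3], [0, 1]]
# Left padding: the short sequence is shifted to positions 2 and 3.
print(positions_of_real_tokens(left_mask))   # [[0, 1, 2, 3], [2, 3]]
```

Under that assumption, left padding moves the first real token of a short sequence away from position 0, so it receives position embeddings it would not receive on its own. In the Transformers library the padding side is controlled by the tokenizer's `padding_side` attribute.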
@@ -32,6 +32,11 @@ According to the abstract,
 state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
 of up to 6 ROUGE.
 
+Tips:
+
+- BART is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The Authors' code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/bart).
 
 
@@ -46,6 +46,8 @@ Tips:
 - Sequence length must be divisible by block size.
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
 
 This model was contributed by [vasudevgupta](https://huggingface.co/vasudevgupta). The original code can be found
 [here](https://github.com/google-research/bigbird).
@@ -47,6 +47,8 @@ Tips:
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**.
 - BigBirdPegasus uses the [PegasusTokenizer](https://github.com/huggingface/transformers/blob/main/src/transformers/models/pegasus/tokenization_pegasus.py).
+- BigBird is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
 
 The original code can be found [here](https://github.com/google-research/bigbird).
 
@@ -36,6 +36,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
+Tips:
+
+- Blenderbot Small is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
+
 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The authors' code can be
 found [here](https://github.com/facebookresearch/ParlAI) .
 
@@ -32,6 +32,11 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
+Tips:
+
+- Blenderbot is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
+
 This model was contributed by [sshleifer](https://huggingface.co/sshleifer). The authors' code can be found [here](https://github.com/facebookresearch/ParlAI) .
 
 
@@ -50,6 +50,8 @@ Tips:
 flag can be used to disable the caching mechanism to save memory.
 - A notebook showing how to evaluate LED, can be accessed [here](https://colab.research.google.com/drive/12INTTR6n64TzS4RrXZxMSXfrOd9Xzamo?usp=sharing).
 - A notebook showing how to fine-tune LED, can be accessed [here](https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing).
+- LED is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
 
 This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten).
 
@@ -35,6 +35,11 @@ dataset (160GB) respectively. Then we conduct experiments on CNN/DailyMail, Giga
 abstractive summarization and question generation tasks. Experimental results show that ProphetNet achieves new
 state-of-the-art results on all these datasets compared to the models using the same scale pretraining corpus.*
 
+Tips:
+
+- ProphetNet is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
+the left.
+
 The Authors' code can be found [here](https://github.com/microsoft/ProphetNet).
 
 