[Docs] Add DialoGPT (#3755)
* add dialoGPT * update README.md * fix conflict * update readme * add code links to docs * Update README.md * Update dialo_gpt2.rst * Update pretrained_models.rst * Update docs/source/model_doc/dialo_gpt2.rst Co-Authored-By: Julien Chaumond <chaumond@gmail.com> * change filename of dialogpt Co-authored-by: Julien Chaumond <chaumond@gmail.com>
This commit is contained in:
committed by
GitHub
parent
c59b1e682d
commit
d22894dfd4
@@ -30,6 +30,8 @@ Tips:
|
||||
similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
|
||||
number of (repeating) layers.
|
||||
|
||||
The original code can be found `here <https://github.com/google-research/ALBERT>`_.
|
||||
|
||||
AlbertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -35,6 +35,8 @@ Tips:
|
||||
prediction rather than a token prediction. However, averaging over the sequence may yield better results than using
|
||||
the [CLS] token.
|
||||
|
||||
The original code can be found `here <https://github.com/google-research/bert>`_.
|
||||
|
||||
BertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -22,6 +22,8 @@ Tips:
|
||||
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
|
||||
examples as well as the information relative to the inputs and outputs.
|
||||
|
||||
The original code can be found `here <https://camembert-model.fr/>`_.
|
||||
|
||||
CamembertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -31,6 +31,8 @@ Tips:
|
||||
See `reusing the past in generative models <../quickstart.html#using-the-past>`_ for more information on the usage
|
||||
of this argument.
|
||||
|
||||
The original code can be found `here <https://github.com/salesforce/ctrl>`_.
|
||||
|
||||
|
||||
CTRLConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
29
docs/source/model_doc/dialogpt.rst
Normal file
29
docs/source/model_doc/dialogpt.rst
Normal file
@@ -0,0 +1,29 @@
|
||||
DialoGPT
|
||||
----------------------------------------------------
|
||||
|
||||
Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
DialoGPT was proposed in
|
||||
`DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation <https://arxiv.org/abs/1911.00536>`_
|
||||
by Yizhe Zhang, Siqi Sun, Michel Galley, Yen-Chun Chen, Chris Brockett, Xiang Gao, Jianfeng Gao, Jingjing Liu, Bill Dolan.
|
||||
It's a GPT2 Model trained on 147M conversation-like exchanges extracted from Reddit.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*We present a large, tunable neural conversational response generation model, DialoGPT (dialogue generative pre-trained transformer).
|
||||
Trained on 147M conversation-like exchanges extracted from Reddit comment chains over a period spanning from 2005 through 2017, DialoGPT extends the Hugging Face PyTorch transformer to attain a performance close to human both in terms of automatic and human evaluation in single-turn dialogue settings.
|
||||
We show that conversational systems that leverage DialoGPT generate more relevant, contentful and context-consistent responses than strong baseline systems.
|
||||
The pre-trained model and training pipeline are publicly released to facilitate research into neural response generation and the development of more intelligent open-domain dialogue systems.*
|
||||
|
||||
Tips:
|
||||
|
||||
- DialoGPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||
the right rather than the left.
|
||||
- DialoGPT was trained with a causal language modeling (CLM) objective on conversational data and is therefore powerful at response generation in open-domain dialogue systems.
|
||||
- DialoGPT enables the user to create a chat bot in just 10 lines of code as shown on `DialoGPT's model card <https://huggingface.co/microsoft/DialoGPT-medium>`_.
|
||||
|
||||
|
||||
DialoGPT's architecture is based on the GPT2 model, so one can refer to GPT2's `docstring <https://huggingface.co/transformers/model_doc/gpt2.html>`_.
|
||||
|
||||
The original code can be found `here <https://github.com/microsoft/DialoGPT>`_.
|
||||
@@ -27,6 +27,8 @@ Tips:
|
||||
- DistilBert doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `[SEP]`)
|
||||
- DistilBert doesn't have options to select the input positions (`position_ids` input). This could be added if necessary though, just let's us know if you need this option.
|
||||
|
||||
The original code can be found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`_.
|
||||
|
||||
|
||||
DistilBertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -44,6 +44,8 @@ Tips:
|
||||
and the generator may be loaded in the `ElectraForPreTraining` model (the classification head will be randomly
|
||||
initialized as it doesn't exist in the generator).
|
||||
|
||||
The original code can be found `here <https://github.com/google-research/electra>`_.
|
||||
|
||||
|
||||
ElectraConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -20,6 +20,8 @@ of the time they outperform other pre-training approaches. Different versions of
|
||||
evaluation protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared
|
||||
to the research community for further reproducible experiments in French NLP.*
|
||||
|
||||
The original code can be found `here <https://github.com/getalp/Flaubert>`_.
|
||||
|
||||
|
||||
FlaubertConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -36,6 +36,9 @@ Tips:
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
|
||||
|
||||
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`_.
|
||||
|
||||
|
||||
OpenAIGPTConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -34,6 +34,8 @@ Tips:
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
|
||||
different sizes: small, medium, large, xl and a distilled version of the small checkpoint: distilgpt-2.
|
||||
|
||||
The original code can be found `here <https://openai.com/blog/better-language-models/>`_.
|
||||
|
||||
|
||||
GPT2Config
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -28,6 +28,9 @@ Tips:
|
||||
- RoBERTa doesn't have `token_type_ids`, you don't need to indicate which token belongs to which segment. Just separate your segments with the separation token `tokenizer.sep_token` (or `</s>`)
|
||||
- `Camembert <./camembert.html>`__ is a wrapper around RoBERTa. Refer to this page for usage examples.
|
||||
|
||||
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
|
||||
|
||||
|
||||
RobertaConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -57,6 +57,8 @@ Tips
|
||||
- For sequence to sequence generation, it is recommended to use ``T5ForConditionalGeneration.generate()``. The method takes care of feeding the encoded input via cross-attention layers to the decoder and auto-regressively generates the decoder output.
|
||||
- T5 uses relative scalar embeddings. Encoder input padding can be done on the left and on the right.
|
||||
|
||||
The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`_.
|
||||
|
||||
|
||||
T5Config
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -30,6 +30,8 @@ Tips:
|
||||
The original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
|
||||
- Transformer-XL is one of the few models that has no sequence length limit.
|
||||
|
||||
The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`_.
|
||||
|
||||
|
||||
TransfoXLConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -30,6 +30,8 @@ Tips:
|
||||
- XLM has multilingual checkpoints which leverage a specific `lang` parameter. Check out the
|
||||
`multi-lingual <../multilingual.html>`__ page for more information.
|
||||
|
||||
The original code can be found `here <https://github.com/facebookresearch/XLM/>`_.
|
||||
|
||||
|
||||
XLMConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@@ -28,6 +28,9 @@ Tips:
|
||||
- This implementation is the same as RoBERTa. Refer to the `documentation of RoBERTa <./roberta.html>`__ for usage
|
||||
examples as well as the information relative to the inputs and outputs.
|
||||
|
||||
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`_.
|
||||
|
||||
|
||||
XLMRobertaConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -32,6 +32,8 @@ Tips:
|
||||
`target_mapping` inputs to control the attention span and outputs (see examples in `examples/run_generation.py`)
|
||||
- XLNet is one of the few models that has no sequence length limit.
|
||||
|
||||
The original code can be found `here <https://github.com/zihangdai/xlnet/>`_.
|
||||
|
||||
|
||||
XLNetConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Reference in New Issue
Block a user