Pytorch GPT

2020-01-17 10:50:25 -05:00
parent 1487b840d3
commit 850795c487
3 changed files with 184 additions and 129 deletions
--- a/docs/source/model_doc/gpt.rst
+++ b/docs/source/model_doc/gpt.rst
@@ -1,6 +1,38 @@
 OpenAI GPT
 ----------------------------------------------------

+The OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training`_
+by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
+transformer pre-trained using language modeling on a large corpus with long-range dependencies, the Toronto Book Corpus.
+
+The abstract from the paper is the following:
+
+*Natural language understanding comprises a wide range of diverse tasks such
+as textual entailment, question answering, semantic similarity assessment, and
+document classification. Although large unlabeled text corpora are abundant,
+labeled data for learning these specific tasks is scarce, making it challenging for
+discriminatively trained models to perform adequately. We demonstrate that large
+gains on these tasks can be realized by generative pre-training of a language model
+on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each
+specific task. In contrast to previous approaches, we make use of task-aware input
+transformations during fine-tuning to achieve effective transfer while requiring
+minimal changes to the model architecture. We demonstrate the effectiveness of
+our approach on a wide range of benchmarks for natural language understanding.
+Our general task-agnostic model outperforms discriminatively trained models that
+use architectures specifically crafted for each task, significantly improving upon the
+state of the art in 9 out of the 12 tasks studied.*
+
+Tips:
+
+- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
+  the right rather than the left.
+- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
+  token in a sequence. Leveraging this feature allows GPT to generate syntactically coherent text, as
+  can be observed in the ``run_generation.py`` example script.
+
+`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
+Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
+
 ``OpenAIGPTConfig``
 ~~~~~~~~~~~~~~~~~~~~~