Doc styling (#8067)
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
This commit is contained in:
@@ -5,29 +5,29 @@ Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
OpenAI GPT-2 model was proposed in `Language Models are Unsupervised Multitask Learners
|
||||
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
|
||||
by Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
|
||||
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
|
||||
<https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_ by Alec
|
||||
Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei and Ilya Sutskever. It's a causal (unidirectional)
|
||||
transformer pretrained using language modeling on a very large corpus of ~40 GB of text data.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1]
|
||||
of 8 million web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous
|
||||
words within some text. The diversity of the dataset causes this simple goal to contain naturally occurring
|
||||
demonstrations of many tasks across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X
|
||||
the parameters and trained on more than 10X the amount of data.*
|
||||
*GPT-2 is a large transformer-based language model with 1.5 billion parameters, trained on a dataset[1] of 8 million
|
||||
web pages. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some
|
||||
text. The diversity of the dataset causes this simple goal to contain naturally occurring demonstrations of many tasks
|
||||
across diverse domains. GPT-2 is a direct scale-up of GPT, with more than 10X the parameters and trained on more than
|
||||
10X the amount of data.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||
the right rather than the left.
|
||||
- GPT-2 is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- GPT-2 was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as
|
||||
it can be observed in the `run_generation.py` example script.
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
|
||||
observed in the `run_generation.py` example script.
|
||||
- The PyTorch models can take the `past` as input, which is the previously computed key/value attention pairs. Using
|
||||
this `past` value prevents the model from re-computing pre-computed values in the context of text generation.
|
||||
See `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage
|
||||
of this argument.
|
||||
this `past` value prevents the model from re-computing pre-computed values in the context of text generation. See
|
||||
`reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
|
||||
this argument.
|
||||
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt2-large>`__ is a webapp created and hosted by
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
|
||||
|
||||
Reference in New Issue
Block a user