Doc styling (#8067)
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
This commit is contained in:
@@ -6,44 +6,39 @@ Overview
|
||||
|
||||
OpenAI GPT model was proposed in `Improving Language Understanding by Generative Pre-Training
|
||||
<https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf>`__
|
||||
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional)
|
||||
transformer pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book
|
||||
Corpus.
|
||||
by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever. It's a causal (unidirectional) transformer
|
||||
pre-trained using language modeling on a large corpus will long range dependencies, the Toronto Book Corpus.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Natural language understanding comprises a wide range of diverse tasks such
|
||||
as textual entailment, question answering, semantic similarity assessment, and
|
||||
document classification. Although large unlabeled text corpora are abundant,
|
||||
labeled data for learning these specific tasks is scarce, making it challenging for
|
||||
discriminatively trained models to perform adequately. We demonstrate that large
|
||||
gains on these tasks can be realized by generative pre-training of a language model
|
||||
on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each
|
||||
specific task. In contrast to previous approaches, we make use of task-aware input
|
||||
transformations during fine-tuning to achieve effective transfer while requiring
|
||||
minimal changes to the model architecture. We demonstrate the effectiveness of
|
||||
our approach on a wide range of benchmarks for natural language understanding.
|
||||
Our general task-agnostic model outperforms discriminatively trained models that
|
||||
use architectures specifically crafted for each task, significantly improving upon the
|
||||
state of the art in 9 out of the 12 tasks studied.*
|
||||
*Natural language understanding comprises a wide range of diverse tasks such as textual entailment, question answering,
|
||||
semantic similarity assessment, and document classification. Although large unlabeled text corpora are abundant,
|
||||
labeled data for learning these specific tasks is scarce, making it challenging for discriminatively trained models to
|
||||
perform adequately. We demonstrate that large gains on these tasks can be realized by generative pre-training of a
|
||||
language model on a diverse corpus of unlabeled text, followed by discriminative fine-tuning on each specific task. In
|
||||
contrast to previous approaches, we make use of task-aware input transformations during fine-tuning to achieve
|
||||
effective transfer while requiring minimal changes to the model architecture. We demonstrate the effectiveness of our
|
||||
approach on a wide range of benchmarks for natural language understanding. Our general task-agnostic model outperforms
|
||||
discriminatively trained models that use architectures specifically crafted for each task, significantly improving upon
|
||||
the state of the art in 9 out of the 12 tasks studied.*
|
||||
|
||||
Tips:
|
||||
|
||||
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on
|
||||
the right rather than the left.
|
||||
- GPT is a model with absolute position embeddings so it's usually advised to pad the inputs on the right rather than
|
||||
the left.
|
||||
- GPT was trained with a causal language modeling (CLM) objective and is therefore powerful at predicting the next
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as
|
||||
it can be observed in the `run_generation.py` example script.
|
||||
token in a sequence. Leveraging this feature allows GPT-2 to generate syntactically coherent text as it can be
|
||||
observed in the `run_generation.py` example script.
|
||||
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by
|
||||
Hugging Face showcasing the generative capabilities of several models. GPT is one of them.
|
||||
`Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
|
||||
showcasing the generative capabilities of several models. GPT is one of them.
|
||||
|
||||
The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
|
||||
|
||||
Note:
|
||||
|
||||
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install
|
||||
``ftfy`` and ``SpaCy``::
|
||||
If you want to reproduce the original tokenization process of the `OpenAI GPT` paper, you will need to install ``ftfy``
|
||||
and ``SpaCy``::
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@@ -51,8 +46,7 @@ If you want to reproduce the original tokenization process of the `OpenAI GPT` p
|
||||
python -m spacy download en
|
||||
|
||||
If you don't install ``ftfy`` and ``SpaCy``, the :class:`~transformers.OpenAIGPTTokenizer` will default to tokenize
|
||||
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't
|
||||
worry).
|
||||
using BERT's :obj:`BasicTokenizer` followed by Byte-Pair Encoding (which should be fine for most usage, don't worry).
|
||||
|
||||
OpenAIGPTConfig
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Reference in New Issue
Block a user