From cf5c5c9e1cfd79b2654a74bc0f3803e7b78b720a Mon Sep 17 00:00:00 2001
From: LysandreJik
Date: Thu, 26 Sep 2019 07:43:13 -0400
Subject: [PATCH] Documentation

---
 docs/source/index.rst             | 34 +++++++++++++++++++++++++++++--
 docs/source/pretrained_models.rst |  6 +++---
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index a205b0b314..3a30b61b2c 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -1,9 +1,38 @@
 Transformers
 ================================================================================================================================================

-Transformers is a library of state-of-the-art pre-trained models for Natural Language Processing (NLP).
+🤗 Transformers (formerly known as `pytorch-transformers` and `pytorch-pretrained-bert`) provides general-purpose architectures
+(BERT, GPT-2, RoBERTa, XLM, DistilBERT, XLNet...) for Natural Language Understanding (NLU) and Natural Language Generation
+(NLG) with over 32 pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

-The library currently contains PyTorch implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:
+Features
+---------------------------------------------------
+
+- As easy to use as pytorch-transformers
+- As powerful and concise as Keras
+- High performance on NLU and NLG tasks
+- Low barrier to entry for educators and practitioners
+
+State-of-the-art NLP for everyone
+- Deep learning researchers
+- Hands-on practitioners
+- AI/ML/NLP teachers and educators
+
+Lower compute costs, smaller carbon footprint
+- Researchers can share trained models instead of always retraining
+- Practitioners can reduce compute time and production costs
+- 8 architectures with over 30 pretrained models, some in more than 100 languages
+
+Choose the right framework for every part of a model's lifetime
+- Train state-of-the-art models in 3 lines of code
+- Deep interoperability between TensorFlow 2.0 and PyTorch models
+- Move a single model between TF2.0/PyTorch frameworks at will
+- Seamlessly pick the right framework for training, evaluation, production
+
+Contents
+---------------------------------
+
+The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models:

 1. `BERT `_ (from Google) released with the paper `BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding `_ by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova.
 2. `GPT `_ (from OpenAI) released with the paper `Improving Language Understanding by Generative Pre-Training `_ by Alec Radford, Karthik Narasimhan, Tim Salimans and Ilya Sutskever.
@@ -14,6 +43,7 @@ The library currently contains PyTorch implementations, pre-trained model weight
 7. `RoBERTa `_ (from Facebook), released together with the paper a `Robustly Optimized BERT Pretraining Approach `_ by Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov.
 8. `DistilBERT `_ (from HuggingFace) released together with the blog post `Smaller, faster, cheaper, lighter: Introducing DistilBERT, a distilled version of BERT `_ by Victor Sanh, Lysandre Debut and Thomas Wolf.

+
 .. toctree::
     :maxdepth: 2
     :caption: Notes
diff --git a/docs/source/pretrained_models.rst b/docs/source/pretrained_models.rst
index 0e55767d76..4c17b35c84 100644
--- a/docs/source/pretrained_models.rst
+++ b/docs/source/pretrained_models.rst
@@ -44,15 +44,15 @@ Here is the full list of the currently provided pretrained models together with
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
 | | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
-| | | (see details of fine-tuning in the `example section `__). |
+| | | (see details of fine-tuning in the `example section `__). |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
 | | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
-| | | (see `details of fine-tuning in the example section `__) |
+| | | (see `details of fine-tuning in the example section `__) |
 | +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
 | | | | The ``bert-base-cased`` model fine-tuned on MRPC |
-| | | (see `details of fine-tuning in the example section `__) |
+| | | (see `details of fine-tuning in the example section `__) |
 +-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
 | GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
 | | | | OpenAI GPT English model |