Quick tour (#5145)

* Quicktour part 1 * Update * All done * Typos Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> * Address comments in quick tour * Update docs/source/quicktour.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update from feedback Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com> Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-06-22 16:08:09 -04:00
parent 75e1eed8d1
commit 417e492f1e
7 changed files with 468 additions and 227 deletions
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -38,6 +38,16 @@ Choose the right framework for every part of a model's lifetime:
 Contents
 ---------------------------------
 The documentation is organized in five parts:
 - **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy
  and a glossary.
 - **USING TRANSFORMERS** contains general tutorials on how to use the library.
 - **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
 - **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
  transformers models.
 - **PACKAGE REFERENCE** contains the documentation of each public class and function.
 The library currently contains PyTorch and TensorFlow implementations, pre-trained model weights, usage scripts and
 conversion utilities for the following models:
@@ -118,16 +128,17 @@ conversion utilities for the following models:
    :maxdepth: 2
    :caption: Get started
    quicktour
    installation
-    quickstart
+    philosophy
    glossary
 .. toctree::
    :maxdepth: 2
    :caption: Using Transformers
-    usage
+    task_summary
-    summary
+    model_summary
    serialization
    model_sharing
    multilingual
--- a/docs/source/main_classes/pipelines.rst
+++ b/docs/source/main_classes/pipelines.rst
@@ -17,7 +17,7 @@ The pipeline abstraction
 The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
 other pipeline but requires an additional argument which is the `task`.
-... autofunction:: transformers.pipeline
+.. autofunction:: transformers.pipeline
 The task specific pipelines
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
--- a/docs/source/philosophy.rst
+++ b/docs/source/philosophy.rst
@@ -0,0 +1,73 @@
 Philosophy
 ==========
 Transformers is an opinionated library built for:
 - NLP researchers and educators seeking to use/study/extend large-scale transformers models
 - hands-on practitioners who want to fine-tune those models and/or serve them in production
 - engineers who just want to download a pretrained model and use it to solve a given NLP task.
 The library was designed with two strong goals in mind:
 - Be as easy and fast to use as possible:
    - We strongly limited the number of user-facing abstractions to learn. In fact, there are almost no abstractions,
      just three standard classes required to use each model: :doc:`configuration <main_classes/configuration>`, 
      :doc:`models <main_classes/model>` and :doc:`tokenizer <main_classes/tokenizer>`.
    - All of these classes can be initialized in a simple and unified way from pretrained instances by using a common
      :obj:`from_pretrained()` instantiation method which will take care of downloading (if needed), caching and
      loading the related class instance and associated data (configurations' hyper-parameters, tokenizers' vocabulary, 
      and models' weights) from a pretrained checkpoint provided on 
      `Hugging Face Hub <https://huggingface.co/models>`__ or your own saved checkpoint.
    - On top of those three base classes, the library provides two APIs: :func:`~transformers.pipeline` for quickly
      using a model (plus its associated tokenizer and configuration) on a given task and
      :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` to quickly train or fine-tune a given model.
    - As a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to
      extend/build-upon the library, just use regular Python/PyTorch/TensorFlow/Keras modules and inherit from the base
      classes of the library to reuse functionalities like model loading/saving.
 - Provide state-of-the-art models with performances as close as possible to the original models:
    - We provide at least one example for each architecture which reproduces a result provided by the official authors
      of said architecture.
    - The code is usually as close to the original code base as possible, which means some PyTorch code may not be as
      *pytorchic* as it could be because it was converted from TensorFlow code, and vice versa.
 A few other goals:
 - Expose the models' internals as consistently as possible:
    - We give access, using a single API, to the full hidden-states and attention weights.
    - Tokenizer and base model's API are standardized to easily switch between models.
 - Incorporate a subjective selection of promising tools for fine-tuning/investigating these models:
    - A simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning.
    - Simple ways to mask and prune transformer heads.
 - Switch easily between PyTorch and TensorFlow 2.0, allowing training using one framework and inference using another.
 Main concepts
 ~~~~~~~~~~~~~
 The library is built around three types of classes for each model:
 - **Model classes**  such as :class:`~transformers.BertModel`, which are 30+ PyTorch models 
  (`torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__) or Keras models 
  (`tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__) that work with the pretrained
  weights provided in the library.
 - **Configuration classes** such as :class:`~transformers.BertConfig`, which store all the parameters required to build
  a model. You don't always need to instantiate these yourself. In particular, if you are using a pretrained model
  without any modification, creating the model will automatically take care of instantiating the configuration (which
  is part of the model).
 - **Tokenizer classes** such as :class:`~transformers.BertTokenizer`, which store the vocabulary for each model and
  provide methods for encoding strings into a list of token embedding indices to be fed to a model, and for decoding
  those indices back into strings.
 All these classes can be instantiated from pretrained instances and saved locally using two methods:
 - :obj:`from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either
  provided by the library itself (the supported models are listed :doc:`here <pretrained_models>`)
  or stored locally (or on a server) by the user,
 - :obj:`save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using
  :obj:`from_pretrained()`.
--- a/docs/source/quickstart.md
+++ b/docs/source/quickstart.md
@@ -1,222 +0,0 @@
 # Quickstart
 ## Philosophy
 Transformers is an opinionated library built for NLP researchers seeking to use/study/extend large-scale transformers models.
 The library was designed with two strong goals in mind:
 - be as easy and fast to use as possible:
  - we strongly limited the number of user-facing abstractions to learn, in fact, there are almost no abstractions, just three standard classes required to use each model: configuration, models and tokenizer,
  - all of these classes can be initialized in a simple and unified way from pretrained instances by using a common `from_pretrained()` instantiation method which will take care of downloading (if needed), caching and loading the related class from a pretrained instance supplied in the library or your own saved instance.
  - as a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to extend/build-upon the library, just use regular Python/PyTorch modules and inherit from the base classes of the library to reuse functionalities like model loading/saving.
 - provide state-of-the-art models with performances as close as possible to the original models:
  - we provide at least one example for each architecture which reproduces a result provided by the official authors of said architecture,
  - the code is usually as close to the original code base as possible which means some PyTorch code may be not as *pytorchic* as it could be as a result of being converted TensorFlow code.
 A few other goals:
 - expose the models' internals as consistently as possible:
  - we give access, using a single API to the full hidden-states and attention weights,
  - tokenizer and base model's API are standardized to easily switch between models.
 - incorporate a subjective selection of promising tools for fine-tuning/investigating these models:
  - a simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning,
  - simple ways to mask and prune transformer heads.
 ## Main concepts
 The library is build around three types of classes for each model:
 - **model classes**  e.g., `BertModel` which are 20+ PyTorch models (`torch.nn.Modules`) that work with the pretrained weights provided in the library. In TF2, these are `tf.keras.Model`.
 - **configuration classes** which store all the parameters required to build a model, e.g., `BertConfig`. You don't always need to instantiate these your-self. In particular, if you are using a pretrained model without any modification, creating the model will automatically take care of instantiating the configuration (which is part of the model)
 - **tokenizer classes** which store the vocabulary for each model and provide methods for encoding/decoding strings in a list of token embeddings indices to be fed to a model, e.g., `BertTokenizer`
 All these classes can be instantiated from pretrained instances and saved locally using two methods:
 - `from_pretrained()` let you instantiate a model/configuration/tokenizer from a pretrained version either provided by the library itself (currently 27 models are provided as listed [here](https://huggingface.co/transformers/pretrained_models.html)) or stored locally (or on a server) by the user,
 - `save_pretrained()` let you save a model/configuration/tokenizer locally so that it can be reloaded using `from_pretrained()`.
 We'll finish this quickstart tour by going through a few simple quick-start examples to see how we can instantiate and use these classes. The rest of the documentation is organized into two parts:
 - the **MAIN CLASSES** section details the common functionalities/method/attributes of the three main type of classes (configuration, model, tokenizer) plus some optimization related classes provided as utilities for training,
 - the **PACKAGE REFERENCE** section details all the variants of each class for each model architectures and, in particular, the input/output that you should expect when calling each of them.
 ## Quick tour: Usage
 Here are two examples showcasing a few `Bert` and `GPT2` classes and pre-trained models.
 See the full API reference for examples of each model class.
 ### BERT example
 Let's start by preparing a tokenized input (a list of token embeddings indices to be fed to Bert) from a text string using `BertTokenizer`
 ```python
 import torch
 from transformers import BertTokenizer, BertModel, BertForMaskedLM
 # OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows
 import logging
 logging.basicConfig(level=logging.INFO)
 # Load pre-trained model tokenizer (vocabulary)
 tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
 # Tokenize input
 text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
 tokenized_text = tokenizer.tokenize(text)
 # Mask a token that we will try to predict back with `BertForMaskedLM`
 masked_index = 8
 tokenized_text[masked_index] = '[MASK]'
 assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]']
 # Convert token to vocabulary indices
 indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
 # Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
 segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
 # Convert inputs to PyTorch tensors
 tokens_tensor = torch.tensor([indexed_tokens])
 segments_tensors = torch.tensor([segments_ids])
 ```
 Let's see how we can use `BertModel` to encode our inputs in hidden-states:
 ```python
 # Load pre-trained model (weights)
 model = BertModel.from_pretrained('bert-base-uncased')
 # Set the model in evaluation mode to deactivate the DropOut modules
 # This is IMPORTANT to have reproducible results during evaluation!
 model.eval()
 # If you have a GPU, put everything on cuda
 tokens_tensor = tokens_tensor.to('cuda')
 segments_tensors = segments_tensors.to('cuda')
 model.to('cuda')
 # Predict hidden states features for each layer
 with torch.no_grad():
    # See the models docstrings for the detail of the inputs
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    # Transformers models always output tuples.
    # See the models docstrings for the detail of all the outputs
    # In our case, the first element is the hidden state of the last layer of the Bert model
    encoded_layers = outputs[0]
 # We have encoded our input sequence in a FloatTensor of shape (batch size, sequence length, model hidden dimension)
 assert tuple(encoded_layers.shape) == (1, len(indexed_tokens), model.config.hidden_size)
 ```
 And how to use `BertForMaskedLM` to predict a masked token:
 ```python
 # Load pre-trained model (weights)
 model = BertForMaskedLM.from_pretrained('bert-base-uncased')
 model.eval()
 # If you have a GPU, put everything on cuda
 tokens_tensor = tokens_tensor.to('cuda')
 segments_tensors = segments_tensors.to('cuda')
 model.to('cuda')
 # Predict all tokens
 with torch.no_grad():
    outputs = model(tokens_tensor, token_type_ids=segments_tensors)
    predictions = outputs[0]
 # confirm we were able to predict 'henson'
 predicted_index = torch.argmax(predictions[0, masked_index]).item()
 predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0]
 assert predicted_token == 'henson'
 ```
 ### OpenAI GPT-2
 Here is a quick-start example using `GPT2Tokenizer` and `GPT2LMHeadModel` class with OpenAI's pre-trained model to predict the next token from a text prompt.
 First let's prepare a tokenized input from our text string using `GPT2Tokenizer`
 ```python
 import torch
 from transformers import GPT2Tokenizer, GPT2LMHeadModel
 # OPTIONAL: if you want to have more information on what's happening, activate the logger as follows
 import logging
 logging.basicConfig(level=logging.INFO)
 # Load pre-trained model tokenizer (vocabulary)
 tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
 # Encode a text inputs
 text = "Who was Jim Henson ? Jim Henson was a"
 indexed_tokens = tokenizer.encode(text)
 # Convert indexed tokens in a PyTorch tensor
 tokens_tensor = torch.tensor([indexed_tokens])
 ```
 Let's see how to use `GPT2LMHeadModel` to generate the next token following our text:
 ```python
 # Load pre-trained model (weights)
 model = GPT2LMHeadModel.from_pretrained('gpt2')
 # Set the model in evaluation mode to deactivate the DropOut modules
 # This is IMPORTANT to have reproducible results during evaluation!
 model.eval()
 # If you have a GPU, put everything on cuda
 tokens_tensor = tokens_tensor.to('cuda')
 model.to('cuda')
 # Predict all tokens
 with torch.no_grad():
    outputs = model(tokens_tensor)
    predictions = outputs[0]
 # get the predicted next sub-word (in our case, the word 'man')
 predicted_index = torch.argmax(predictions[0, -1, :]).item()
 predicted_text = tokenizer.decode(indexed_tokens + [predicted_index])
 assert predicted_text == 'Who was Jim Henson? Jim Henson was a man'
 ```
 Examples for each model class of each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [documentation](#documentation).
 #### Using the past
 GPT-2, as well as some other models (GPT, XLNet, Transfo-XL, CTRL), make use of a `past` or `mems` attribute which can be used to prevent re-computing the key/value pairs when using sequential decoding. It is useful when generating sequences as a big part of the attention mechanism benefits from previous computations.
 Here is a fully-working example using the `past` with `GPT2LMHeadModel` and argmax decoding (which should only be used as an example, as argmax decoding introduces a lot of repetition):
 ```python
 from transformers import GPT2LMHeadModel, GPT2Tokenizer
 import torch
 tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
 model = GPT2LMHeadModel.from_pretrained('gpt2')
 generated = tokenizer.encode("The Manhattan bridge")
 context = torch.tensor([generated])
 past = None
 for i in range(100):
    print(i)
    output, past = model(context, past=past)
    token = torch.argmax(output[..., -1, :])
    generated += [token.tolist()]
    context = token.unsqueeze(0)
 sequence = tokenizer.decode(generated)
 print(sequence)
 ```
 The model only requires a single token as input as all the previous tokens' key/value pairs are contained in the `past`.
--- a/docs/source/quicktour.rst
+++ b/docs/source/quicktour.rst
@@ -0,0 +1,379 @@
 Quick tour
 ==========
 Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for
 Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
 such as completing a prompt with new text or translating into another language.
 First we will see how to easily leverage the pipeline API to quickly use those pretrained models at inference. Then, we
 will dig a little bit more and see how the library gives you access to those models and helps you preprocess your data.
 .. note::
    All code examples presented in the documentation have a switch on the top left for PyTorch versus TensorFlow. If
    not, the code is expected to work for both backends without any change needed.
 Getting started on a task with a pipeline
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`. 🤗 Transformers
 provides the following tasks out of the box:
 - Sentiment analysis: is a text positive or negative?
 - Text generation (in English): provide a prompt and the model will generate what follows.
 - Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place,
  etc.)
 - Question answering: provide the model with some context and a question, extract the answer from the context.
 - Filling masked text: given a text with masked words (e.g., replaced by ``[MASK]``), fill the blanks.
 - Summarization: generate a summary of a long text.
 - Translation: translate a text into another language.
 - Feature extraction: return a tensor representation of the text.
 Let's see how this works for sentiment analysis (the other tasks are all covered in the
 :doc:`task summary </task_summary>`):
 ::
    from transformers import pipeline
    classifier = pipeline('sentiment-analysis')
 When typing this command for the first time, a pretrained model and its tokenizer are downloaded and cached. We will
 look at both later on, but as an introduction the tokenizer's job is to preprocess the text for the model, which is
 then responsible for making predictions. The pipeline groups all of that together, and post-processes the predictions to
 make them readable. For instance:
 ::
    classifier('We are very happy to show you the Transformers library.')
 will return something like this:
 ::
    [{'label': 'POSITIVE', 'score': 0.999799370765686}]
 That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model as a
 `batch`:
 ::
    classifier(["We are very happy to show you the Transformers library.",
                "We hope you don't hate it."])
 returning a list of dictionaries like this one:
 ::
    [{'label': 'POSITIVE', 'score': 0.999799370765686},
     {'label': 'NEGATIVE', 'score': 0.5308589935302734}]
 You can see the second sentence has been classified as negative (it needs to be positive or negative) but its score is
 fairly neutral.
 By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can
 look at its `model page <https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english>`__ to get more
 information about it. It uses the :doc:`DistilBERT architecture </model_doc/distilbert>` and has been fine-tuned on a
 dataset called SST-2 for the sentiment analysis task.
 Let's say we want to use another model; for instance, one that has been trained on French data. We can search through
 the `model hub <https://huggingface.co/models>`__ that gathers models pretrained on a lot of data by research labs, but
 also community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags
 "French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's
 see how we can use it. 
 You can directly pass the name of the model to use to :func:`~transformers.pipeline`:
 ::
    classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment")
 This classifier can now deal with texts in English, French, but also Dutch, German, Italian and Spanish! You can also
 replace that name by a local folder where you have saved a pretrained model (see below). You can also pass a model
 object and its associated tokenizer.
 We will need two classes for this. The first is :class:`~transformers.AutoTokenizer`, which we will use to download the
 tokenizer associated to the model we picked and instantiate it. The second is
 :class:`~transformers.AutoModelForSequenceClassification` (or
 :class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow), which we will use to download
 the model itself. Note that if we were using the library on another task, the class of the model would change. The
 :doc:`task summary </task_summary>` tutorial summarizes which class is used for which task.
 ::
    ## PYTORCH CODE
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    ## TENSORFLOW CODE
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
 Now, to download the model and tokenizer we found previously, we just have to use the
 :func:`~transformers.AutoModelForSequenceClassification.from_pretrained` method (feel free to replace ``model_name`` by
 any other model from the model hub):
 ::
    ## PYTORCH CODE
    model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
    ## TENSORFLOW CODE
    model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
 If you don't find a model that has been pretrained on some data similar to yours, you will need to fine-tune a
 pretrained model on your data. We provide :doc:`example scripts </examples>` to do so. Once you're done, don't forget
 to share your fine-tuned model on the hub with the community, using :doc:`this tutorial </model_sharing>`.
 .. _pretrained-model:
 Under the hood: pretrained models
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Let's now see what happens under the hood when using those pipelines. As we saw, the model and tokenizer are created
 using the :obj:`from_pretrained` method:
 ::
    ## PYTORCH CODE
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    ## TENSORFLOW CODE
    from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
 Using the tokenizer
 ^^^^^^^^^^^^^^^^^^^
 We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text into
 words (or parts of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern
 that process, which is why we need to instantiate the tokenizer using the name of the model, to make sure we use the
 same rules as when the model was pretrained.
 The second step is to convert those `tokens` into numbers, to be able to build a tensor out of them and feed them to
 the model. To do this, the tokenizer has a `vocab`, which is the part we download when we instantiate it with the
 :obj:`from_pretrained` method, since we need to use the same `vocab` as when the model was pretrained.
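The two steps described above can be sketched in a few lines of plain Python. This is only a toy illustration, not how the library's tokenizers are implemented: the regular expression, the ``toy_tokenize`` and ``toy_encode`` helpers and the tiny vocabulary are all hypothetical, with ids copied from the example output shown in this section. Real tokenizers also split rare words into sub-word pieces, which this sketch does not attempt.

```python
import re

# Hypothetical miniature vocabulary; the ids are copied from the example
# output in this tour, not loaded from a real checkpoint.
TOY_VOCAB = {
    "[CLS]": 101, "[SEP]": 102,
    "we": 2057, "are": 2024, "very": 2200, "happy": 3407, "to": 2000,
    "show": 2265, "you": 2017, "the": 1996, "transformers": 19081,
    "library": 3075, ".": 1012,
}


def toy_tokenize(text):
    """Step 1: lowercase the text and split it into words and punctuation."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def toy_encode(text):
    """Step 2: add the special tokens and map every token to its id."""
    tokens = ["[CLS]"] + toy_tokenize(text) + ["[SEP]"]
    return [TOY_VOCAB[token] for token in tokens]


sentence = "We are very happy to show you the Transformers library."
tokens = toy_tokenize(sentence)
input_ids = toy_encode(sentence)
print(tokens)
print(input_ids)
```

A different splitting rule or a different vocabulary would produce different ids for the same sentence, which is why the tokenizer has to be loaded with the same name as the model.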
 To apply these steps on a given text, we can just feed it to our tokenizer:
 ::
    inputs = tokenizer("We are very happy to show you the Transformers library.")
    print(inputs)
 This returns a dictionary mapping strings to lists of ints. It contains the `ids of the tokens <glossary.html#input-ids>`__,
 as mentioned before, but also additional arguments that will be useful to the model. Here for instance, we also have an
 `attention mask <glossary.html#attention-mask>`__ that the model will use to have a better understanding of the sequence:
 ::
    {'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 19081, 3075, 1012, 102],
     'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
 You can pass a list of sentences directly to your tokenizer. If your goal is to send them through your model as a
 batch, you probably want to pad them all to the same length, truncate them to the maximum length the model can accept
 and get tensors back. You can specify all of that to the tokenizer:
 ::
    ## PYTORCH CODE
    batch = tokenizer(
        ["We are very happy to show you the Transformers library.",
         "We hope you don't hate it."],
        padding=True, truncation=True, return_tensors="pt")
    print(batch)
    ## TENSORFLOW CODE
    batch = tokenizer(
        ["We are very happy to show you the Transformers library.",
         "We hope you don't hate it."],
        padding=True, truncation=True, return_tensors="tf")
    print(batch)
 The padding is automatically applied on the side the model expects it (in this case, on the right), with the
 padding token the model was pretrained with. The attention mask is also adapted to take the padding into account:
 ::
    {'input_ids': tensor([[  101,  2057,  2024,  2200,  3407,  2000,  2265,  2017,  1996, 19081, 3075,  1012,   102],
                          [  101,  2057,  3246,  2017,  2123,  1005,  1056,  5223,  2009,  1012,  102,     0,     0]]), 
     'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
                               [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])}
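To make the relationship between padding and the attention mask concrete, here is a minimal pure-Python sketch of right-padding. The ``pad_batch`` helper is hypothetical and is not part of the library; it assumes a padding id of ``0`` (as in the output above) and uses a naive truncation that simply cuts the tail, whereas a real tokenizer also keeps the special tokens in place.

```python
def pad_batch(sequences, pad_id=0, max_length=None):
    """Right-pad (and optionally truncate) lists of token ids to one length."""
    if max_length is not None:
        # Naive truncation: drop everything past max_length.
        sequences = [seq[:max_length] for seq in sequences]
    target = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = target - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


toy_batch = pad_batch([
    [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 19081, 3075, 1012, 102],
    [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102],
])
print(toy_batch["input_ids"][1])
print(toy_batch["attention_mask"][1])
```

The shorter sentence receives two padding ids, and its mask ends with two zeros in the same positions.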
 You can learn more about tokenizers on their :doc:`doc page <main_classes/tokenizer>` (tutorial coming soon).
 Using the model
 ^^^^^^^^^^^^^^^
 Once your input has been preprocessed by the tokenizer, you can directly send it to the model. As we mentioned, it will
 contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the dictionary
 directly to the model; for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`.
 ::
    ## PYTORCH CODE
    outputs = model(**batch)
    ## TENSORFLOW CODE
    outputs = model(batch)
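If the :obj:`**` syntax is unfamiliar, the following plain-Python sketch shows what the unpacking does. ``toy_forward`` is a hypothetical stand-in for a model's forward signature, not a function from the library; the point is only that each dictionary key becomes a keyword argument.

```python
def toy_forward(input_ids, attention_mask=None):
    """Hypothetical stand-in for a model call; reports what it received."""
    return {
        "n_sequences": len(input_ids),
        "uses_mask": attention_mask is not None,
    }


toy_batch = {"input_ids": [[101, 2057, 102]], "attention_mask": [[1, 1, 1]]}

# `**toy_batch` expands into input_ids=..., attention_mask=...
unpacked = toy_forward(**toy_batch)
explicit = toy_forward(
    input_ids=toy_batch["input_ids"],
    attention_mask=toy_batch["attention_mask"],
)
print(unpacked == explicit)
```

This is why the keys returned by the tokenizer carry the same names as the arguments the model accepts.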
 In 🤗 Transformers, all outputs are tuples (with only one element potentially). Here, we get a tuple with just the
 final activations of the model.
 ::
    (tensor([[-4.1329,  4.3811],
             [ 0.0818, -0.0418]]),)
 .. note::
    All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final
    activation function (like SoftMax) since this final activation function is often fused with the loss.
 Let's apply the SoftMax activation to get predictions.
 ::
    ## PYTORCH CODE
    import torch.nn.functional as F
    predictions = F.softmax(outputs[0], dim=-1)
    print(predictions)
    ## TENSORFLOW CODE
    import tensorflow as tf
    predictions = tf.nn.softmax(outputs[0], axis=-1)
    print(predictions)
 We can see we get the numbers from before:
 ::
    tensor([[2.0060e-04, 9.9980e-01],
            [5.3086e-01, 4.6914e-01]])
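The arithmetic can be checked by hand with nothing but the standard library. The sketch below applies SoftMax to the activations printed earlier and then picks the most likely label for each sentence, roughly as the pipeline's post-processing does. The label order (index 0 for ``NEGATIVE``, index 1 for ``POSITIVE``) is an assumption inferred from the outputs shown in this tour, not read from the model configuration.

```python
import math


def softmax(row):
    """Numerically stable SoftMax over a list of floats."""
    highest = max(row)
    exps = [math.exp(value - highest) for value in row]
    total = sum(exps)
    return [value / total for value in exps]


# Final activations copied from the model output shown above.
logits = [[-4.1329, 4.3811], [0.0818, -0.0418]]
# Assumed label order, inferred from the pipeline results in this tour.
labels = ["NEGATIVE", "POSITIVE"]

results = []
for row in logits:
    probs = softmax(row)
    best = max(range(len(probs)), key=probs.__getitem__)
    results.append({"label": labels[best], "score": probs[best]})
print(results)
```

The scores agree with the pipeline results from the start of the tour up to rounding of the printed activations.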
 If you have labels, you can provide them to the model; it will then return a tuple with the loss and the final activations.
 ::
    ## PYTORCH CODE
    import torch
    outputs = model(**batch, labels=torch.tensor([1, 0]))
    ## TENSORFLOW CODE
    import tensorflow as tf
    outputs = model(batch, labels=tf.constant([1, 0]))
 Models are standard `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ or
 `tf.keras.Model <https://www.tensorflow.org/api_docs/python/tf/keras/Model>`__ so you can use them in your usual
 training loop. 🤗 Transformers also provides a :class:`~transformers.Trainer` (or :class:`~transformers.TFTrainer` if
 you are using TensorFlow) class to help with your training (taking care of things such as distributed training, mixed
 precision, etc.). See the training tutorial (coming soon) for more details.
 Once your model is fine-tuned, you can save it with its tokenizer the following way:
 ::
    tokenizer.save_pretrained(save_directory)
    model.save_pretrained(save_directory)
 You can then load this model back using the :func:`~transformers.AutoModel.from_pretrained` method by passing the
 directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between
 PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow. If you are
 loading a saved PyTorch model in a TensorFlow model, use :func:`~transformers.TFAutoModel.from_pretrained` like this:
 ::
    tokenizer = AutoTokenizer.from_pretrained(save_directory)
    model = TFAutoModel.from_pretrained(save_directory, from_pt=True)
 and if you are loading a saved TensorFlow model in a PyTorch model, you should use the following code:
 ::
    tokenizer = AutoTokenizer.from_pretrained(save_directory)
    model = AutoModel.from_pretrained(save_directory, from_tf=True)
 Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:
 ::
    ## PYTORCH CODE
    outputs = model(**batch, output_hidden_states=True, output_attentions=True)
    all_hidden_states, all_attentions = outputs[-2:]
    ## TENSORFLOW CODE
    outputs = model(batch, output_hidden_states=True, output_attentions=True)
    all_hidden_states, all_attentions = outputs[-2:]
 Accessing the code
 ^^^^^^^^^^^^^^^^^^
 The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that will automatically work with any
 pretrained model. Behind the scenes, the library has one model class per combination of architecture plus class, so the
 code is easy to access and tweak if you need to.
 In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
 using the :doc:`DistilBERT </model_doc/distilbert>` architecture. The model automatically created is then a
 :class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
 to that specific model, or browse the source code. This is how you would directly instantiate model and tokenizer
 without the auto magic:
 ::
    ## PYTORCH CODE
    from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = DistilBertForSequenceClassification.from_pretrained(model_name)
    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    ## TENSORFLOW CODE
    from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
    model = TFDistilBertForSequenceClassification.from_pretrained(model_name)
    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
 Customizing the model
 ^^^^^^^^^^^^^^^^^^^^^
 If you want to change how the model itself is built, you can define your custom configuration class. Each architecture
 comes with its own relevant configuration (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`) which
 allows you to specify settings such as the hidden dimension, the dropout rate, etc. If you do core modifications, like changing the
 hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
 instantiate the model directly from this configuration.
 Here we use the predefined vocabulary of DistilBERT (hence load the tokenizer with the
 :func:`~transformers.DistilBertTokenizer.from_pretrained` method) and initialize the model from scratch (hence
 instantiate the model from the configuration instead of using the
 :func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method).
 ::
    ## PYTORCH CODE
    from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
    config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = DistilBertForSequenceClassification(config)
    ## TENSORFLOW CODE
    from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
    config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
    model = TFDistilBertForSequenceClassification(config)
 For something that only changes the head of the model (for instance, the number of labels), you can still use a
 pretrained model for the body. For instance, let's define a classifier for 10 different labels using a pretrained body.
 We could create a configuration with all the default values and just change the number of labels, but more easily, you
 can directly pass any argument a configuration would take to the :func:`from_pretrained` method and it will update the
 default configuration with it:
 ::
    ## PYTORCH CODE
    from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
    model_name = "distilbert-base-uncased"
    model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
    ## TENSORFLOW CODE
    from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
    model_name = "distilbert-base-uncased"
    model = TFDistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
--- a/docs/source/task_summary.rst
+++ b/docs/source/task_summary.rst
@@ -1,4 +1,4 @@
-Usage
+Summary of the tasks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 This page shows the most frequent use-cases when using the library. The models available allow for many different