From 417e492f1e832c0b93512600d3385aa4c8a887c9 Mon Sep 17 00:00:00 2001 From: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> Date: Mon, 22 Jun 2020 16:08:09 -0400 Subject: [PATCH] Quick tour (#5145) * Quicktour part 1 * Update * All done * Typos Co-authored-by: Thomas Wolf * Address comments in quick tour * Update docs/source/quicktour.rst Co-authored-by: Lysandre Debut * Update from feedback Co-authored-by: Thomas Wolf Co-authored-by: Lysandre Debut --- docs/source/index.rst | 17 +- docs/source/main_classes/pipelines.rst | 2 +- .../source/{summary.rst => model_summary.rst} | 0 docs/source/philosophy.rst | 73 ++++ docs/source/quickstart.md | 222 ---------- docs/source/quicktour.rst | 379 ++++++++++++++++++ docs/source/{usage.rst => task_summary.rst} | 2 +- 7 files changed, 468 insertions(+), 227 deletions(-) rename docs/source/{summary.rst => model_summary.rst} (100%) create mode 100644 docs/source/philosophy.rst delete mode 100644 docs/source/quickstart.md create mode 100644 docs/source/quicktour.rst rename docs/source/{usage.rst => task_summary.rst} (99%) diff --git a/docs/source/index.rst b/docs/source/index.rst index b84276ec05..07670a97c9 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -38,6 +38,16 @@ Choose the right framework for every part of a model's lifetime: Contents --------------------------------- +The documentation is organized in five parts: + +- **GET STARTED** contains a quick tour, the installation instructions and some useful information about our philosophy + and a glossary. +- **USING TRANSFORMERS** contains general tutorials on how to use the library. +- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library. +- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in + transformers models. +- **PACKAGE REFERENCE** contains the documentation of each public class and function.
+ The library currently contains PyTorch and Tensorflow implementations, pre-trained model weights, usage scripts and conversion utilities for the following models: @@ -118,16 +128,17 @@ conversion utilities for the following models: :maxdepth: 2 :caption: Get started + quicktour installation - quickstart + philosophy glossary .. toctree:: :maxdepth: 2 :caption: Using Transformers - usage - summary + task_summary + model_summary serialization model_sharing multilingual diff --git a/docs/source/main_classes/pipelines.rst b/docs/source/main_classes/pipelines.rst index 04f918b362..ea51feb7ca 100644 --- a/docs/source/main_classes/pipelines.rst +++ b/docs/source/main_classes/pipelines.rst @@ -17,7 +17,7 @@ The pipeline abstraction The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any other pipeline but requires an additional argument which is the `task`. -... autofunction:: transformers.pipeline +.. autofunction:: transformers.pipeline The task specific pipelines diff --git a/docs/source/summary.rst b/docs/source/model_summary.rst similarity index 100% rename from docs/source/summary.rst rename to docs/source/model_summary.rst diff --git a/docs/source/philosophy.rst b/docs/source/philosophy.rst new file mode 100644 index 0000000000..be6182d19f --- /dev/null +++ b/docs/source/philosophy.rst @@ -0,0 +1,73 @@ +Philosophy +========== + +Transformers is an opinionated library built for: + +- NLP researchers and educators seeking to use/study/extend large-scale transformers models +- hands-on practitioners who want to fine-tune those models and/or serve them in production +- engineers who just want to download a pretrained model and use it to solve a given NLP task. 
+ +The library was designed with two strong goals in mind: + +- Be as easy and fast to use as possible: + + - We strongly limited the number of user-facing abstractions to learn. In fact, there are almost no abstractions, + just three standard classes required to use each model: :doc:`configuration `, + :doc:`models ` and :doc:`tokenizer `. + - All of these classes can be initialized in a simple and unified way from pretrained instances by using a common + :obj:`from_pretrained()` instantiation method which will take care of downloading (if needed), caching and + loading the related class instance and associated data (configurations' hyper-parameters, tokenizers' vocabulary, + and models' weights) from a pretrained checkpoint provided on + `Hugging Face Hub `__ or your own saved checkpoint. + - On top of those three base classes, the library provides two APIs: :func:`~transformers.pipeline` for quickly + using a model (plus its associated tokenizer and configuration) on a given task and + :class:`~transformers.Trainer`/:class:`~transformers.TFTrainer` to quickly train or fine-tune a given model. + - As a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to + extend/build-upon the library, just use regular Python/PyTorch/TensorFlow/Keras modules and inherit from the base + classes of the library to reuse functionalities like model loading/saving. + +- Provide state-of-the-art models with performances as close as possible to the original models: + + - We provide at least one example for each architecture which reproduces a result provided by the official authors + of said architecture. + - The code is usually as close to the original code base as possible, which means some PyTorch code may not be as + *pytorchic* as it could be as a result of being converted TensorFlow code and vice versa.
+ +A few other goals: + +- Expose the models' internals as consistently as possible: + + - We give access, using a single API, to the full hidden-states and attention weights. + - Tokenizer and base model's API are standardized to easily switch between models. + +- Incorporate a subjective selection of promising tools for fine-tuning/investigating these models: + + - A simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning. + - Simple ways to mask and prune transformer heads. + +- Switch easily between PyTorch and TensorFlow 2.0, allowing training using one framework and inference using another. + +Main concepts +~~~~~~~~~~~~~ + +The library is built around three types of classes for each model: + +- **Model classes** such as :class:`~transformers.BertModel`, which are 30+ PyTorch models + (`torch.nn.Module `__) or Keras models + (`tf.keras.Model `__) that work with the pretrained + weights provided in the library. +- **Configuration classes** such as :class:`~transformers.BertConfig`, which store all the parameters required to build + a model. You don't always need to instantiate these yourself. In particular, if you are using a pretrained model + without any modification, creating the model will automatically take care of instantiating the configuration (which + is part of the model). +- **Tokenizer classes** such as :class:`~transformers.BertTokenizer`, which store the vocabulary for each model and + provide methods for encoding/decoding strings in a list of token embeddings indices to be fed to a model.
+ +All these classes can be instantiated from pretrained instances and saved locally using two methods: + +- :obj:`from_pretrained()` lets you instantiate a model/configuration/tokenizer from a pretrained version either + provided by the library itself (the supported models are provided in the list :doc:`here `) + or stored locally (or on a server) by the user, +- :obj:`save_pretrained()` lets you save a model/configuration/tokenizer locally so that it can be reloaded using + :obj:`from_pretrained()`. + diff --git a/docs/source/quickstart.md b/docs/source/quickstart.md deleted file mode 100644 index e327679458..0000000000 --- a/docs/source/quickstart.md +++ /dev/null @@ -1,222 +0,0 @@ -# Quickstart - -## Philosophy - -Transformers is an opinionated library built for NLP researchers seeking to use/study/extend large-scale transformers models. - -The library was designed with two strong goals in mind: - -- be as easy and fast to use as possible: - - - we strongly limited the number of user-facing abstractions to learn, in fact, there are almost no abstractions, just three standard classes required to use each model: configuration, models and tokenizer, - - all of these classes can be initialized in a simple and unified way from pretrained instances by using a common `from_pretrained()` instantiation method which will take care of downloading (if needed), caching and loading the related class from a pretrained instance supplied in the library or your own saved instance. - - as a consequence, this library is NOT a modular toolbox of building blocks for neural nets. If you want to extend/build-upon the library, just use regular Python/PyTorch modules and inherit from the base classes of the library to reuse functionalities like model loading/saving.
- -- provide state-of-the-art models with performances as close as possible to the original models: - - - we provide at least one example for each architecture which reproduces a result provided by the official authors of said architecture, - - the code is usually as close to the original code base as possible which means some PyTorch code may be not as *pytorchic* as it could be as a result of being converted TensorFlow code. - -A few other goals: - -- expose the models' internals as consistently as possible: - - - we give access, using a single API to the full hidden-states and attention weights, - - tokenizer and base model's API are standardized to easily switch between models. - -- incorporate a subjective selection of promising tools for fine-tuning/investigating these models: - - - a simple/consistent way to add new tokens to the vocabulary and embeddings for fine-tuning, - - simple ways to mask and prune transformer heads. - -## Main concepts - -The library is build around three types of classes for each model: - -- **model classes** e.g., `BertModel` which are 20+ PyTorch models (`torch.nn.Modules`) that work with the pretrained weights provided in the library. In TF2, these are `tf.keras.Model`. -- **configuration classes** which store all the parameters required to build a model, e.g., `BertConfig`. You don't always need to instantiate these your-self. 
In particular, if you are using a pretrained model without any modification, creating the model will automatically take care of instantiating the configuration (which is part of the model) -- **tokenizer classes** which store the vocabulary for each model and provide methods for encoding/decoding strings in a list of token embeddings indices to be fed to a model, e.g., `BertTokenizer` - -All these classes can be instantiated from pretrained instances and saved locally using two methods: - -- `from_pretrained()` let you instantiate a model/configuration/tokenizer from a pretrained version either provided by the library itself (currently 27 models are provided as listed [here](https://huggingface.co/transformers/pretrained_models.html)) or stored locally (or on a server) by the user, -- `save_pretrained()` let you save a model/configuration/tokenizer locally so that it can be reloaded using `from_pretrained()`. - -We'll finish this quickstart tour by going through a few simple quick-start examples to see how we can instantiate and use these classes. The rest of the documentation is organized into two parts: - -- the **MAIN CLASSES** section details the common functionalities/method/attributes of the three main type of classes (configuration, model, tokenizer) plus some optimization related classes provided as utilities for training, -- the **PACKAGE REFERENCE** section details all the variants of each class for each model architectures and, in particular, the input/output that you should expect when calling each of them. - -## Quick tour: Usage - -Here are two examples showcasing a few `Bert` and `GPT2` classes and pre-trained models. - -See the full API reference for examples of each model class. 
- -### BERT example - -Let's start by preparing a tokenized input (a list of token embeddings indices to be fed to Bert) from a text string using `BertTokenizer` - -```python -import torch -from transformers import BertTokenizer, BertModel, BertForMaskedLM - -# OPTIONAL: if you want to have more information on what's happening under the hood, activate the logger as follows -import logging -logging.basicConfig(level=logging.INFO) - -# Load pre-trained model tokenizer (vocabulary) -tokenizer = BertTokenizer.from_pretrained('bert-base-uncased') - -# Tokenize input -text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]" -tokenized_text = tokenizer.tokenize(text) - -# Mask a token that we will try to predict back with `BertForMaskedLM` -masked_index = 8 -tokenized_text[masked_index] = '[MASK]' -assert tokenized_text == ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', '[MASK]', 'was', 'a', 'puppet', '##eer', '[SEP]'] - -# Convert token to vocabulary indices -indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text) -# Define sentence A and B indices associated to 1st and 2nd sentences (see paper) -segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1] - -# Convert inputs to PyTorch tensors -tokens_tensor = torch.tensor([indexed_tokens]) -segments_tensors = torch.tensor([segments_ids]) -``` - -Let's see how we can use `BertModel` to encode our inputs in hidden-states: - -```python -# Load pre-trained model (weights) -model = BertModel.from_pretrained('bert-base-uncased') - -# Set the model in evaluation mode to deactivate the DropOut modules -# This is IMPORTANT to have reproducible results during evaluation! 
-model.eval() - -# If you have a GPU, put everything on cuda -tokens_tensor = tokens_tensor.to('cuda') -segments_tensors = segments_tensors.to('cuda') -model.to('cuda') - -# Predict hidden states features for each layer -with torch.no_grad(): - # See the models docstrings for the detail of the inputs - outputs = model(tokens_tensor, token_type_ids=segments_tensors) - # Transformers models always output tuples. - # See the models docstrings for the detail of all the outputs - # In our case, the first element is the hidden state of the last layer of the Bert model - encoded_layers = outputs[0] -# We have encoded our input sequence in a FloatTensor of shape (batch size, sequence length, model hidden dimension) -assert tuple(encoded_layers.shape) == (1, len(indexed_tokens), model.config.hidden_size) -``` - -And how to use `BertForMaskedLM` to predict a masked token: - -```python -# Load pre-trained model (weights) -model = BertForMaskedLM.from_pretrained('bert-base-uncased') -model.eval() - -# If you have a GPU, put everything on cuda -tokens_tensor = tokens_tensor.to('cuda') -segments_tensors = segments_tensors.to('cuda') -model.to('cuda') - -# Predict all tokens -with torch.no_grad(): - outputs = model(tokens_tensor, token_type_ids=segments_tensors) - predictions = outputs[0] - -# confirm we were able to predict 'henson' -predicted_index = torch.argmax(predictions[0, masked_index]).item() -predicted_token = tokenizer.convert_ids_to_tokens([predicted_index])[0] -assert predicted_token == 'henson' -``` - -### OpenAI GPT-2 - -Here is a quick-start example using `GPT2Tokenizer` and `GPT2LMHeadModel` class with OpenAI's pre-trained model to predict the next token from a text prompt. 
- -First let's prepare a tokenized input from our text string using `GPT2Tokenizer` - -```python -import torch -from transformers import GPT2Tokenizer, GPT2LMHeadModel - -# OPTIONAL: if you want to have more information on what's happening, activate the logger as follows -import logging -logging.basicConfig(level=logging.INFO) - -# Load pre-trained model tokenizer (vocabulary) -tokenizer = GPT2Tokenizer.from_pretrained('gpt2') - -# Encode a text inputs -text = "Who was Jim Henson ? Jim Henson was a" -indexed_tokens = tokenizer.encode(text) - -# Convert indexed tokens in a PyTorch tensor -tokens_tensor = torch.tensor([indexed_tokens]) -``` - -Let's see how to use `GPT2LMHeadModel` to generate the next token following our text: - -```python -# Load pre-trained model (weights) -model = GPT2LMHeadModel.from_pretrained('gpt2') - -# Set the model in evaluation mode to deactivate the DropOut modules -# This is IMPORTANT to have reproducible results during evaluation! -model.eval() - -# If you have a GPU, put everything on cuda -tokens_tensor = tokens_tensor.to('cuda') -model.to('cuda') - -# Predict all tokens -with torch.no_grad(): - outputs = model(tokens_tensor) - predictions = outputs[0] - -# get the predicted next sub-word (in our case, the word 'man') -predicted_index = torch.argmax(predictions[0, -1, :]).item() -predicted_text = tokenizer.decode(indexed_tokens + [predicted_index]) -assert predicted_text == 'Who was Jim Henson? Jim Henson was a man' -``` - -Examples for each model class of each model architecture (Bert, GPT, GPT-2, Transformer-XL, XLNet and XLM) can be found in the [documentation](#documentation). - -#### Using the past - -GPT-2, as well as some other models (GPT, XLNet, Transfo-XL, CTRL), make use of a `past` or `mems` attribute which can be used to prevent re-computing the key/value pairs when using sequential decoding. It is useful when generating sequences as a big part of the attention mechanism benefits from previous computations. 
- -Here is a fully-working example using the `past` with `GPT2LMHeadModel` and argmax decoding (which should only be used as an example, as argmax decoding introduces a lot of repetition): - -```python -from transformers import GPT2LMHeadModel, GPT2Tokenizer -import torch - -tokenizer = GPT2Tokenizer.from_pretrained("gpt2") -model = GPT2LMHeadModel.from_pretrained('gpt2') - -generated = tokenizer.encode("The Manhattan bridge") -context = torch.tensor([generated]) -past = None - -for i in range(100): - print(i) - output, past = model(context, past=past) - token = torch.argmax(output[..., -1, :]) - - generated += [token.tolist()] - context = token.unsqueeze(0) - -sequence = tokenizer.decode(generated) - -print(sequence) -``` - -The model only requires a single token as input as all the previous tokens' key/value pairs are contained in the `past`. diff --git a/docs/source/quicktour.rst b/docs/source/quicktour.rst new file mode 100644 index 0000000000..c154265314 --- /dev/null +++ b/docs/source/quicktour.rst @@ -0,0 +1,379 @@ +Quick tour +========== + +Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for +Natural Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG), +such as completing a prompt with new text or translating into another language. + +First we will see how to easily leverage the pipeline API to quickly use those pretrained models at inference. Then, we +will dig a little bit more and see how the library gives you access to those models and helps you preprocess your data. + +.. note:: + + All code examples presented in the documentation have a switch on the top left for PyTorch versus TensorFlow. If + not, the code is expected to work for both backends without any change needed.
+ +Getting started on a task with a pipeline +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The easiest way to use a pretrained model on a given task is to use :func:`~transformers.pipeline`. 🤗 Transformers +provides the following tasks out of the box: + +- Sentiment analysis: is a text positive or negative? +- Text generation (in English): provide a prompt and the model will generate what follows. +- Named entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place, + etc.) +- Question answering: provide the model with some context and a question, extract the answer from the context. +- Filling masked text: given a text with masked words (e.g., replaced by ``[MASK]``), fill the blanks. +- Summarization: generate a summary of a long text. +- Translation: translate a text into another language. +- Feature extraction: return a tensor representation of the text. + +Let's see how this works for sentiment analysis (the other tasks are all covered in the +:doc:`task summary `): + +:: + + from transformers import pipeline + classifier = pipeline('sentiment-analysis') + +When typing this command for the first time, a pretrained model and its tokenizer are downloaded and cached. We will +look at both later on, but as an introduction the tokenizer's job is to preprocess the text for the model, which is +then responsible for making predictions. The pipeline groups all of that together, and post-processes the predictions to +make them readable. For instance + +:: + + classifier('We are very happy to show you the Transformers library.') + +will return something like this: + +:: + + [{'label': 'POSITIVE', 'score': 0.999799370765686}] + +That's encouraging!
You can use it on a list of sentences, which will be preprocessed then fed to the model as a +`batch`: + +:: + + classifier(["We are very happy to show you the Transformers library.", + "We hope you don't hate it."]) + +returning a list of dictionaries like this one: + +:: + + [{'label': 'POSITIVE', 'score': 0.999799370765686}, + {'label': 'NEGATIVE', 'score': 0.5308589935302734}] + +You can see the second sentence has been classified as negative (the model has to pick either positive or negative) but its score is +fairly neutral. + +By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can +look at its `model page `__ to get more +information about it. It uses the :doc:`DistilBERT architecture ` and has been fine-tuned on a +dataset called SST-2 for the sentiment analysis task. + +Let's say we want to use another model; for instance, one that has been trained on French data. We can search through +the `model hub `__ that gathers models pretrained on a lot of data by research labs, but +also community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags +"French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's +see how we can use it. + +You can directly pass the name of the model to use to :func:`~transformers.pipeline`: + +:: + + classifier = pipeline('sentiment-analysis', model="nlptown/bert-base-multilingual-uncased-sentiment") + +This classifier can now deal with texts in English, French, but also Dutch, German, Italian and Spanish! You can also +replace that name with a local folder where you have saved a pretrained model (see below). You can also pass a model +object and its associated tokenizer. + +We will need two classes for this. The first is :class:`~transformers.AutoTokenizer`, which we will use to download the +tokenizer associated with the model we picked and instantiate it.
The second is +:class:`~transformers.AutoModelForSequenceClassification` (or +:class:`~transformers.TFAutoModelForSequenceClassification` if you are using TensorFlow), which we will use to download +the model itself. Note that if we were using the library on another task, the class of the model would change. The +:doc:`task summary ` tutorial summarizes which class is used for which task. + +:: + + ## PYTORCH CODE + from transformers import AutoTokenizer, AutoModelForSequenceClassification + ## TENSORFLOW CODE + from transformers import AutoTokenizer, TFAutoModelForSequenceClassification + +Now, to download the model and tokenizer we found previously, we just have to use the +:func:`~transformers.AutoModelForSequenceClassification.from_pretrained` method (feel free to replace ``model_name`` by +any other model from the model hub): + +:: + + ## PYTORCH CODE + model_name = "nlptown/bert-base-multilingual-uncased-sentiment" + model = AutoModelForSequenceClassification.from_pretrained(model_name) + tokenizer = AutoTokenizer.from_pretrained(model_name) + classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer) + ## TENSORFLOW CODE + model_name = "nlptown/bert-base-multilingual-uncased-sentiment" + model = TFAutoModelForSequenceClassification.from_pretrained(model_name) + tokenizer = AutoTokenizer.from_pretrained(model_name) + classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer) + +If you don't find a model that has been pretrained on some data similar to yours, you will need to fine-tune a +pretrained model on your data. We provide :doc:`example scripts ` to do so. Once you're done, don't forget +to share your fine-tuned model on the hub with the community, using :doc:`this tutorial `. + +.. _pretrained-model: + +Under the hood: pretrained models +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Let's now see what happens beneath the hood when using those pipelines.
As we saw, the model and tokenizer are created +using the :obj:`from_pretrained` method: + +:: + + ## PYTORCH CODE + from transformers import AutoTokenizer, AutoModelForSequenceClassification + model_name = "distilbert-base-uncased-finetuned-sst-2-english" + model = AutoModelForSequenceClassification.from_pretrained(model_name) + tokenizer = AutoTokenizer.from_pretrained(model_name) + ## TENSORFLOW CODE + from transformers import AutoTokenizer, TFAutoModelForSequenceClassification + model_name = "distilbert-base-uncased-finetuned-sst-2-english" + model = TFAutoModelForSequenceClassification.from_pretrained(model_name) + tokenizer = AutoTokenizer.from_pretrained(model_name) + +Using the tokenizer +^^^^^^^^^^^^^^^^^^^ + +We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text into +words (or parts of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern +that process, which is why we need to instantiate the tokenizer using the name of the model, to make sure we use the +same rules as when the model was pretrained. + +The second step is to convert those `tokens` into numbers, to be able to build a tensor out of them and feed them to +the model. To do this, the tokenizer has a `vocab`, which is the part we download when we instantiate it with the +:obj:`from_pretrained` method, since we need to use the same `vocab` as when the model was pretrained. + +To apply these steps on a given text, we can just feed it to our tokenizer: + +:: + + inputs = tokenizer("We are very happy to show you the Transformers library.") + print(inputs) + +This returns a dictionary mapping strings to lists of ints. It contains the `ids of the tokens `__, +as mentioned before, but also additional arguments that will be useful to the model.
Here, for instance, we also have an +`attention mask `__ that the model will use to have a better understanding of the sequence: + + +:: + + {'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 19081, 3075, 1012, 102], + 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]} + +You can pass a list of sentences directly to your tokenizer. If your goal is to send them through your model as a +batch, you probably want to pad them all to the same length, truncate them to the maximum length the model can accept +and get tensors back. You can specify all of that to the tokenizer: + +:: + + ## PYTORCH CODE + batch = tokenizer( + ["We are very happy to show you the Transformers library.", + "We hope you don't hate it."], + padding=True, truncation=True, return_tensors="pt") + print(batch) + ## TENSORFLOW CODE + batch = tokenizer( + ["We are very happy to show you the Transformers library.", + "We hope you don't hate it."], + padding=True, truncation=True, return_tensors="tf") + print(batch) + +The padding is automatically applied on the side the model expects it (in this case, on the right), with the +padding token the model was pretrained with. The attention mask is also adapted to take the padding into account: + +:: + + {'input_ids': tensor([[ 101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 19081, 3075, 1012, 102], + [ 101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0]]), + 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], + [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])} + +You can learn more about tokenizers on their :doc:`doc page ` (tutorial coming soon). + +Using the model +^^^^^^^^^^^^^^^ + +Once your input has been preprocessed by the tokenizer, you can directly send it to the model. As we mentioned, it will +contain all the relevant information the model needs.
If you're using a TensorFlow model, you can directly pass the +dictionary (which maps keys to tensors); for a PyTorch model, you need to unpack the dictionary by adding :obj:`**`. + +:: + + ## PYTORCH CODE + outputs = model(**batch) + ## TENSORFLOW CODE + outputs = model(batch) + +In 🤗 Transformers, all outputs are tuples (with only one element potentially). Here, we get a tuple with just the +final activations of the model. + +:: + + (tensor([[-4.1329, 4.3811], + [ 0.0818, -0.0418]]),) + +.. note:: + + All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final + activation function (like SoftMax) since this final activation function is often fused with the loss. + +Let's apply the SoftMax activation to get predictions. + +:: + + ## PYTORCH CODE + import torch.nn.functional as F + predictions = F.softmax(outputs[0], dim=-1) + print(predictions) + ## TENSORFLOW CODE + import tensorflow as tf + predictions = tf.nn.softmax(outputs[0], axis=-1) + print(predictions) + +We can see we get the numbers from before: + +:: + + tensor([[2.0060e-04, 9.9980e-01], + [5.3086e-01, 4.6914e-01]]) + +If you have labels, you can provide them to the model, and it will return a tuple with the loss and the final activations. + +:: + + ## PYTORCH CODE + import torch + outputs = model(**batch, labels=torch.tensor([1, 0])) + ## TENSORFLOW CODE + import tensorflow as tf + outputs = model(batch, labels=tf.constant([1, 0])) + +Models are standard `torch.nn.Module `__ or +`tf.keras.Model `__ so you can use them in your usual +training loop. 🤗 Transformers also provides a :class:`~transformers.Trainer` (or :class:`~transformers.TFTrainer` if +you are using TensorFlow) class to help with your training (taking care of things such as distributed training, mixed +precision, etc.). See the training tutorial (coming soon) for more details.
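As a side note on the SoftMax step in this section, the conversion from raw activations to probabilities can be checked by hand. The snippet below is only an illustrative sketch in plain Python, not library code: the ``softmax`` helper is a hypothetical name introduced here, and the logits are copied from the example output quoted in this tour.

```python
import math

def softmax(row):
    """Turn a list of raw scores (logits) into probabilities that sum to 1.

    Hypothetical helper for illustration; the tour itself uses
    torch.nn.functional.softmax or tf.nn.softmax.
    """
    shift = max(row)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Logits copied from the example model output shown in the tour.
logits = [[-4.1329, 4.3811], [0.0818, -0.0418]]
predictions = [softmax(row) for row in logits]

for row in predictions:
    print([round(p, 4) for p in row])
```

The first row should come out strongly in favour of the second class and the second row close to an even split, which is consistent with the pipeline scores quoted for the two sentences.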
+
+Once your model is fine-tuned, you can save it with its tokenizer in the following way:
+
+::
+
+    tokenizer.save_pretrained(save_directory)
+    model.save_pretrained(save_directory)
+
+You can then load this model back using the :func:`~transformers.AutoModel.from_pretrained` method by passing the
+directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between
+PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow. If you are
+loading a saved PyTorch model in a TensorFlow model, use :func:`~transformers.TFAutoModel.from_pretrained` like this:
+
+::
+
+    from transformers import AutoTokenizer, TFAutoModel
+    tokenizer = AutoTokenizer.from_pretrained(save_directory)
+    model = TFAutoModel.from_pretrained(save_directory, from_pt=True)
+
+and if you are loading a saved TensorFlow model in a PyTorch model, you should use the following code:
+
+::
+
+    from transformers import AutoTokenizer, AutoModel
+    tokenizer = AutoTokenizer.from_pretrained(save_directory)
+    model = AutoModel.from_pretrained(save_directory, from_tf=True)
+
+Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:
+
+::
+
+    ## PYTORCH CODE
+    outputs = model(**batch, output_hidden_states=True, output_attentions=True)
+    all_hidden_states, all_attentions = outputs[-2:]
+    ## TENSORFLOW CODE
+    outputs = model(batch, output_hidden_states=True, output_attentions=True)
+    all_hidden_states, all_attentions = outputs[-2:]
+
+Accessing the code
+^^^^^^^^^^^^^^^^^^
+
+The :obj:`AutoModel` and :obj:`AutoTokenizer` classes are just shortcuts that will automatically work with any
+pretrained model. Behind the scenes, the library has one model class per combination of architecture and task head, so
+the code is easy to access and tweak if you need to.
+
+In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's
+using the :doc:`DistilBERT ` architecture. The model automatically created is then a
+:class:`~transformers.DistilBertForSequenceClassification`. You can look at its documentation for all details relevant
+to that specific model, or browse the source code. This is how you would directly instantiate the model and tokenizer
+without the auto magic:
+
+::
+
+    ## PYTORCH CODE
+    from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
+    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
+    model = DistilBertForSequenceClassification.from_pretrained(model_name)
+    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
+    ## TENSORFLOW CODE
+    from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
+    model_name = "distilbert-base-uncased-finetuned-sst-2-english"
+    model = TFDistilBertForSequenceClassification.from_pretrained(model_name)
+    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
+
+Customizing the model
+^^^^^^^^^^^^^^^^^^^^^
+
+If you want to change how the model itself is built, you can create a custom configuration. Each architecture
+comes with its own relevant configuration (in the case of DistilBERT, :class:`~transformers.DistilBertConfig`) which
+allows you to specify the hidden dimension, dropout rate, etc. If you make core modifications, like changing the
+hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would then
+instantiate the model directly from this configuration.
+
+Here we use the predefined vocabulary of DistilBERT (hence we load the tokenizer with the
+:func:`~transformers.DistilBertTokenizer.from_pretrained` method) and initialize the model from scratch (hence we
+instantiate the model from the configuration instead of using the
+:func:`~transformers.DistilBertForSequenceClassification.from_pretrained` method).
+
+::
+
+    ## PYTORCH CODE
+    from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
+    config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
+    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
+    model = DistilBertForSequenceClassification(config)
+    ## TENSORFLOW CODE
+    from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
+    config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4*512)
+    tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')
+    model = TFDistilBertForSequenceClassification(config)
+
+For changes that only affect the head of the model (for instance, the number of labels), you can still use a
+pretrained model for the body. As an example, let's define a classifier for 10 different labels using a pretrained
+body. We could create a configuration with all the default values and just change the number of labels, but it is
+easier to pass any argument a configuration would take directly to the :func:`from_pretrained` method, which will
+update the default configuration with it:
+
+::
+
+    ## PYTORCH CODE
+    from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
+    model_name = "distilbert-base-uncased"
+    model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
+    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
+    ## TENSORFLOW CODE
+    from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
+    model_name = "distilbert-base-uncased"
+    model = TFDistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
+    tokenizer = DistilBertTokenizer.from_pretrained(model_name)
diff --git a/docs/source/usage.rst b/docs/source/task_summary.rst
similarity index 99%
rename from docs/source/usage.rst
rename to docs/source/task_summary.rst
index 5d035c4ab7..a7ef4d4572 100644
---
a/docs/source/usage.rst
+++ b/docs/source/task_summary.rst
@@ -1,4 +1,4 @@
-Usage
+Summary of the tasks
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This page shows the most frequent use-cases when using the library. The models available allow for many different