Get started docs (#15098)

* clean commit of changes * apply review feedback, make edits * fix backticks, minor formatting * 🖍 make fixup and minor edits * 🖍 fix # in header * 📝 update code sample without from_pt * 📝 final review
2022-01-28 19:01:37 -06:00
parent cabd6d26a2
commit 16d4acbfdb
3 changed files with 286 additions and 410 deletions
--- a/docs/source/quicktour.mdx
+++ b/docs/source/quicktour.mdx
@@ -1,4 +1,4 @@
-<!--Copyright 2020 The HuggingFace Team. All rights reserved.
+<!--Copyright 2022 The HuggingFace Team. All rights reserved.

 Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at
@@ -14,40 +14,53 @@ specific language governing permissions and limitations under the License.

 [[open-in-colab]]

-Let's have a quick look at the 🤗 Transformers library features. The library downloads pretrained models for Natural
-Language Understanding (NLU) tasks, such as analyzing the sentiment of a text, and Natural Language Generation (NLG),
-such as completing a prompt with new text or translating in another language.
-
-First we will see how to easily leverage the pipeline API to quickly use those pretrained models at inference. Then, we
-will dig a little bit more and see how the library gives you access to those models and helps you preprocess your data.
+Get up and running with 🤗 Transformers! Start using the [`pipeline`] for rapid inference, and quickly load a pretrained model and tokenizer with an [AutoClass](./model_doc/auto) to solve your text, vision or audio task.

 <Tip>

-All code examples presented in the documentation have a switch on the top left for Pytorch versus TensorFlow. If
-not, the code is expected to work for both backends without any change needed.
+All code examples presented in the documentation have a toggle on the top left for PyTorch and TensorFlow. If
+not, the code is expected to work for both backends without any change.

 </Tip>

-## Getting started on a task with a pipeline
+## Pipeline

-The easiest way to use a pretrained model on a given task is to use [`~transformers.pipeline`].
+[`pipeline`] is the easiest way to use a pretrained model for a given task.

 <Youtube id="tiZFewofSLM"/>

-🤗 Transformers provides the following tasks out of the box:
+The [`pipeline`] supports many common tasks out-of-the-box:

- Sentiment analysis: is a text positive or negative?
- Text generation (in English): provide a prompt and the model will generate what follows.
- Name entity recognition (NER): in an input sentence, label each word with the entity it represents (person, place, etc.)
- Question answering: provide the model with some context and a question, extract the answer from the context.
- Filling masked text: given a text with masked words (e.g., replaced by `[MASK]`), fill the blanks.
- Summarization: generate a summary of a long text.
- Translation: translate a text in another language.
- Feature extraction: return a tensor representation of the text.
+**Text**:
+* Sentiment analysis: classify the polarity of a given text.
+* Text generation (in English): generate text from a given input.
+* Name entity recognition (NER): label each word with the entity it represents (person, date, location, etc.).
+* Question answering: extract the answer from the context, given some context and a question.
+* Fill-mask: fill in the blank given a text with masked words.
+* Summarization: generate a summary of a long sequence of text or document.
+* Translation: translate text into another language.
+* Feature extraction: create a tensor representation of the text.

-Let's see how this work for sentiment analysis (the other tasks are all covered in the [task summary](task_summary)):
+**Image**:
+* Image classification: classify an image.
+* Image segmentation: classify every pixel in an image.
+* Object detection: detect objects within an image.

-Install the following dependencies (if not already installed):
+**Audio**:
+* Audio classification: assign a label to a given segment of audio.
+* Automatic speech recognition (ASR): transcribe audio data into text.
+
+<Tip>
+
+For more details about the [`pipeline`] and associated tasks, refer to the documentation [here](./main_classes/pipelines).
+
+</Tip>
+
+### Pipeline usage
+
+In the following example, you will use the [`pipeline`] for sentiment analysis.
+
+Install the following dependencies if you haven't already:

 ```bash
 pip install torch
@@ -55,25 +68,22 @@ pip install torch
 pip install tensorflow
 ```

+Import [`pipeline`] and specify the task you want to complete:
+
 ```py
 >>> from transformers import pipeline

 >>> classifier = pipeline("sentiment-analysis")
 ```

-When typing this command for the first time, a pretrained model and its tokenizer are downloaded and cached. We will
-look at both later on, but as an introduction the tokenizer's job is to preprocess the text for the model, which is
-then responsible for making predictions. The pipeline groups all of that together, and post-process the predictions to
-make them readable. For instance:
-
+The pipeline downloads and caches a default [pretrained model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) and tokenizer for sentiment analysis. Now you can use the `classifier` on your target text:

 ```py
 >>> classifier("We are very happy to show you the 🤗 Transformers library.")
-[{'label': 'POSITIVE', 'score': 0.9998}]
+[{"label": "POSITIVE", "score": 0.9998}]
 ```

-That's encouraging! You can use it on a list of sentences, which will be preprocessed then fed to the model, returning
-a list of dictionaries like this one:
+For more than one sentence, pass a list of sentences to the [`pipeline`] which returns a list of dictionaries:

 ```py
 >>> results = classifier(["We are very happy to show you the 🤗 Transformers library.", "We hope you don't hate it."])
@@ -83,117 +93,109 @@ label: POSITIVE, with score: 0.9998
 label: NEGATIVE, with score: 0.5309
 ```

-To use with a large dataset, look at [iterating over a pipeline](main_classes/pipelines)
+The [`pipeline`] can also iterate over an entire dataset. Start by installing the [🤗 Datasets](https://huggingface.co/docs/datasets/) library:

-You can see the second sentence has been classified as negative (it needs to be positive or negative) but its score is
-fairly neutral.
-
-By default, the model downloaded for this pipeline is called "distilbert-base-uncased-finetuned-sst-2-english". We can
-look at its [model page](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) to get more
-information about it. It uses the [DistilBERT architecture](model_doc/distilbert) and has been fine-tuned on a
-dataset called SST-2 for the sentiment analysis task.
-
-Let's say we want to use another model; for instance, one that has been trained on French data. We can search through
-the [model hub](https://huggingface.co/models) that gathers models pretrained on a lot of data by research labs, but
-also community models (usually fine-tuned versions of those big models on a specific dataset). Applying the tags
-"French" and "text-classification" gives back a suggestion "nlptown/bert-base-multilingual-uncased-sentiment". Let's
-see how we can use it.
-
-You can directly pass the name of the model to use to [`pipeline`]:
-
-```py
->>> classifier = pipeline("sentiment-analysis", model="nlptown/bert-base-multilingual-uncased-sentiment")
+```bash
+pip install datasets 
 ```

-This classifier can now deal with texts in English, French, but also Dutch, German, Italian and Spanish! You can also
-replace that name by a local folder where you have saved a pretrained model (see below). You can also pass a model
-object and its associated tokenizer.
+Create a [`pipeline`] with the task you want to solve for and the model you want to use. Set the `device` parameter to `0` to place the tensors on a CUDA device:

-We will need two classes for this. The first is [`AutoTokenizer`], which we will use to download the tokenizer associated to the model we picked and instantiate it. The second is [`AutoModelForSequenceClassification`] (or [`TFAutoModelForSequenceClassification`] if you are using TensorFlow), which we will use to download the model itself. Note that if we were using the library on an other task, the class of the model would change. The [task summary](task_summary) tutorial summarizes which class is used for which task.
+```py
+>>> from transformers import pipeline
+
+>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
+```
+
+Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [SUPERB](https://huggingface.co/datasets/superb) dataset:
+
+```py
+>>> import datasets
+
+>>> dataset = datasets.load_dataset("superb", name="asr", split="test")
+```
+
+Now you can iterate over the dataset with the pipeline. `KeyDataset` retrieves the item in the dictionary returned by the dataset:
+
+```py
+>>> from transformers.pipelines.base import KeyDataset
+>>> from tqdm.auto import tqdm
+
+>>> for out in tqdm(speech_recognizer(KeyDataset(dataset, "file"))):
+...     print(out)
+{"text": "HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE"}
+```
+
+### Use another model and tokenizer in the pipeline
+
+The [`pipeline`] can accommodate any model from the [Model Hub](https://huggingface.co/models), making it easy to adapt the [`pipeline`] for other use-cases. For example, if you'd like a model capable of handling French text, use the tags on the Model Hub to filter for an appropriate model. The top filtered result returns a multilingual [BERT model](https://huggingface.co/nlptown/bert-base-multilingual-uncased-sentiment) fine-tuned for sentiment analysis. Great, let's use this model!
+
+```py
+>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
+```
+
+Use the [`AutoModelForSequenceClassification`] and ['AutoTokenizer'] to load the pretrained model and it's associated tokenizer (more on an `AutoClass` below):

 ```py
 >>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
-===PT-TF-SPLIT===
->>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
-```

-Now, to download the models and tokenizer we found previously, we just have to use the
-[`~transformers.AutoModelForSequenceClassification.from_pretrained`] method (feel free to replace `model_name` by
-any other model from the model hub):
-
-```py
->>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
 >>> model = AutoModelForSequenceClassification.from_pretrained(model_name)
 >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
->>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
 ===PT-TF-SPLIT===
->>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
->>> # This model only exists in PyTorch, so we use the _from_pt_ flag to import that model in TensorFlow.
->>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name, from_pt=True)
+>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
+
+>>> model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
 >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
->>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
 ```

-If you don't find a model that has been pretrained on some data similar to yours, you will need to fine-tune a
-pretrained model on your data. We provide [example scripts](examples) to do so. Once you're done, don't forget
-to share your fine-tuned model on the hub with the community, using [this tutorial](model_sharing).
+Then you can specify the model and tokenizer in the [`pipeline`], and apply the `classifier` on your target text:

-<a id='pretrained-model'></a>
+```py
+>>> classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)
+>>> classifier("Nous sommes très heureux de vous présenter la bibliothèque 🤗 Transformers.")
+[{"label": "5 stars", "score": 0.7272651791572571}]
+```

-## Under the hood: pretrained models
+If you can't find a model for your use-case, you will need to fine-tune a pretrained model on your data. Take a look at our [fine-tuning tutorial](./training) to learn how. Finally, after you've fine-tuned your pretrained model, please consider sharing it (see tutorial [here](./model_sharing)) with the community on the Model Hub to democratize NLP for everyone! 🤗

-Let's now see what happens beneath the hood when using those pipelines.
+## AutoClass

 <Youtube id="AhChOFRegn4"/>

-As we saw, the model and tokenizer are created using the `from_pretrained` method:
+Under the hood, the [`AutoModelForSequenceClassification`] and [`AutoTokenizer`] classes work together to power the [`pipeline`]. An [AutoClass](./model_doc/auto) is a shortcut that automatically retrieves the architecture of a pretrained model from it's name or path. You only need to select the appropriate `AutoClass` for your task and it's associated tokenizer with [`AutoTokenizer`]. 
+
+Let's return to our example and see how you can use the `AutoClass` to replicate the results of the [`pipeline`].
+
+### AutoTokenizer
+
+A tokenizer is responsible for preprocessing text into a format that is understandable to the model. First, the tokenizer will split the text into words called *tokens*. There are multiple rules that govern the tokenization process, including how to split a word and at what level (learn more about tokenization [here](./tokenizer_summary)). The most important thing to remember though is you need to instantiate the tokenizer with the same model name to ensure you're using the same tokenization rules a model was pretrained with.
+
+Load a tokenizer with [`AutoTokenizer`]:

 ```py
->>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
+>>> from transformers import AutoTokenizer

->>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
->>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
->>> tokenizer = AutoTokenizer.from_pretrained(model_name)
-===PT-TF-SPLIT===
->>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
-
->>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
->>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
+>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
 >>> tokenizer = AutoTokenizer.from_pretrained(model_name)
 ```

-### Using the tokenizer
+Next, the tokenizer converts the tokens into numbers in order to construct a tensor as input to the model. This is known as the model's *vocabulary*.

-We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text in
-words (or part of words, punctuation symbols, etc.) usually called _tokens_. There are multiple rules that can govern
-that process (you can learn more about them in the [tokenizer summary](tokenizer_summary)), which is why we need
-to instantiate the tokenizer using the name of the model, to make sure we use the same rules as when the model was
-pretrained.
-
-The second step is to convert those _tokens_ into numbers, to be able to build a tensor out of them and feed them to
-the model. To do this, the tokenizer has a _vocab_, which is the part we download when we instantiate it with the
-`from_pretrained` method, since we need to use the same _vocab_ as when the model was pretrained.
-
-To apply these steps on a given text, we can just feed it to our tokenizer:
+Pass your text to the tokenizer:

 ```py
->>> inputs = tokenizer("We are very happy to show you the 🤗 Transformers library.")
+>>> encoding = tokenizer("We are very happy to show you the 🤗 Transformers library.")
+>>> print(encoding)
+{"input_ids": [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102],
+ "attention_mask": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
 ```

-This returns a dictionary string to list of ints. It contains the [ids of the tokens](glossary#input-ids), as
-mentioned before, but also additional arguments that will be useful to the model. Here for instance, we also have an
-[attention mask](glossary#attention-mask) that the model will use to have a better understanding of the sequence:
+The tokenizer will return a dictionary containing:

+* [input_ids](./glossary#input-ids): numerical representions of your tokens.
+* [atttention_mask](.glossary#attention-mask): indicates which tokens should be attended to.

-```py
->>> print(inputs)
-{'input_ids': [101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102],
- 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
-```
-
-You can pass a list of sentences directly to your tokenizer. If your goal is to send them through your model as a
-batch, you probably want to pad them all to the same length, truncate them to the maximum length the model can accept
-and get tensors back. You can specify all of that to the tokenizer:
+Just like the [`pipeline`], the tokenizer will accept a list of inputs. In addition, the tokenizer can also pad and truncate the text to return a batch with uniform length:

 ```py
 >>> pt_batch = tokenizer(
@@ -213,28 +215,31 @@ and get tensors back. You can specify all of that to the tokenizer:
 ... )
 ```

-The padding is automatically applied on the side expected by the model (in this case, on the right), with the padding
-token the model was pretrained with. The attention mask is also adapted to take the padding into account:
+Read the [preprocessing](./preprocessing) tutorial for more details about tokenization.
+
+### AutoModel
+
+🤗 Transformers provides a simple and unified way to load pretrained instances. This means you can load an [`AutoModel`] like you would load an [`AutoTokenizer`]. The only difference is selecting the correct [`AutoModel`] for the task. Since you are doing text - or sequence - classification, load [`AutoModelForSequenceClassification`]. The TensorFlow equivalent is simply [`TFAutoModelForSequenceClassification`]:

 ```py
->>> for key, value in pt_batch.items():
-...     print(f"{key}: {value.numpy().tolist()}")
-input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
-attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
+>>> from transformers import AutoModelForSequenceClassification
+
+>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
+>>> pt_model = AutoModelForSequenceClassification.from_pretrained(model_name)
 ===PT-TF-SPLIT===
->>> for key, value in tf_batch.items():
-...     print(f"{key}: {value.numpy().tolist()}")
-input_ids: [[101, 2057, 2024, 2200, 3407, 2000, 2265, 2017, 1996, 100, 19081, 3075, 1012, 102], [101, 2057, 3246, 2017, 2123, 1005, 1056, 5223, 2009, 1012, 102, 0, 0, 0]]
-attention_mask: [[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]]
+>>> from transformers import TFAutoModelForSequenceClassification
+
+>>> model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
+>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
 ```

-You can learn more about tokenizers [here](preprocessing).
+<Tip>

-### Using the model
+See the [task summary](./task_summary) for which [`AutoModel`] class to use for which task.

-Once your input has been preprocessed by the tokenizer, you can send it directly to the model. As we mentioned, it will
-contain all the relevant information the model needs. If you're using a TensorFlow model, you can pass the dictionary
-keys directly to tensors, for a PyTorch model, you need to unpack the dictionary by adding `**`.
+</Tip>
+
+Now you can pass your preprocessed batch of inputs directly to the model. If you are using a PyTorch model, unpack the dictionary by adding `**`. For TensorFlow models, pass the dictionary keys directly to the tensors:

 ```py
 >>> pt_outputs = pt_model(**pt_batch)
@@ -242,85 +247,44 @@ keys directly to tensors, for a PyTorch model, you need to unpack the dictionary
 >>> tf_outputs = tf_model(tf_batch)
 ```

-In 🤗 Transformers, all outputs are objects that contain the model's final activations along with other metadata. These
-objects are described in greater detail [here](main_classes/output). For now, let's inspect the output ourselves:
-
-```py
->>> print(pt_outputs)
-SequenceClassifierOutput(loss=None, logits=tensor([[-4.0833,  4.3364],
-        [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
-===PT-TF-SPLIT===
->>> print(tf_outputs)
-TFSequenceClassifierOutput(loss=None, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
-array([[-4.0833 ,  4.3364  ],
-       [ 0.0818, -0.0418]], dtype=float32)>, hidden_states=None, attentions=None)
-```
-
-Notice how the output object has a `logits` attribute. You can use this to access the model's final activations.
-
-<Tip>
-
-All 🤗 Transformers models (PyTorch or TensorFlow) return the activations of the model *before* the final activation
-function (like SoftMax) since this final activation function is often fused with the loss.
-
-</Tip>
-
-Let's apply the SoftMax activation to get predictions.
+The model outputs the final activations in the `logits` attribute. Apply the softmax function to the `logits` to retrieve the probabilities:

 ```py
 >>> from torch import nn

 >>> pt_predictions = nn.functional.softmax(pt_outputs.logits, dim=-1)
-===PT-TF-SPLIT===
->>> import tensorflow as tf
-
->>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
-```
-
-We can see we get the numbers from before:
-
-```py
 >>> print(pt_predictions)
 tensor([[2.2043e-04, 9.9978e-01],
        [5.3086e-01, 4.6914e-01]], grad_fn=<SoftmaxBackward>)
 ===PT-TF-SPLIT===
+>>> import tensorflow as tf
+
+>>> tf_predictions = tf.nn.softmax(tf_outputs.logits, axis=-1)
 >>> print(tf_predictions)
 tf.Tensor(
 [[2.2043e-04 9.9978e-01]
 [5.3086e-01 4.6914e-01]], shape=(2, 2), dtype=float32)
 ```

-If you provide the model with labels in addition to inputs, the model output object will also contain a `loss`
-attribute:
-
-```py
->>> import torch
-
->>> pt_outputs = pt_model(**pt_batch, labels=torch.tensor([1, 0]))
->>> print(pt_outputs)
-SequenceClassifierOutput(loss=tensor(0.3167, grad_fn=<NllLossBackward>), logits=tensor([[-4.0833,  4.3364],
-        [ 0.0818, -0.0418]], grad_fn=<AddmmBackward>), hidden_states=None, attentions=None)
-===PT-TF-SPLIT===
->>> import tensorflow as tf
-
->>> tf_outputs = tf_model(tf_batch, labels=tf.constant([1, 0]))
->>> print(tf_outputs)
-TFSequenceClassifierOutput(loss=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([2.2051e-04, 6.3326e-01], dtype=float32)>, logits=<tf.Tensor: shape=(2, 2), dtype=float32, numpy=
-array([[-4.0833 ,  4.3364  ],
-       [ 0.0818, -0.0418]], dtype=float32)>, hidden_states=None, attentions=None)
-```
-
-Models are standard [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or [tf.keras.Model](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so you can use them in your usual training loop. 🤗 Transformers also provides a [`Trainer`] class to help with your training in PyTorch (taking care of things such as distributed training, mixed precision, etc.) whereas you can leverage the `fit()` method in Keras. See the [training tutorial](training) for more details.
-
 <Tip>

-Pytorch model outputs are special dataclasses so that you can get autocompletion for their attributes in an IDE.
-They also behave like a tuple or a dictionary (e.g., you can index with an integer, a slice or a string) in which
-case the attributes not set (that have `None` values) are ignored.
+All 🤗 Transformers models (PyTorch or TensorFlow) outputs the tensors *before* the final activation
+function (like softmax) because the final activation function is often fused with the loss.

 </Tip>

-Once your model is fine-tuned, you can save it with its tokenizer in the following way:
+Models are a standard [`torch.nn.Module`](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) or a [`tf.keras.Model`](https://www.tensorflow.org/api_docs/python/tf/keras/Model) so you can use them in your usual training loop. However, to make things easier, 🤗 Transformers provides a [`Trainer`] class for PyTorch that adds functionality for distributed training, mixed precision, and more. For TensorFlow, you can use the `fit` method from [Keras](https://keras.io/). Refer to the [training tutorial](./training) for more details.
+
+<Tip>
+
+🤗 Transformers model outputs are special dataclasses so their attributes are autocompleted in an IDE.
+The model outputs also behave like a tuple or a dictionary (e.g., you can index with an integer, a slice or a string) in which case the attributes that are `None` are ignored.
+
+</Tip>
+
+### Save a model
+
+Once your model is fine-tuned, you can save it with its tokenizer using [`PreTrainedModel.save_pretrained`]:

 ```py
 >>> pt_save_directory = "./pt_save_pretrained"
@@ -332,113 +296,24 @@ Once your model is fine-tuned, you can save it with its tokenizer in the followi
 >>> tf_model.save_pretrained(tf_save_directory)
 ```

-You can then load this model back using the [`AutoModel.from_pretrained`] method by passing the
-directory name instead of the model name. One cool feature of 🤗 Transformers is that you can easily switch between
-PyTorch and TensorFlow: any model saved as before can be loaded back either in PyTorch or TensorFlow.
+When you are ready to use the model again, reload it with [`PreTrainedModel.from_pretrained`]:

-
-If you would like to load your saved model in the other framework, first make sure it is installed:
-
-```bash
-pip install torch
+```py
+>>> pt_model = AutoModelForSequenceClassification.from_pretrained("./pt_save_pretrained")
 ===PT-TF-SPLIT===
-pip install tensorflow
+>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained("./tf_save_pretrained")
 ```

-Then, use the corresponding Auto class to load it like this:
+One particularly cool 🤗 Transformers feature is the ability to save a model and reload it as either a PyTorch or TensorFlow model. The `from_pt` or `from_tf` parameter can convert the model from one framework to the other:

 ```py
 >>> from transformers import AutoModel

 >>> tokenizer = AutoTokenizer.from_pretrained(tf_save_directory)
->>> pt_model = AutoModel.from_pretrained(tf_save_directory, from_tf=True)
+>>> pt_model = AutoModelForSequenceClassification.from_pretrained(tf_save_directory, from_tf=True)
 ===PT-TF-SPLIT===
 >>> from transformers import TFAutoModel

 >>> tokenizer = AutoTokenizer.from_pretrained(pt_save_directory)
->>> tf_model = TFAutoModel.from_pretrained(pt_save_directory, from_pt=True)
-```
-
-Lastly, you can also ask the model to return all hidden states and all attention weights if you need them:
-
-
-```py
->>> pt_outputs = pt_model(**pt_batch, output_hidden_states=True, output_attentions=True)
->>> all_hidden_states = pt_outputs.hidden_states
->>> all_attentions = pt_outputs.attentions
-===PT-TF-SPLIT===
->>> tf_outputs = tf_model(tf_batch, output_hidden_states=True, output_attentions=True)
->>> all_hidden_states = tf_outputs.hidden_states
->>> all_attentions = tf_outputs.attentions
-```
-
-### Accessing the code
-
-The [`AutoModel`] and [`AutoTokenizer`] classes are just shortcuts that will automatically work with any
-pretrained model. Behind the scenes, the library has one model class per combination of architecture plus class, so the
-code is easy to access and tweak if you need to.
-
-In our previous example, the model was called "distilbert-base-uncased-finetuned-sst-2-english", which means it's using
-the [DistilBERT](model_doc/distilbert) architecture. As [`AutoModelForSequenceClassification`] (or [`TFAutoModelForSequenceClassification`] if you are using TensorFlow) was used, the model automatically created is then a [`DistilBertForSequenceClassification`]. You can look at its documentation for all details relevant to that specific model, or browse the source code. This is how you would
-directly instantiate model and tokenizer without the auto magic:
-
-```py
->>> from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
-
->>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
->>> model = DistilBertForSequenceClassification.from_pretrained(model_name)
->>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
-===PT-TF-SPLIT===
->>> from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification
-
->>> model_name = "distilbert-base-uncased-finetuned-sst-2-english"
->>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name)
->>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
-```
-
-### Customizing the model
-
-If you want to change how the model itself is built, you can define a custom configuration class. Each architecture
-comes with its own relevant configuration. For example, [`DistilBertConfig`] allows you to specify
-parameters such as the hidden dimension, dropout rate, etc for DistilBERT. If you do core modifications, like changing
-the hidden size, you won't be able to use a pretrained model anymore and will need to train from scratch. You would
-then instantiate the model directly from this configuration.
-
-Below, we load a predefined vocabulary for a tokenizer with the
-[`~DistilBertTokenizer.from_pretrained`] method. However, unlike the tokenizer, we wish to initialize
-the model from scratch. Therefore, we instantiate the model from a configuration instead of using the
-[`DistilBertForSequenceClassification.from_pretrained`] method.
-
-```py
->>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
-
->>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4 * 512)
->>> tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
->>> model = DistilBertForSequenceClassification(config)
-===PT-TF-SPLIT===
->>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
-
->>> config = DistilBertConfig(n_heads=8, dim=512, hidden_dim=4 * 512)
->>> tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
->>> model = TFDistilBertForSequenceClassification(config)
-```
-
-For something that only changes the head of the model (for instance, the number of labels), you can still use a
-pretrained model for the body. For instance, let's define a classifier for 10 different labels using a pretrained body.
-Instead of creating a new configuration with all the default values just to change the number of labels, we can instead
-pass any argument a configuration would take to the [`from_pretrained`] method and it will update the default
-configuration appropriately:
-
-```py
->>> from transformers import DistilBertConfig, DistilBertTokenizer, DistilBertForSequenceClassification
-
->>> model_name = "distilbert-base-uncased"
->>> model = DistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
->>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
-===PT-TF-SPLIT===
->>> from transformers import DistilBertConfig, DistilBertTokenizer, TFDistilBertForSequenceClassification
-
->>> model_name = "distilbert-base-uncased"
->>> model = TFDistilBertForSequenceClassification.from_pretrained(model_name, num_labels=10)
->>> tokenizer = DistilBertTokenizer.from_pretrained(model_name)
-```
+>>> tf_model = TFAutoModelForSequenceClassification.from_pretrained(pt_save_directory, from_pt=True)
+```