Breakup export guide (#19271)
* split onnx and torchscript docs * make style * apply reviews
This commit is contained in:
@@ -33,7 +33,9 @@
|
|||||||
- local: converting_tensorflow_models
|
- local: converting_tensorflow_models
|
||||||
title: Converting from TensorFlow checkpoints
|
title: Converting from TensorFlow checkpoints
|
||||||
- local: serialization
|
- local: serialization
|
||||||
title: Export 🤗 Transformers models
|
title: Export to ONNX
|
||||||
|
- local: torchscript
|
||||||
|
title: Export to TorchScript
|
||||||
- local: troubleshooting
|
- local: troubleshooting
|
||||||
title: Troubleshoot
|
title: Troubleshoot
|
||||||
title: General usage
|
title: General usage
|
||||||
|
|||||||
@@ -10,36 +10,36 @@ an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express o
|
|||||||
specific language governing permissions and limitations under the License.
|
specific language governing permissions and limitations under the License.
|
||||||
-->
|
-->
|
||||||
|
|
||||||
# Export 🤗 Transformers Models
|
# Export to ONNX
|
||||||
|
|
||||||
If you need to deploy 🤗 Transformers models in production environments, we
|
If you need to deploy 🤗 Transformers models in production environments, we recommend
|
||||||
recommend exporting them to a serialized format that can be loaded and executed
|
exporting them to a serialized format that can be loaded and executed on specialized
|
||||||
on specialized runtimes and hardware. In this guide, we'll show you how to
|
runtimes and hardware. In this guide, we'll show you how to export 🤗 Transformers
|
||||||
export 🤗 Transformers models in two widely used formats: ONNX and TorchScript.
|
models to [ONNX (Open Neural Network eXchange)](http://onnx.ai).
|
||||||
|
|
||||||
Once exported, a model can optimized for inference via techniques such as
|
<Tip>
|
||||||
quantization and pruning. If you are interested in optimizing your models to run
|
|
||||||
with maximum efficiency, check out the [🤗 Optimum
|
Once exported, a model can be optimized for inference via techniques such as
|
||||||
|
quantization and pruning. If you are interested in optimizing your models to run with
|
||||||
|
maximum efficiency, check out the [🤗 Optimum
|
||||||
library](https://github.com/huggingface/optimum).
|
library](https://github.com/huggingface/optimum).
|
||||||
|
|
||||||
## ONNX
|
</Tip>
|
||||||
|
|
||||||
The [ONNX (Open Neural Network eXchange)](http://onnx.ai) project is an open
|
ONNX is an open standard that defines a common set of operators and a common file format
|
||||||
standard that defines a common set of operators and a common file format to
|
to represent deep learning models in a wide variety of frameworks, including PyTorch and
|
||||||
represent deep learning models in a wide variety of frameworks, including
|
TensorFlow. When a model is exported to the ONNX format, these operators are used to
|
||||||
PyTorch and TensorFlow. When a model is exported to the ONNX format, these
|
construct a computational graph (often called an _intermediate representation_) which
|
||||||
operators are used to construct a computational graph (often called an
|
represents the flow of data through the neural network.
|
||||||
_intermediate representation_) which represents the flow of data through the
|
|
||||||
neural network.
|
|
||||||
|
|
||||||
By exposing a graph with standardized operators and data types, ONNX makes it
|
By exposing a graph with standardized operators and data types, ONNX makes it easy to
|
||||||
easy to switch between frameworks. For example, a model trained in PyTorch can
|
switch between frameworks. For example, a model trained in PyTorch can be exported to
|
||||||
be exported to ONNX format and then imported in TensorFlow (and vice versa).
|
ONNX format and then imported in TensorFlow (and vice versa).
|
||||||
|
|
||||||
🤗 Transformers provides a `transformers.onnx` package that enables you to
|
🤗 Transformers provides a [`transformers.onnx`](main_classes/onnx) package that enables
|
||||||
convert model checkpoints to an ONNX graph by leveraging configuration objects.
|
you to convert model checkpoints to an ONNX graph by leveraging configuration objects.
|
||||||
These configuration objects come ready made for a number of model architectures,
|
These configuration objects come ready made for a number of model architectures, and are
|
||||||
and are designed to be easily extendable to other architectures.
|
designed to be easily extendable to other architectures.
|
||||||
|
|
||||||
Ready-made configurations include the following architectures:
|
Ready-made configurations include the following architectures:
|
||||||
|
|
||||||
@@ -106,10 +106,10 @@ In the next two sections, we'll show you how to:
|
|||||||
* Export a supported model using the `transformers.onnx` package.
|
* Export a supported model using the `transformers.onnx` package.
|
||||||
* Export a custom model for an unsupported architecture.
|
* Export a custom model for an unsupported architecture.
|
||||||
|
|
||||||
### Exporting a model to ONNX
|
## Exporting a model to ONNX
|
||||||
|
|
||||||
To export a 🤗 Transformers model to ONNX, you'll first need to install some
|
To export a 🤗 Transformers model to ONNX, you'll first need to install some extra
|
||||||
extra dependencies:
|
dependencies:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
pip install transformers[onnx]
|
pip install transformers[onnx]
|
||||||
@@ -141,7 +141,7 @@ Exporting a checkpoint using a ready-made configuration can be done as follows:
|
|||||||
python -m transformers.onnx --model=distilbert-base-uncased onnx/
|
python -m transformers.onnx --model=distilbert-base-uncased onnx/
|
||||||
```
|
```
|
||||||
|
|
||||||
which should show the following logs:
|
You should see the following logs:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
Validating ONNX model...
|
Validating ONNX model...
|
||||||
@@ -152,13 +152,13 @@ Validating ONNX model...
|
|||||||
All good, model saved at: onnx/model.onnx
|
All good, model saved at: onnx/model.onnx
|
||||||
```
|
```
|
||||||
|
|
||||||
This exports an ONNX graph of the checkpoint defined by the `--model` argument.
|
This exports an ONNX graph of the checkpoint defined by the `--model` argument. In this
|
||||||
In this example it is `distilbert-base-uncased`, but it can be any checkpoint on
|
example, it is `distilbert-base-uncased`, but it can be any checkpoint on the Hugging
|
||||||
the Hugging Face Hub or one that's stored locally.
|
Face Hub or one that's stored locally.
|
||||||
|
|
||||||
The resulting `model.onnx` file can then be run on one of the [many
|
The resulting `model.onnx` file can then be run on one of the [many
|
||||||
accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the
|
accelerators](https://onnx.ai/supported-tools.html#deployModel) that support the ONNX
|
||||||
ONNX standard. For example, we can load and run the model with [ONNX
|
standard. For example, we can load and run the model with [ONNX
|
||||||
Runtime](https://onnxruntime.ai/) as follows:
|
Runtime](https://onnxruntime.ai/) as follows:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
@@ -172,9 +172,8 @@ Runtime](https://onnxruntime.ai/) as follows:
|
|||||||
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
>>> outputs = session.run(output_names=["last_hidden_state"], input_feed=dict(inputs))
|
||||||
```
|
```
|
||||||
|
|
||||||
The required output names (i.e. `["last_hidden_state"]`) can be obtained by
|
The required output names (like `["last_hidden_state"]`) can be obtained by taking a
|
||||||
taking a look at the ONNX configuration of each model. For example, for
|
look at the ONNX configuration of each model. For example, for DistilBERT we have:
|
||||||
DistilBERT we have:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
>>> from transformers.models.distilbert import DistilBertConfig, DistilBertOnnxConfig
|
||||||
@@ -185,20 +184,19 @@ DistilBERT we have:
|
|||||||
["last_hidden_state"]
|
["last_hidden_state"]
|
||||||
```
|
```
|
||||||
|
|
||||||
The process is identical for TensorFlow checkpoints on the Hub. For example, we
|
The process is identical for TensorFlow checkpoints on the Hub. For example, we can
|
||||||
can export a pure TensorFlow checkpoint from the [Keras
|
export a pure TensorFlow checkpoint from the [Keras
|
||||||
organization](https://huggingface.co/keras-io) as follows:
|
organization](https://huggingface.co/keras-io) as follows:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
|
python -m transformers.onnx --model=keras-io/transformers-qa onnx/
|
||||||
```
|
```
|
||||||
|
|
||||||
To export a model that's stored locally, you'll need to have the model's weights
|
To export a model that's stored locally, you'll need to have the model's weights and
|
||||||
and tokenizer files stored in a directory. For example, we can load and save a
|
tokenizer files stored in a directory. For example, we can load and save a checkpoint as
|
||||||
checkpoint as follows:
|
follows:
|
||||||
|
|
||||||
<frameworkcontent>
|
<frameworkcontent> <pt>
|
||||||
<pt>
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
||||||
|
|
||||||
@@ -216,8 +214,7 @@ argument of the `transformers.onnx` package to the desired directory:
|
|||||||
```bash
|
```bash
|
||||||
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
python -m transformers.onnx --model=local-pt-checkpoint onnx/
|
||||||
```
|
```
|
||||||
</pt>
|
</pt> <tf>
|
||||||
<tf>
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
|
>>> from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
|
||||||
|
|
||||||
@@ -235,14 +232,13 @@ argument of the `transformers.onnx` package to the desired directory:
|
|||||||
```bash
|
```bash
|
||||||
python -m transformers.onnx --model=local-tf-checkpoint onnx/
|
python -m transformers.onnx --model=local-tf-checkpoint onnx/
|
||||||
```
|
```
|
||||||
</tf>
|
</tf> </frameworkcontent>
|
||||||
</frameworkcontent>
|
|
||||||
|
|
||||||
### Selecting features for different model topologies
|
## Selecting features for different model tasks
|
||||||
|
|
||||||
Each ready-made configuration comes with a set of _features_ that enable you to
|
Each ready-made configuration comes with a set of _features_ that enable you to export
|
||||||
export models for different types of topologies or tasks. As shown in the table
|
models for different types of tasks. As shown in the table below, each feature is
|
||||||
below, each feature is associated with a different auto class:
|
associated with a different `AutoClass`:
|
||||||
|
|
||||||
| Feature | Auto Class |
|
| Feature | Auto Class |
|
||||||
| ------------------------------------ | ------------------------------------ |
|
| ------------------------------------ | ------------------------------------ |
|
||||||
@@ -255,7 +251,7 @@ below, each feature is associated with a different auto class:
|
|||||||
| `token-classification` | `AutoModelForTokenClassification` |
|
| `token-classification` | `AutoModelForTokenClassification` |
|
||||||
|
|
||||||
For each configuration, you can find the list of supported features via the
|
For each configuration, you can find the list of supported features via the
|
||||||
`FeaturesManager`. For example, for DistilBERT we have:
|
[`~transformers.onnx.FeaturesManager`]. For example, for DistilBERT we have:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers.onnx.features import FeaturesManager
|
>>> from transformers.onnx.features import FeaturesManager
|
||||||
@@ -266,15 +262,15 @@ For each configuration, you can find the list of supported features via the
|
|||||||
```
|
```
|
||||||
|
|
||||||
You can then pass one of these features to the `--feature` argument in the
|
You can then pass one of these features to the `--feature` argument in the
|
||||||
`transformers.onnx` package. For example, to export a text-classification model
|
`transformers.onnx` package. For example, to export a text-classification model we can
|
||||||
we can pick a fine-tuned model from the Hub and run:
|
pick a fine-tuned model from the Hub and run:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english \
|
python -m transformers.onnx --model=distilbert-base-uncased-finetuned-sst-2-english \
|
||||||
--feature=sequence-classification onnx/
|
--feature=sequence-classification onnx/
|
||||||
```
|
```
|
||||||
|
|
||||||
which will display the following logs:
|
This displays the following logs:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
Validating ONNX model...
|
Validating ONNX model...
|
||||||
@@ -285,37 +281,35 @@ Validating ONNX model...
|
|||||||
All good, model saved at: onnx/model.onnx
|
All good, model saved at: onnx/model.onnx
|
||||||
```
|
```
|
||||||
|
|
||||||
Notice that in this case, the output names from the fine-tuned model are
|
Notice that in this case, the output names from the fine-tuned model are `logits`
|
||||||
`logits` instead of the `last_hidden_state` we saw with the
|
instead of the `last_hidden_state` we saw with the `distilbert-base-uncased` checkpoint
|
||||||
`distilbert-base-uncased` checkpoint earlier. This is expected since the
|
earlier. This is expected since the fine-tuned model has a sequence classification head.
|
||||||
fine-tuned model has a sequence classification head.
|
|
||||||
|
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
The features that have a `with-past` suffix (e.g. `causal-lm-with-past`)
|
The features that have a `with-past` suffix (like `causal-lm-with-past`) correspond to
|
||||||
correspond to model topologies with precomputed hidden states (key and values
|
model classes with precomputed hidden states (key and values in the attention blocks)
|
||||||
in the attention blocks) that can be used for fast autoregressive decoding.
|
that can be used for fast autoregressive decoding.
|
||||||
|
|
||||||
</Tip>
|
</Tip>
|
||||||
|
|
||||||
|
|
||||||
### Exporting a model for an unsupported architecture
|
## Exporting a model for an unsupported architecture
|
||||||
|
|
||||||
If you wish to export a model whose architecture is not natively supported by
|
If you wish to export a model whose architecture is not natively supported by the
|
||||||
the library, there are three main steps to follow:
|
library, there are three main steps to follow:
|
||||||
|
|
||||||
1. Implement a custom ONNX configuration.
|
1. Implement a custom ONNX configuration.
|
||||||
2. Export the model to ONNX.
|
2. Export the model to ONNX.
|
||||||
3. Validate the outputs of the PyTorch and exported models.
|
3. Validate the outputs of the PyTorch and exported models.
|
||||||
|
|
||||||
In this section, we'll look at how DistilBERT was implemented to show what's
|
In this section, we'll look at how DistilBERT was implemented to show what's involved
|
||||||
involved with each step.
|
with each step.
|
||||||
|
|
||||||
#### Implementing a custom ONNX configuration
|
### Implementing a custom ONNX configuration
|
||||||
|
|
||||||
Let's start with the ONNX configuration object. We provide three abstract
|
Let's start with the ONNX configuration object. We provide three abstract classes that
|
||||||
classes that you should inherit from, depending on the type of model
|
you should inherit from, depending on the type of model architecture you wish to export:
|
||||||
architecture you wish to export:
|
|
||||||
|
|
||||||
* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
|
* Encoder-based models inherit from [`~onnx.config.OnnxConfig`]
|
||||||
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
|
* Decoder-based models inherit from [`~onnx.config.OnnxConfigWithPast`]
|
||||||
@@ -347,25 +341,24 @@ Since DistilBERT is an encoder-based model, its configuration inherits from
|
|||||||
... )
|
... )
|
||||||
```
|
```
|
||||||
|
|
||||||
Every configuration object must implement the `inputs` property and return a
|
Every configuration object must implement the `inputs` property and return a mapping,
|
||||||
mapping, where each key corresponds to an expected input, and each value
|
where each key corresponds to an expected input, and each value indicates the axis of
|
||||||
indicates the axis of that input. For DistilBERT, we can see that two inputs are
|
that input. For DistilBERT, we can see that two inputs are required: `input_ids` and
|
||||||
required: `input_ids` and `attention_mask`. These inputs have the same shape of
|
`attention_mask`. These inputs have the same shape of `(batch_size, sequence_length)`
|
||||||
`(batch_size, sequence_length)` which is why we see the same axes used in the
|
which is why we see the same axes used in the configuration.
|
||||||
configuration.
|
|
||||||
|
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
Notice that `inputs` property for `DistilBertOnnxConfig` returns an
|
Notice that `inputs` property for `DistilBertOnnxConfig` returns an `OrderedDict`. This
|
||||||
`OrderedDict`. This ensures that the inputs are matched with their relative
|
ensures that the inputs are matched with their relative position within the
|
||||||
position within the `PreTrainedModel.forward()` method when tracing the graph.
|
`PreTrainedModel.forward()` method when tracing the graph. We recommend using an
|
||||||
We recommend using an `OrderedDict` for the `inputs` and `outputs` properties
|
`OrderedDict` for the `inputs` and `outputs` properties when implementing custom ONNX
|
||||||
when implementing custom ONNX configurations.
|
configurations.
|
||||||
|
|
||||||
</Tip>
|
</Tip>
|
||||||
|
|
||||||
Once you have implemented an ONNX configuration, you can instantiate it by
|
Once you have implemented an ONNX configuration, you can instantiate it by providing the
|
||||||
providing the base model's configuration as follows:
|
base model's configuration as follows:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers import AutoConfig
|
>>> from transformers import AutoConfig
|
||||||
@@ -374,8 +367,8 @@ providing the base model's configuration as follows:
|
|||||||
>>> onnx_config = DistilBertOnnxConfig(config)
|
>>> onnx_config = DistilBertOnnxConfig(config)
|
||||||
```
|
```
|
||||||
|
|
||||||
The resulting object has several useful properties. For example you can view the
|
The resulting object has several useful properties. For example, you can view the ONNX
|
||||||
ONNX operator set that will be used during the export:
|
operator set that will be used during the export:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> print(onnx_config.default_onnx_opset)
|
>>> print(onnx_config.default_onnx_opset)
|
||||||
@@ -389,15 +382,14 @@ You can also view the outputs associated with the model as follows:
|
|||||||
OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"})])
|
OrderedDict([("last_hidden_state", {0: "batch", 1: "sequence"})])
|
||||||
```
|
```
|
||||||
|
|
||||||
Notice that the outputs property follows the same structure as the inputs; it
|
Notice that the outputs property follows the same structure as the inputs; it returns an
|
||||||
returns an `OrderedDict` of named outputs and their shapes. The output structure
|
`OrderedDict` of named outputs and their shapes. The output structure is linked to the
|
||||||
is linked to the choice of feature that the configuration is initialised with.
|
choice of feature that the configuration is initialised with. By default, the ONNX
|
||||||
By default, the ONNX configuration is initialized with the `default` feature
|
configuration is initialized with the `default` feature that corresponds to exporting a
|
||||||
that corresponds to exporting a model loaded with the `AutoModel` class. If you
|
model loaded with the `AutoModel` class. If you want to export a model for another task,
|
||||||
want to export a different model topology, just provide a different feature to
|
just provide a different feature to the `task` argument when you initialize the ONNX
|
||||||
the `task` argument when you initialize the ONNX configuration. For example, if
|
configuration. For example, if we wished to export DistilBERT with a sequence
|
||||||
we wished to export DistilBERT with a sequence classification head, we could
|
classification head, we could use:
|
||||||
use:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers import AutoConfig
|
>>> from transformers import AutoConfig
|
||||||
@@ -410,18 +402,18 @@ OrderedDict([('logits', {0: 'batch'})])
|
|||||||
|
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
All of the base properties and methods associated with [`~onnx.config.OnnxConfig`] and the
|
All of the base properties and methods associated with [`~onnx.config.OnnxConfig`] and
|
||||||
other configuration classes can be overriden if needed. Check out
|
the other configuration classes can be overriden if needed. Check out [`BartOnnxConfig`]
|
||||||
[`BartOnnxConfig`] for an advanced example.
|
for an advanced example.
|
||||||
|
|
||||||
</Tip>
|
</Tip>
|
||||||
|
|
||||||
#### Exporting the model
|
### Exporting the model
|
||||||
|
|
||||||
Once you have implemented the ONNX configuration, the next step is to export the
|
Once you have implemented the ONNX configuration, the next step is to export the model.
|
||||||
model. Here we can use the `export()` function provided by the
|
Here we can use the `export()` function provided by the `transformers.onnx` package.
|
||||||
`transformers.onnx` package. This function expects the ONNX configuration, along
|
This function expects the ONNX configuration, along with the base model and tokenizer,
|
||||||
with the base model and tokenizer, and the path to save the exported file:
|
and the path to save the exported file:
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from pathlib import Path
|
>>> from pathlib import Path
|
||||||
@@ -436,10 +428,9 @@ with the base model and tokenizer, and the path to save the exported file:
|
|||||||
>>> onnx_inputs, onnx_outputs = export(tokenizer, base_model, onnx_config, onnx_config.default_onnx_opset, onnx_path)
|
>>> onnx_inputs, onnx_outputs = export(tokenizer, base_model, onnx_config, onnx_config.default_onnx_opset, onnx_path)
|
||||||
```
|
```
|
||||||
|
|
||||||
The `onnx_inputs` and `onnx_outputs` returned by the `export()` function are
|
The `onnx_inputs` and `onnx_outputs` returned by the `export()` function are lists of
|
||||||
lists of the keys defined in the `inputs` and `outputs` properties of the
|
the keys defined in the `inputs` and `outputs` properties of the configuration. Once the
|
||||||
configuration. Once the model is exported, you can test that the model is well
|
model is exported, you can test that the model is well formed as follows:
|
||||||
formed as follows:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> import onnx
|
>>> import onnx
|
||||||
@@ -450,21 +441,20 @@ formed as follows:
|
|||||||
|
|
||||||
<Tip>
|
<Tip>
|
||||||
|
|
||||||
If your model is larger than 2GB, you will see that many additional files are
|
If your model is larger than 2GB, you will see that many additional files are created
|
||||||
created during the export. This is _expected_ because ONNX uses [Protocol
|
during the export. This is _expected_ because ONNX uses [Protocol
|
||||||
Buffers](https://developers.google.com/protocol-buffers/) to store the model and
|
Buffers](https://developers.google.com/protocol-buffers/) to store the model and these
|
||||||
these have a size limit of 2GB. See the [ONNX
|
have a size limit of 2GB. See the [ONNX
|
||||||
documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md)
|
documentation](https://github.com/onnx/onnx/blob/master/docs/ExternalData.md) for
|
||||||
for instructions on how to load models with external data.
|
instructions on how to load models with external data.
|
||||||
|
|
||||||
</Tip>
|
</Tip>
|
||||||
|
|
||||||
#### Validating the model outputs
|
### Validating the model outputs
|
||||||
|
|
||||||
The final step is to validate that the outputs from the base and exported model
|
The final step is to validate that the outputs from the base and exported model agree
|
||||||
agree within some absolute tolerance. Here we can use the
|
within some absolute tolerance. Here we can use the `validate_model_outputs()` function
|
||||||
`validate_model_outputs()` function provided by the `transformers.onnx` package
|
provided by the `transformers.onnx` package as follows:
|
||||||
as follows:
|
|
||||||
|
|
||||||
```python
|
```python
|
||||||
>>> from transformers.onnx import validate_model_outputs
|
>>> from transformers.onnx import validate_model_outputs
|
||||||
@@ -474,220 +464,23 @@ as follows:
|
|||||||
... )
|
... )
|
||||||
```
|
```
|
||||||
|
|
||||||
This function uses the `OnnxConfig.generate_dummy_inputs()` method to generate
|
This function uses the [`~transformers.onnx.OnnxConfig.generate_dummy_inputs`] method to
|
||||||
inputs for the base and exported model, and the absolute tolerance can be
|
generate inputs for the base and exported model, and the absolute tolerance can be
|
||||||
defined in the configuration. We generally find numerical agreement in the 1e-6
|
defined in the configuration. We generally find numerical agreement in the 1e-6 to 1e-4
|
||||||
to 1e-4 range, although anything smaller than 1e-3 is likely to be OK.
|
range, although anything smaller than 1e-3 is likely to be OK.
|
||||||
|
|
||||||
### Contributing a new configuration to 🤗 Transformers
|
## Contributing a new configuration to 🤗 Transformers
|
||||||
|
|
||||||
We are looking to expand the set of ready-made configurations and welcome
|
We are looking to expand the set of ready-made configurations and welcome contributions
|
||||||
contributions from the community! If you would like to contribute your addition
|
from the community! If you would like to contribute your addition to the library, you
|
||||||
to the library, you will need to:
|
will need to:
|
||||||
|
|
||||||
* Implement the ONNX configuration in the corresponding `configuration_<model_name>.py`
|
* Implement the ONNX configuration in the corresponding `configuration_<model_name>.py`
|
||||||
file
|
file
|
||||||
* Include the model architecture and corresponding features in [`~onnx.features.FeatureManager`]
|
* Include the model architecture and corresponding features in
|
||||||
|
[`~onnx.features.FeatureManager`]
|
||||||
* Add your model architecture to the tests in `test_onnx_v2.py`
|
* Add your model architecture to the tests in `test_onnx_v2.py`
|
||||||
|
|
||||||
Check out how the configuration for [IBERT was
|
Check out how the configuration for [IBERT was
|
||||||
contributed](https://github.com/huggingface/transformers/pull/14868/files) to
|
contributed](https://github.com/huggingface/transformers/pull/14868/files) to get an
|
||||||
get an idea of what's involved.
|
idea of what's involved.
|
||||||
|
|
||||||
## TorchScript
|
|
||||||
|
|
||||||
<Tip>
|
|
||||||
|
|
||||||
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities with
|
|
||||||
variable-input-size models. It is a focus of interest to us and we will deepen our analysis in upcoming releases,
|
|
||||||
with more code examples, a more flexible implementation, and benchmarks comparing python-based codes with compiled
|
|
||||||
TorchScript.
|
|
||||||
|
|
||||||
</Tip>
|
|
||||||
|
|
||||||
According to Pytorch's documentation: "TorchScript is a way to create serializable and optimizable models from PyTorch
|
|
||||||
code". Pytorch's two modules [JIT and TRACE](https://pytorch.org/docs/stable/jit.html) allow the developer to export
|
|
||||||
their model to be re-used in other programs, such as efficiency-oriented C++ programs.
|
|
||||||
|
|
||||||
We have provided an interface that allows the export of 🤗 Transformers models to TorchScript so that they can be reused
|
|
||||||
in a different environment than a Pytorch-based python program. Here we explain how to export and use our models using
|
|
||||||
TorchScript.
|
|
||||||
|
|
||||||
Exporting a model requires two things:
|
|
||||||
|
|
||||||
- a forward pass with dummy inputs.
|
|
||||||
- model instantiation with the `torchscript` flag.
|
|
||||||
|
|
||||||
These necessities imply several things developers should be careful about. These are detailed below.
|
|
||||||
|
|
||||||
### TorchScript flag and tied weights
|
|
||||||
|
|
||||||
This flag is necessary because most of the language models in this repository have tied weights between their
|
|
||||||
`Embedding` layer and their `Decoding` layer. TorchScript does not allow the export of models that have tied
|
|
||||||
weights, therefore it is necessary to untie and clone the weights beforehand.
|
|
||||||
|
|
||||||
This implies that models instantiated with the `torchscript` flag have their `Embedding` layer and `Decoding`
|
|
||||||
layer separate, which means that they should not be trained down the line. Training would de-synchronize the two
|
|
||||||
layers, leading to unexpected results.
|
|
||||||
|
|
||||||
This is not the case for models that do not have a Language Model head, as those do not have tied weights. These models
|
|
||||||
can be safely exported without the `torchscript` flag.
|
|
||||||
|
|
||||||
### Dummy inputs and standard lengths
|
|
||||||
|
|
||||||
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
|
|
||||||
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used to
|
|
||||||
create the "trace" of the model.
|
|
||||||
|
|
||||||
The trace is created relatively to the inputs' dimensions. It is therefore constrained by the dimensions of the dummy
|
|
||||||
input, and will not work for any other sequence length or batch size. When trying with a different size, an error such
|
|
||||||
as:
|
|
||||||
|
|
||||||
`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`
|
|
||||||
|
|
||||||
will be raised. It is therefore recommended to trace the model with a dummy input size at least as large as the largest
|
|
||||||
input that will be fed to the model during inference. Padding can be performed to fill the missing values. As the model
|
|
||||||
will have been traced with a large input size however, the dimensions of the different matrix will be large as well,
|
|
||||||
resulting in more calculations.
|
|
||||||
|
|
||||||
It is recommended to be careful of the total number of operations done on each input and to follow performance closely
|
|
||||||
when exporting varying sequence-length models.
|
|
||||||
|
|
||||||
### Using TorchScript in Python
|
|
||||||
|
|
||||||
Below is an example, showing how to save, load models as well as how to use the trace for inference.
|
|
||||||
|
|
||||||
#### Saving a model
|
|
||||||
|
|
||||||
This snippet shows how to use TorchScript to export a `BertModel`. Here the `BertModel` is instantiated according
|
|
||||||
to a `BertConfig` class and then saved to disk under the filename `traced_bert.pt`
|
|
||||||
|
|
||||||
```python
|
|
||||||
from transformers import BertModel, BertTokenizer, BertConfig
|
|
||||||
import torch
|
|
||||||
|
|
||||||
enc = BertTokenizer.from_pretrained("bert-base-uncased")
|
|
||||||
|
|
||||||
# Tokenizing input text
|
|
||||||
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
|
|
||||||
tokenized_text = enc.tokenize(text)
|
|
||||||
|
|
||||||
# Masking one of the input tokens
|
|
||||||
masked_index = 8
|
|
||||||
tokenized_text[masked_index] = "[MASK]"
|
|
||||||
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
|
|
||||||
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
|
|
||||||
|
|
||||||
# Creating a dummy input
|
|
||||||
tokens_tensor = torch.tensor([indexed_tokens])
|
|
||||||
segments_tensors = torch.tensor([segments_ids])
|
|
||||||
dummy_input = [tokens_tensor, segments_tensors]
|
|
||||||
|
|
||||||
# Initializing the model with the torchscript flag
|
|
||||||
# Flag set to True even though it is not necessary as this model does not have an LM Head.
|
|
||||||
config = BertConfig(
|
|
||||||
vocab_size_or_config_json_file=32000,
|
|
||||||
hidden_size=768,
|
|
||||||
num_hidden_layers=12,
|
|
||||||
num_attention_heads=12,
|
|
||||||
intermediate_size=3072,
|
|
||||||
torchscript=True,
|
|
||||||
)
|
|
||||||
|
|
||||||
# Instantiating the model
|
|
||||||
model = BertModel(config)
|
|
||||||
|
|
||||||
# The model needs to be in evaluation mode
|
|
||||||
model.eval()
|
|
||||||
|
|
||||||
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
|
|
||||||
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
|
|
||||||
|
|
||||||
# Creating the trace
|
|
||||||
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
|
|
||||||
torch.jit.save(traced_model, "traced_bert.pt")
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Loading a model
|
|
||||||
|
|
||||||
This snippet shows how to load the `BertModel` that was previously saved to disk under the name `traced_bert.pt`.
|
|
||||||
We are re-using the previously initialised `dummy_input`.
|
|
||||||
|
|
||||||
```python
|
|
||||||
loaded_model = torch.jit.load("traced_bert.pt")
|
|
||||||
loaded_model.eval()
|
|
||||||
|
|
||||||
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
|
|
||||||
```
|
|
||||||
|
|
||||||
#### Using a traced model for inference
|
|
||||||
|
|
||||||
Using the traced model for inference is as simple as using its `__call__` dunder method:
|
|
||||||
|
|
||||||
```python
|
|
||||||
traced_model(tokens_tensor, segments_tensors)
|
|
||||||
```
|
|
||||||
|
|
||||||
### Deploying HuggingFace TorchScript models on AWS using the Neuron SDK
|
|
||||||
|
|
||||||
AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
|
|
||||||
instance family for low cost, high performance machine learning inference in the cloud.
|
|
||||||
The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator,
|
|
||||||
specializing in deep learning inferencing workloads.
|
|
||||||
[AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#)
|
|
||||||
is the SDK for Inferentia that supports tracing and optimizing transformers models for
|
|
||||||
deployment on Inf1. The Neuron SDK provides:
|
|
||||||
|
|
||||||
|
|
||||||
1. Easy-to-use API with one line of code change to trace and optimize a TorchScript model for inference in the cloud.
|
|
||||||
2. Out of the box performance optimizations for [improved cost-performance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/>)
|
|
||||||
3. Support for HuggingFace transformers models built with either [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html)
|
|
||||||
or [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).
|
|
||||||
|
|
||||||
#### Implications
|
|
||||||
|
|
||||||
Transformers Models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/main/model_doc/bert)
|
|
||||||
architecture, or its variants such as [distilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert)
|
|
||||||
and [roBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta)
|
|
||||||
will run best on Inf1 for non-generative tasks such as Extractive Question Answering,
|
|
||||||
Sequence Classification, Token Classification. Alternatively, text generation
|
|
||||||
tasks can be adapted to run on Inf1, according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
|
|
||||||
More information about models that can be converted out of the box on Inferentia can be
|
|
||||||
found in the [Model Architecture Fit section of the Neuron documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia).
|
|
||||||
|
|
||||||
#### Dependencies
|
|
||||||
|
|
||||||
Using AWS Neuron to convert models requires the following dependencies and environment:
|
|
||||||
|
|
||||||
* A [Neuron SDK environment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide),
|
|
||||||
which comes pre-configured on [AWS Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
|
|
||||||
|
|
||||||
#### Converting a Model for AWS Neuron
|
|
||||||
|
|
||||||
Using the same script as in [Using TorchScript in Python](https://huggingface.co/docs/transformers/main/en/serialization#using-torchscript-in-python)
|
|
||||||
to trace a "BertModel", you import `torch.neuron` framework extension to access
|
|
||||||
the components of the Neuron SDK through a Python API.
|
|
||||||
|
|
||||||
```python
|
|
||||||
from transformers import BertModel, BertTokenizer, BertConfig
|
|
||||||
import torch
|
|
||||||
import torch.neuron
|
|
||||||
```
|
|
||||||
And only modify the tracing line of code
|
|
||||||
|
|
||||||
from:
|
|
||||||
|
|
||||||
```python
|
|
||||||
torch.jit.trace(model, [tokens_tensor, segments_tensors])
|
|
||||||
```
|
|
||||||
|
|
||||||
to:
|
|
||||||
|
|
||||||
```python
|
|
||||||
torch.neuron.trace(model, [token_tensor, segments_tensors])
|
|
||||||
```
|
|
||||||
|
|
||||||
This change enables Neuron SDK to trace the model and optimize it to run in Inf1 instances.
|
|
||||||
|
|
||||||
To learn more about AWS Neuron SDK features, tools, example tutorials and latest updates,
|
|
||||||
please see the [AWS NeuronSDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).
|
|
||||||
225
docs/source/en/torchscript.mdx
Normal file
225
docs/source/en/torchscript.mdx
Normal file
@@ -0,0 +1,225 @@
|
|||||||
|
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||||
|
|
||||||
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||||
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||||
|
specific language governing permissions and limitations under the License.
|
||||||
|
-->
|
||||||
|
|
||||||
|
# Export to TorchScript
|
||||||
|
|
||||||
|
<Tip>
|
||||||
|
|
||||||
|
This is the very beginning of our experiments with TorchScript and we are still
|
||||||
|
exploring its capabilities with variable-input-size models. It is a focus of interest to
|
||||||
|
us and we will deepen our analysis in upcoming releases, with more code examples, a more
|
||||||
|
flexible implementation, and benchmarks comparing Python-based codes with compiled
|
||||||
|
TorchScript.
|
||||||
|
|
||||||
|
</Tip>
|
||||||
|
|
||||||
|
According to the [TorchScript documentation](https://pytorch.org/docs/stable/jit.html):
|
||||||
|
|
||||||
|
> TorchScript is a way to create serializable and optimizable models from PyTorch code.
|
||||||
|
|
||||||
|
There are two PyTorch modules, [JIT and
|
||||||
|
TRACE](https://pytorch.org/docs/stable/jit.html), that allow developers to export their
|
||||||
|
models to be reused in other programs like efficiency-oriented C++ programs.
|
||||||
|
|
||||||
|
We provide an interface that allows you to export 🤗 Transformers models to TorchScript
|
||||||
|
so they can be reused in a different environment than PyTorch-based Python programs.
|
||||||
|
Here, we explain how to export and use our models using TorchScript.
|
||||||
|
|
||||||
|
Exporting a model requires two things:
|
||||||
|
|
||||||
|
- model instantiation with the `torchscript` flag
|
||||||
|
- a forward pass with dummy inputs
|
||||||
|
|
||||||
|
These necessities imply several things developers should be careful about as detailed
|
||||||
|
below.
|
||||||
|
|
||||||
|
## TorchScript flag and tied weights
|
||||||
|
|
||||||
|
The `torchscript` flag is necessary because most of the 🤗 Transformers language models
|
||||||
|
have tied weights between their `Embedding` layer and their `Decoding` layer.
|
||||||
|
TorchScript does not allow you to export models that have tied weights, so it is
|
||||||
|
necessary to untie and clone the weights beforehand.
|
||||||
|
|
||||||
|
Models instantiated with the `torchscript` flag have their `Embedding` layer and
|
||||||
|
`Decoding` layer separated, which means that they should not be trained down the line.
|
||||||
|
Training would desynchronize the two layers, leading to unexpected results.
|
||||||
|
|
||||||
|
This is not the case for models that do not have a language model head, as those do not
|
||||||
|
have tied weights. These models can be safely exported without the `torchscript` flag.
|
||||||
|
|
||||||
|
## Dummy inputs and standard lengths
|
||||||
|
|
||||||
|
The dummy inputs are used for a models forward pass. While the inputs' values are
|
||||||
|
propagated through the layers, PyTorch keeps track of the different operations executed
|
||||||
|
on each tensor. These recorded operations are then used to create the *trace* of the
|
||||||
|
model.
|
||||||
|
|
||||||
|
The trace is created relative to the inputs' dimensions. It is therefore constrained by
|
||||||
|
the dimensions of the dummy input, and will not work for any other sequence length or
|
||||||
|
batch size. When trying with a different size, the following error is raised:
|
||||||
|
|
||||||
|
```
|
||||||
|
`The expanded size of the tensor (3) must match the existing size (7) at non-singleton dimension 2`
|
||||||
|
```
|
||||||
|
|
||||||
|
We recommended you trace the model with a dummy input size at least as large as the
|
||||||
|
largest input that will be fed to the model during inference. Padding can help fill the
|
||||||
|
missing values. However, since the model is traced with a larger input size, the
|
||||||
|
dimensions of the matrix will also be large, resulting in more calculations.
|
||||||
|
|
||||||
|
Be careful of the total number of operations done on each input and follow the
|
||||||
|
performance closely when exporting varying sequence-length models.
|
||||||
|
|
||||||
|
## Using TorchScript in Python
|
||||||
|
|
||||||
|
This section demonstrates how to save and load models as well as how to use the trace
|
||||||
|
for inference.
|
||||||
|
|
||||||
|
### Saving a model
|
||||||
|
|
||||||
|
To export a `BertModel` with TorchScript, instantiate `BertModel` from the `BertConfig`
|
||||||
|
class and then save it to disk under the filename `traced_bert.pt`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from transformers import BertModel, BertTokenizer, BertConfig
|
||||||
|
import torch
|
||||||
|
|
||||||
|
enc = BertTokenizer.from_pretrained("bert-base-uncased")
|
||||||
|
|
||||||
|
# Tokenizing input text
|
||||||
|
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
|
||||||
|
tokenized_text = enc.tokenize(text)
|
||||||
|
|
||||||
|
# Masking one of the input tokens
|
||||||
|
masked_index = 8
|
||||||
|
tokenized_text[masked_index] = "[MASK]"
|
||||||
|
indexed_tokens = enc.convert_tokens_to_ids(tokenized_text)
|
||||||
|
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
|
||||||
|
|
||||||
|
# Creating a dummy input
|
||||||
|
tokens_tensor = torch.tensor([indexed_tokens])
|
||||||
|
segments_tensors = torch.tensor([segments_ids])
|
||||||
|
dummy_input = [tokens_tensor, segments_tensors]
|
||||||
|
|
||||||
|
# Initializing the model with the torchscript flag
|
||||||
|
# Flag set to True even though it is not necessary as this model does not have an LM Head.
|
||||||
|
config = BertConfig(
|
||||||
|
vocab_size_or_config_json_file=32000,
|
||||||
|
hidden_size=768,
|
||||||
|
num_hidden_layers=12,
|
||||||
|
num_attention_heads=12,
|
||||||
|
intermediate_size=3072,
|
||||||
|
torchscript=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
# Instantiating the model
|
||||||
|
model = BertModel(config)
|
||||||
|
|
||||||
|
# The model needs to be in evaluation mode
|
||||||
|
model.eval()
|
||||||
|
|
||||||
|
# If you are instantiating the model with *from_pretrained* you can also easily set the TorchScript flag
|
||||||
|
model = BertModel.from_pretrained("bert-base-uncased", torchscript=True)
|
||||||
|
|
||||||
|
# Creating the trace
|
||||||
|
traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
|
||||||
|
torch.jit.save(traced_model, "traced_bert.pt")
|
||||||
|
```
|
||||||
|
|
||||||
|
### Loading a model
|
||||||
|
|
||||||
|
Now you can load the previously saved `BertModel`, `traced_bert.pt`, from disk and use
|
||||||
|
it on the previously initialised `dummy_input`:
|
||||||
|
|
||||||
|
```python
|
||||||
|
loaded_model = torch.jit.load("traced_bert.pt")
|
||||||
|
loaded_model.eval()
|
||||||
|
|
||||||
|
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
|
||||||
|
```
|
||||||
|
|
||||||
|
### Using a traced model for inference
|
||||||
|
|
||||||
|
Use the traced model for inference by using its `__call__` dunder method:
|
||||||
|
|
||||||
|
```python
|
||||||
|
traced_model(tokens_tensor, segments_tensors)
|
||||||
|
```
|
||||||
|
|
||||||
|
## Deploy Hugging Face TorchScript models to AWS with the Neuron SDK
|
||||||
|
|
||||||
|
AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
|
||||||
|
instance family for low cost, high performance machine learning inference in the cloud.
|
||||||
|
The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware
|
||||||
|
accelerator, specializing in deep learning inferencing workloads. [AWS
|
||||||
|
Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#) is the SDK for
|
||||||
|
Inferentia that supports tracing and optimizing transformers models for deployment on
|
||||||
|
Inf1. The Neuron SDK provides:
|
||||||
|
|
||||||
|
|
||||||
|
1. Easy-to-use API with one line of code change to trace and optimize a TorchScript
|
||||||
|
model for inference in the cloud.
|
||||||
|
2. Out of the box performance optimizations for [improved
|
||||||
|
cost-performance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/>).
|
||||||
|
3. Support for Hugging Face transformers models built with either
|
||||||
|
[PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html)
|
||||||
|
or
|
||||||
|
[TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).
|
||||||
|
|
||||||
|
### Implications
|
||||||
|
|
||||||
|
Transformers models based on the [BERT (Bidirectional Encoder Representations from
|
||||||
|
Transformers)](https://huggingface.co/docs/transformers/main/model_doc/bert)
|
||||||
|
architecture, or its variants such as
|
||||||
|
[distilBERT](https://huggingface.co/docs/transformers/main/model_doc/distilbert) and
|
||||||
|
[roBERTa](https://huggingface.co/docs/transformers/main/model_doc/roberta) run best on
|
||||||
|
Inf1 for non-generative tasks such as extractive question answering, sequence
|
||||||
|
classification, and token classification. However, text generation tasks can still be
|
||||||
|
adapted to run on Inf1 according to this [AWS Neuron MarianMT
|
||||||
|
tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
|
||||||
|
More information about models that can be converted out of the box on Inferentia can be
|
||||||
|
found in the [Model Architecture
|
||||||
|
Fit](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia)
|
||||||
|
section of the Neuron documentation.
|
||||||
|
|
||||||
|
### Dependencies
|
||||||
|
|
||||||
|
Using AWS Neuron to convert models requires a [Neuron SDK
|
||||||
|
environment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide)
|
||||||
|
which comes preconfigured on [AWS Deep Learning
|
||||||
|
AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
|
||||||
|
|
||||||
|
### Converting a model for AWS Neuron
|
||||||
|
|
||||||
|
Convert a model for AWS NEURON using the same code from [Using TorchScript in
|
||||||
|
Python](serialization#using-torchscript-in-python) to trace a `BertModel`. Import the
|
||||||
|
`torch.neuron` framework extension to access the components of the Neuron SDK through a
|
||||||
|
Python API:
|
||||||
|
|
||||||
|
```python
|
||||||
|
from transformers import BertModel, BertTokenizer, BertConfig
|
||||||
|
import torch
|
||||||
|
import torch.neuron
|
||||||
|
```
|
||||||
|
|
||||||
|
You only need to modify the following line:
|
||||||
|
|
||||||
|
```diff
|
||||||
|
- torch.jit.trace(model, [tokens_tensor, segments_tensors])
|
||||||
|
+ torch.neuron.trace(model, [token_tensor, segments_tensors])
|
||||||
|
```
|
||||||
|
|
||||||
|
This enables the Neuron SDK to trace the model and optimize it for Inf1 instances.
|
||||||
|
|
||||||
|
To learn more about AWS Neuron SDK features, tools, example tutorials and latest
|
||||||
|
updates, please see the [AWS NeuronSDK
|
||||||
|
documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).
|
||||||
Reference in New Issue
Block a user