M2M100 support for ONNX export (#15193)

* Add M2M100 support for ONNX export

* Delete useless imports

* Add M2M100 to tests

* Fix protobuf issue
This commit is contained in:
Michael Benayoun
2022-03-02 04:03:14 -05:00
committed by GitHub
parent d1a29078c0
commit 4bfe75bd08
6 changed files with 177 additions and 29 deletions

View File

@@ -55,6 +55,7 @@ Ready-made configurations include the following architectures:
- GPT Neo
- I-BERT
- LayoutLM
- M2M100
- Marian
- mBART
- OpenAI GPT-2
@@ -584,12 +585,12 @@ traced_model(tokens_tensor, segments_tensors)
### Deploying HuggingFace TorchScript models on AWS using the Neuron SDK
AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
instance family for low cost, high performance machine learning inference in the cloud.
The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator,
specializing in deep learning inferencing workloads.
[AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#)
is the SDK for Inferentia that supports tracing and optimizing transformers models for
AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
instance family for low cost, high performance machine learning inference in the cloud.
The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator,
specializing in deep learning inferencing workloads.
[AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#)
is the SDK for Inferentia that supports tracing and optimizing transformers models for
deployment on Inf1. The Neuron SDK provides:
@@ -600,13 +601,13 @@ deployment on Inf1. The Neuron SDK provides:
#### Implications
Transformers Models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/master/model_doc/bert)
Transformers Models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/master/model_doc/bert)
architecture, or its variants such as [distilBERT](https://huggingface.co/docs/transformers/master/model_doc/distilbert)
and [roBERTa](https://huggingface.co/docs/transformers/master/model_doc/roberta)
will run best on Inf1 for non-generative tasks such as Extractive Question Answering,
and [roBERTa](https://huggingface.co/docs/transformers/master/model_doc/roberta)
will run best on Inf1 for non-generative tasks such as Extractive Question Answering,
Sequence Classification, Token Classification. Alternatively, text generation
tasks can be adapted to run on Inf1, according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
More information about models that can be converted out of the box on Inferentia can be
tasks can be adapted to run on Inf1, according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
More information about models that can be converted out of the box on Inferentia can be
found in the [Model Architecture Fit section of the Neuron documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia).
#### Dependencies
@@ -618,8 +619,8 @@ Using AWS Neuron to convert models requires the following dependencies and envir
#### Converting a Model for AWS Neuron
Using the same script as in [Using TorchScript in Python](https://huggingface.co/docs/transformers/master/en/serialization#using-torchscript-in-python)
to trace a "BertModel", you import `torch.neuron` framework extension to access
Using the same script as in [Using TorchScript in Python](https://huggingface.co/docs/transformers/master/en/serialization#using-torchscript-in-python)
to trace a "BertModel", you import `torch.neuron` framework extension to access
the components of the Neuron SDK through a Python API.
```python
@@ -643,5 +644,5 @@ torch.neuron.trace(model, [token_tensor, segments_tensors])
This change enables Neuron SDK to trace the model and optimize it to run in Inf1 instances.
To learn more about AWS Neuron SDK features, tools, example tutorials and latest updates,
To learn more about AWS Neuron SDK features, tools, example tutorials and latest updates,
please see the [AWS NeuronSDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).