M2M100 support for ONNX export (#15193)
* Add M2M100 support for ONNX export
* Delete useless imports
* Add M2M100 to tests
* Fix protobuf issue
@@ -55,6 +55,7 @@ Ready-made configurations include the following architectures:
- GPT Neo
- I-BERT
- LayoutLM
- M2M100
- Marian
- mBART
- OpenAI GPT-2
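The list above names architectures with ready-made ONNX configurations, which now includes M2M100. As a rough sketch of how such a model might be exported, the snippet below assembles a call to the `transformers.onnx` command-line module. The checkpoint name `facebook/m2m100_418M`, the output directory `onnx/`, and the helper `build_export_command` are illustrative assumptions and not part of this commit.

```python
import shlex
import sys


def build_export_command(model_name, output_dir, feature=None):
    """Assemble the argv for `python -m transformers.onnx` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "transformers.onnx", f"--model={model_name}"]
    if feature is not None:
        # --feature selects a task-specific export instead of the default one.
        cmd.append(f"--feature={feature}")
    cmd.append(output_dir)
    return cmd


# Assumed checkpoint and output path; swap in your own.
command = build_export_command("facebook/m2m100_418M", "onnx/")
print("python " + shlex.join(command[1:]))

# To actually run the export (needs transformers, torch and onnx installed):
# import subprocess
# subprocess.run(command, check=True)
```

Building the argument list separately from running it keeps the sketch usable on machines where the export dependencies are not installed.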
@@ -584,12 +585,12 @@ traced_model(tokens_tensor, segments_tensors)

### Deploying HuggingFace TorchScript models on AWS using the Neuron SDK

AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
instance family for low-cost, high-performance machine learning inference in the cloud.
The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator
specializing in deep learning inference workloads.
[AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#)
is the SDK for Inferentia that supports tracing and optimizing transformers models for
deployment on Inf1. The Neuron SDK provides:
@@ -600,13 +601,13 @@ deployment on Inf1. The Neuron SDK provides:

#### Implications

Transformers models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/master/model_doc/bert)
architecture, or its variants such as [distilBERT](https://huggingface.co/docs/transformers/master/model_doc/distilbert)
and [roBERTa](https://huggingface.co/docs/transformers/master/model_doc/roberta),
will run best on Inf1 for non-generative tasks such as Extractive Question Answering,
Sequence Classification, and Token Classification. Alternatively, text generation
tasks can be adapted to run on Inf1, according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
More information about models that can be converted out of the box on Inferentia can be
found in the [Model Architecture Fit section of the Neuron documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia).

#### Dependencies
@@ -618,8 +619,8 @@ Using AWS Neuron to convert models requires the following dependencies and envir

#### Converting a Model for AWS Neuron

Using the same script as in [Using TorchScript in Python](https://huggingface.co/docs/transformers/master/en/serialization#using-torchscript-in-python)
to trace a "BertModel", import the `torch.neuron` framework extension to access
the components of the Neuron SDK through a Python API.
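
As a hedged sketch of the step described here: the TorchScript script stays the same, and only the tracing call is swapped for the Neuron one when the extension can be imported. The helper names `neuron_available` and `trace_model` are hypothetical. The sketch also assumes the Neuron extension is installed as the `torch_neuron` package, which registers the `torch.neuron` namespace, and it falls back to plain `torch.jit.trace` otherwise.

```python
import importlib.util


def neuron_available():
    """Return True if the Neuron PyTorch extension is importable.

    Assumption: the extension is installed under the name `torch_neuron`.
    """
    return importlib.util.find_spec("torch_neuron") is not None


def trace_model(model, example_inputs):
    """Trace `model` with Neuron when available, else with plain TorchScript."""
    import torch  # imported lazily so this module loads without torch

    if neuron_available():
        import torch_neuron  # noqa: F401  (assumed to register torch.neuron)

        # Trace and compile the model for Inferentia (Inf1).
        return torch.neuron.trace(model, example_inputs)
    # Fallback: ordinary TorchScript trace.
    return torch.jit.trace(model, example_inputs)


print("Neuron extension found:", neuron_available())
```

With a model and dummy inputs prepared as in the TorchScript section, the call would look like `trace_model(model, [tokens_tensor, segments_tensors])`.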
```python
@@ -643,5 +644,5 @@ torch.neuron.trace(model, [token_tensor, segments_tensors])

This change enables the Neuron SDK to trace the model and optimize it to run on Inf1 instances.

To learn more about AWS Neuron SDK features, tools, example tutorials and latest updates,
please see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).