From f18c6fa94c498235281c0140ebdafb8f723b232e Mon Sep 17 00:00:00 2001
From: "K.C. Tung" <87498815+kct22aws@users.noreply.github.com>
Date: Fri, 7 Jan 2022 01:34:12 -0600
Subject: [PATCH 1/2] Resubmit changes after rebase to master (#14982)

---
 docs/source/serialization.mdx | 64 +++++++++++++++++++++++++++++++++++
 1 file changed, 64 insertions(+)

diff --git a/docs/source/serialization.mdx b/docs/source/serialization.mdx
index 0d667fc070..091eac083b 100644
--- a/docs/source/serialization.mdx
+++ b/docs/source/serialization.mdx
@@ -436,3 +436,67 @@ Using the traced model for inference is as simple as using its `__call__` dunder
 ```python
 traced_model(tokens_tensor, segments_tensors)
 ```
+
+### Deploying HuggingFace TorchScript models on AWS using the Neuron SDK
+
+AWS introduced the [Amazon EC2 Inf1](https://aws.amazon.com/ec2/instance-types/inf1/)
+instance family for low cost, high performance machine learning inference in the cloud.
+The Inf1 instances are powered by the AWS Inferentia chip, a custom-built hardware accelerator,
+specializing in deep learning inferencing workloads.
+[AWS Neuron](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/#)
+is the SDK for Inferentia that supports tracing and optimizing transformers models for
+deployment on Inf1. The Neuron SDK provides:
+
+
+1. Easy-to-use API with one line of code change to trace and optimize a TorchScript model for inference in the cloud.
+2. Out of the box performance optimizations for [improved cost-performance](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/benchmark/)
+3. Support for HuggingFace transformers models built with either [PyTorch](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/bert_tutorial/tutorial_pretrained_bert.html)
+   or [TensorFlow](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/tensorflow/huggingface_bert/huggingface_bert.html).
+
+#### Implications
+
+Transformers models based on the [BERT (Bidirectional Encoder Representations from Transformers)](https://huggingface.co/docs/transformers/master/model_doc/bert)
+architecture, or its variants such as [DistilBERT](https://huggingface.co/docs/transformers/master/model_doc/distilbert)
+ and [RoBERTa](https://huggingface.co/docs/transformers/master/model_doc/roberta)
+ will run best on Inf1 for non-generative tasks such as Extractive Question Answering,
+ Sequence Classification, and Token Classification. Alternatively, text generation
+tasks can be adapted to run on Inf1, according to this [AWS Neuron MarianMT tutorial](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/src/examples/pytorch/transformers-marianmt.html).
+More information about models that can be converted out of the box on Inferentia can be
+found in the [Model Architecture Fit section of the Neuron documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/models/models-inferentia.html#models-inferentia).
+
+#### Dependencies
+
+Using AWS Neuron to convert models requires the following dependencies and environment:
+
+* A [Neuron SDK environment](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-guide/neuron-frameworks/pytorch-neuron/index.html#installation-guide),
+  which comes pre-configured on the [AWS Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-inferentia-launching.html).
+
+#### Converting a Model for AWS Neuron
+
+Using the same script as in [Using TorchScript in Python](https://huggingface.co/docs/transformers/master/en/serialization#using-torchscript-in-python)
+to trace a `BertModel`, import the `torch.neuron` framework extension to access
+the components of the Neuron SDK through a Python API.
+
+```python
+from transformers import BertModel, BertTokenizer, BertConfig
+import torch
+import torch.neuron
+```
+Then modify only the tracing line of code
+
+from:
+
+```python
+torch.jit.trace(model, [tokens_tensor, segments_tensors])
+```
+
+to:
+
+```python
+torch.neuron.trace(model, [tokens_tensor, segments_tensors])
+```
+
+This change enables the Neuron SDK to trace the model and optimize it to run on Inf1 instances.
+
+To learn more about AWS Neuron SDK features, tools, example tutorials and latest updates,
+please see the [AWS Neuron SDK documentation](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/index.html).

From ac224bb0797c1ee6522d814139f3eb0a8947267b Mon Sep 17 00:00:00 2001
From: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date: Fri, 7 Jan 2022 16:55:59 +0100
Subject: [PATCH 2/2] [Fix doc examples] Add missing from_pretrained (#15044)

* fix doc example - ValueError: Parameter config should be an instance of class `PretrainedConfig`

* Update src/transformers/models/segformer/modeling_segformer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* update

Co-authored-by: ydshieh
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
---
 src/transformers/models/segformer/modeling_segformer.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/transformers/models/segformer/modeling_segformer.py b/src/transformers/models/segformer/modeling_segformer.py
index afa6d8cde8..1fa4f73266 100755
--- a/src/transformers/models/segformer/modeling_segformer.py
+++ b/src/transformers/models/segformer/modeling_segformer.py
@@ -490,8 +490,8 @@ class SegformerModel(SegformerPreTrainedModel):
         >>> from PIL import Image
         >>> import requests
 
-        >>> feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")
-        >>> model = SegformerModel("nvidia/segformer-b0-finetuned-ade-512-512")
+        >>> feature_extractor = SegformerFeatureExtractor.from_pretrained("nvidia/mit-b0")
+        >>> model = SegformerModel.from_pretrained("nvidia/mit-b0")
 
         >>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
         >>> image = Image.open(requests.get(url, stream=True).raw)