From 640550fc7a1e311915ead1bcca6dacea0c503faf Mon Sep 17 00:00:00 2001
From: Funtowicz Morgan
Date: Wed, 29 Jul 2020 11:02:35 +0200
Subject: [PATCH] ONNX documentation (#5992)

* Move torchscript and add ONNX documentation under modle_export

Signed-off-by: Morgan Funtowicz

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz

* WIP doc

Signed-off-by: Morgan Funtowicz

* ONNX documentation

Signed-off-by: Morgan Funtowicz

* Fix invalid link

Signed-off-by: Morgan Funtowicz

* Improve spelling

Signed-off-by: Morgan Funtowicz

* Final wording pass

Signed-off-by: Morgan Funtowicz
---
 docs/source/index.rst                         |  2 +-
 .../{torchscript.rst => serialization.rst}    | 51 ++++++++++++++++---
 2 files changed, 46 insertions(+), 7 deletions(-)
 rename docs/source/{torchscript.rst => serialization.rst} (69%)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index c5eb3283b0..a9e27953ca 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -157,8 +157,8 @@ conversion utilities for the following models:
     notebooks
     converting_tensorflow_models
     migration
-    torchscript
     contributing
+    serialization

 ..
toctree::
     :maxdepth: 2
diff --git a/docs/source/torchscript.rst b/docs/source/serialization.rst
similarity index 69%
rename from docs/source/torchscript.rst
rename to docs/source/serialization.rst
index a735b531d1..82180def77 100644
--- a/docs/source/torchscript.rst
+++ b/docs/source/serialization.rst
@@ -1,5 +1,44 @@
+**********************************************
+Exporting transformers models
+**********************************************
+
+ONNX / ONNXRuntime
+==============================================
+
+The ONNX (Open Neural Network eXchange) and ONNXRuntime (ORT) projects are part of an effort from leading companies in the AI field
+to provide a unified, community-driven format to store and, by extension, efficiently execute neural networks on a variety
+of hardware with dedicated optimizations.
+
+Starting from transformers v2.10.0, we partnered with ONNX Runtime to provide an easy export of transformers models to
+the ONNX format. You can read more about this effort in our joint blog post `Accelerate your NLP pipelines using
+Hugging Face Transformers and ONNX Runtime `_.
+
+Exporting a model is done through the script ``convert_graph_to_onnx.py`` at the root of the transformers sources.
+The following command exports a BERT model from the library, here using the PyTorch framework (``pt``):
+
+.. code-block:: bash
+
+    python convert_graph_to_onnx.py --framework pt --model bert-base-cased bert-base-cased.onnx
+
+The conversion tool works for both PyTorch and TensorFlow models and ensures that:
+    * The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
+    * The inputs and outputs are correctly mapped to their ONNX counterparts.
+    * The generated model can be correctly loaded through onnxruntime.
+
+.. note::
+    Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations
+    in ONNX Runtime.
If you would like to see support for fixed-length inputs/outputs, please
+    open up an issue on transformers.
+
+
+The conversion tool also supports options that let you tune the generated model:
+    * Change the target opset version of the generated model: More recent opsets generally support more operators and enable faster inference.
+    * Export pipeline-specific prediction heads: Lets you export a model along with its task-specific prediction head(s).
+    * Use the external data format (PyTorch only): Lets you export a model whose size is above 2 GB (`More info `_).
+
+
 TorchScript
-================================================
+=======================================

 .. note::
     This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -25,7 +64,7 @@ These necessities imply several things developers should be careful about. These


 Implications
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------------------------

 TorchScript flag and tied weights
 ------------------------------------------------
@@ -62,12 +101,12 @@ It is recommended to be careful of the total number of operations done on each i
 when exporting varying sequence-length models.

 Using TorchScript in Python
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-------------------------------------------------

 Below are examples of using the Python to save, load models as well as how to use the trace for inference.

 Saving a model
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This snippet shows how to use TorchScript to export a ``BertModel``.
Here the ``BertModel`` is instantiated
 according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
@@ -113,7 +152,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
     torch.jit.save(traced_model, "traced_bert.pt")

 Loading a model
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
 We are re-using the previously initialised ``dummy_input``.
@@ -126,7 +165,7 @@ We are re-using the previously initialised ``dummy_input``.
     all_encoder_layers, pooled_output = loaded_model(dummy_input)

 Using a traced model for inference
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Using the traced model for inference is as simple as using its ``__call__`` dunder method:
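
A note on the last context line of the patch: in Python, calling an object with parentheses is shorthand for invoking its ``__call__`` method, which is what "using its ``__call__`` dunder method" refers to. The sketch below is a plain-Python illustration only. ``StandInTracedModel`` and the token ids are hypothetical stand-ins, not part of transformers or PyTorch, chosen so the example runs without either library installed. A real traced module would be called the same way, as in ``loaded_model(dummy_input)`` in the hunk above.

```python
class StandInTracedModel:
    """Hypothetical stand-in for a traced module (assumption: not a real TorchScript object)."""

    def forward(self, token_ids):
        # Toy "inference": return the number of tokens and their sum.
        return len(token_ids), sum(token_ids)

    def __call__(self, token_ids):
        # Calling the object delegates to forward().
        return self.forward(token_ids)


traced_model = StandInTracedModel()
dummy_input = [101, 2040, 102]  # made-up token ids, for illustration only

# Both spellings run the same code path.
via_call_syntax = traced_model(dummy_input)
via_dunder = traced_model.__call__(dummy_input)
print(via_call_syntax == via_dunder)
```

The parenthesised form is the idiomatic one; the explicit ``__call__`` spelling is shown only to make the equivalence visible.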