ONNX documentation (#5992)

* Move torchscript and add ONNX documentation under model_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Funtowicz Morgan
2020-07-29 11:02:35 +02:00
committed by GitHub
parent 92f8ce2ed6
commit 640550fc7a
2 changed files with 46 additions and 7 deletions


@@ -157,8 +157,8 @@ conversion utilities for the following models:
    notebooks
    converting_tensorflow_models
    migration
    contributing
    serialization

.. toctree::
    :maxdepth: 2


@@ -1,5 +1,44 @@
**********************************************
Exporting transformers models
**********************************************

ONNX / ONNXRuntime
==============================================

ONNX (Open Neural Network eXchange) and ONNX Runtime (ORT) are part of an effort by leading companies in the AI field
to provide a unified, community-driven format to store and, by extension, efficiently execute neural networks on a variety
of hardware, leveraging dedicated optimizations.

Starting with transformers v2.10.0, we partnered with ONNX Runtime to provide an easy way to export transformers models to
the ONNX format. You can read more about this effort in our joint blog post `Accelerate your NLP pipelines using
Hugging Face Transformers and ONNX Runtime <https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.

Exporting a model is done through the script ``convert_graph_to_onnx.py`` at the root of the transformers sources.
The following command exports a BERT model from the library:

.. code-block:: bash

    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx

The conversion tool works for both PyTorch and TensorFlow models and ensures that:

* The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
* The inputs and outputs are correctly mapped to their ONNX counterparts.
* The generated model can be correctly loaded through ``onnxruntime``.
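
If you drive the export from another Python script rather than a shell, a minimal sketch of assembling the same command line could look like this. It only reproduces the arguments shown in the command above (``--framework``, ``--model`` and the positional output path); ``build_export_command`` is a hypothetical helper, not part of transformers, and the script is assumed to sit in the current working directory.

```python
import shlex
import sys
from typing import List

# The two values accepted by --framework in the command above.
SUPPORTED_FRAMEWORKS = ("pt", "tf")


def build_export_command(framework: str, model: str, output: str,
                         script: str = "convert_graph_to_onnx.py") -> List[str]:
    """Return the argv for the export command (hypothetical helper)."""
    if framework not in SUPPORTED_FRAMEWORKS:
        raise ValueError(f"framework must be one of {SUPPORTED_FRAMEWORKS}, got {framework!r}")
    # sys.executable plays the role of "python" in the shell command.
    return [sys.executable, script, "--framework", framework, "--model", model, output]


if __name__ == "__main__":
    cmd = build_export_command("pt", "bert-base-cased", "bert-base-cased.onnx")
    print(shlex.join(cmd))
    # To actually run the export, pass the list to subprocess.run(cmd, check=True)
    # from a directory that contains convert_graph_to_onnx.py.
```

Returning a list instead of a single string avoids shell quoting issues when the model name or output path contains spaces.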

.. note::
    Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations
    in ONNX Runtime. If you would like to see support for fixed-length inputs/outputs, please
    open an issue on transformers.

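To make "dynamic sequence axes" more concrete: in PyTorch, ``torch.onnx.export`` accepts a ``dynamic_axes`` argument that maps each input/output name to the axis indices that stay symbolic in the exported graph. The sketch below only builds such a mapping and does not call the conversion tool; ``make_dynamic_axes`` is a hypothetical helper and the tensor names are assumptions for a BERT-like model, which may differ from the names the tool actually emits.

```python
from typing import Dict, Iterable


def make_dynamic_axes(names: Iterable[str], dynamic_sequence: bool = True) -> Dict[str, Dict[int, str]]:
    """Build a mapping in the format accepted by torch.onnx.export(dynamic_axes=...).

    Axis 0 (batch) is always symbolic. Axis 1 (sequence) is symbolic only when
    dynamic_sequence is True, which matches the behavior described in the note.
    """
    axes: Dict[int, str] = {0: "batch"}
    if dynamic_sequence:
        axes[1] = "sequence"
    # Copy per tensor so one entry can be tweaked without affecting the others.
    return {name: dict(axes) for name in names}


# Hypothetical names; each tensor is assumed to have shape (batch, sequence, ...).
# A tensor without a sequence dimension (e.g. a pooled output) should only get axis 0.
names = ["input_ids", "attention_mask", "token_type_ids", "output_0"]
dynamic = make_dynamic_axes(names)
fixed = make_dynamic_axes(names, dynamic_sequence=False)
print(dynamic["input_ids"])  # {0: 'batch', 1: 'sequence'}
print(fixed["input_ids"])    # {0: 'batch'}
```

With only the batch axis symbolic, the sequence length is baked into the graph at export time, which is the kind of fixed-length input the note refers to.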
The conversion tool also supports several options that let you tune the behavior of the generated model:

* Change the target opset version of the generated model: more recent opsets generally support more operators and enable faster inference.
* Export pipeline-specific prediction heads: export the model along with its task-specific prediction head(s).
* Use the external data format (PyTorch only): export models whose size exceeds 2GB (`More info <https://github.com/pytorch/pytorch/pull/33062>`_).

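Whether a checkpoint needs the external data format can be estimated from its parameter count, since an ONNX file is a single protobuf message and protobuf caps messages at 2GB. The sketch below is a back-of-the-envelope check, assuming float32 weights (4 bytes per parameter) and ignoring the size of the graph itself; ``needs_external_format`` is a hypothetical helper, not part of transformers or ONNX.

```python
# A single protobuf message (and therefore a single .onnx file) is capped at 2GB.
PROTOBUF_LIMIT_BYTES = 2**31 - 1


def estimated_weight_bytes(num_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Rough size of the weights alone; 4 bytes per parameter assumes float32."""
    return num_parameters * bytes_per_parameter


def needs_external_format(num_parameters: int, bytes_per_parameter: int = 4) -> bool:
    """Hypothetical helper: True if the weights alone cannot fit in one protobuf file.

    Graph structure and metadata are ignored, so treat results close to the
    limit as "probably needs the external data format".
    """
    return estimated_weight_bytes(num_parameters, bytes_per_parameter) > PROTOBUF_LIMIT_BYTES


print(needs_external_format(110_000_000))    # ~440MB of float32 weights -> False
print(needs_external_format(1_500_000_000))  # ~6GB of float32 weights -> True
```

A BERT-base sized model (roughly 110 million parameters) stays well below the limit, while a model with 1.5 billion parameters exceeds it several times over.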
TorchScript
=======================================

.. note::
    This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -25,7 +64,7 @@ These necessities imply several things developers should be careful about.

Implications
------------------------------------------------

TorchScript flag and tied weights
------------------------------------------------
@@ -62,12 +101,12 @@ It is recommended to be careful of the total number of operations done on each i
when exporting varying sequence-length models.

Using TorchScript in Python
-------------------------------------------------

Below are examples of how to save and load models with Python, as well as how to use the traced model for inference.

Saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This snippet shows how to use TorchScript to export a ``BertModel``. Here, the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
@@ -113,7 +152,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
    torch.jit.save(traced_model, "traced_bert.pt")

Loading a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
@@ -126,7 +165,7 @@ We are re-using the previously initialised ``dummy_input``.
    # dummy_input holds the tensors used for tracing; unpack them as positional arguments
    all_encoder_layers, pooled_output = loaded_model(*dummy_input)

Using a traced model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Using the traced model for inference is as simple as using its ``__call__`` dunder method: