From 640550fc7a1e311915ead1bcca6dacea0c503faf Mon Sep 17 00:00:00 2001
From: Funtowicz Morgan <mfuntowicz@users.noreply.github.com>
Date: Wed, 29 Jul 2020 11:02:35 +0200
Subject: [PATCH] ONNX documentation (#5992)

* Move torchscript and add ONNX documentation under model_export

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Let's follow guidelines by the gurus: Renamed torchscript.rst to serialization.rst

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* Remove previously introduced tree element

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* WIP doc

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>

* ONNX documentation

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Fix invalid link

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Improve spelling

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>

* Final wording pass

Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
---
 docs/source/index.rst                         |  2 +-
 .../{torchscript.rst => serialization.rst}    | 51 ++++++++++++++++---
 2 files changed, 46 insertions(+), 7 deletions(-)
 rename docs/source/{torchscript.rst => serialization.rst} (69%)

diff --git a/docs/source/index.rst b/docs/source/index.rst
index c5eb3283b0..a9e27953ca 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -157,8 +157,8 @@ conversion utilities for the following models:
     notebooks
     converting_tensorflow_models
     migration
-    torchscript
     contributing
+    serialization
 
 .. toctree::
     :maxdepth: 2
diff --git a/docs/source/torchscript.rst b/docs/source/serialization.rst
similarity index 69%
rename from docs/source/torchscript.rst
rename to docs/source/serialization.rst
index a735b531d1..82180def77 100644
--- a/docs/source/torchscript.rst
+++ b/docs/source/serialization.rst
@@ -1,5 +1,44 @@
+**********************************************
+Exporting transformers models
+**********************************************
+
+ONNX / ONNXRuntime
+==============================================
+
+The ONNX (Open Neural Network eXchange) and ONNX Runtime (ORT) projects are part of an effort by leading companies in the AI field
+to provide a unified, community-driven format for storing and, by extension, efficiently executing neural networks on a variety
+of hardware with dedicated optimizations.
+
+Starting with transformers v2.10.0, we partnered with ONNX Runtime to provide an easy way to export transformers models to
+the ONNX format. For more on this effort, see our joint blog post `Accelerate your NLP pipelines using
+Hugging Face Transformers and ONNX Runtime <https://medium.com/microsoftazure/accelerate-your-nlp-pipelines-using-hugging-face-transformers-and-onnx-runtime-2443578f4333>`_.
+
+Exporting a model is done through the script ``convert_graph_to_onnx.py`` at the root of the transformers sources.
+The following command exports a BERT model from the library:
+
+.. code-block:: bash
+
+    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased bert-base-cased.onnx
+
+The conversion tool works for both PyTorch and TensorFlow models and ensures that:
+    * The model and its weights are correctly initialized from the Hugging Face model hub or a local checkpoint.
+    * The inputs and outputs are correctly mapped to their ONNX counterparts.
+    * The generated model can be correctly loaded through onnxruntime.
+
+.. note::
+    Currently, inputs and outputs are always exported with dynamic sequence axes, which prevents some optimizations
+    in ONNX Runtime. If you would like to see support for fixed-length inputs/outputs, please
+    open an issue on transformers.
+
+
+The conversion tool also supports several options that let you tune the behavior of the generated model:
+    * Change the target opset version of the generated model: More recent opsets generally support more operators and enable faster inference.
+    * Export pipeline-specific prediction heads: Lets you export a model along with its task-specific prediction head(s).
+    * Use the external data format (PyTorch only): Lets you export a model whose size is above 2GB (`More info <https://github.com/pytorch/pytorch/pull/33062>`_).
+
+
 TorchScript
-================================================
+=======================================
 
 .. note::
     This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -25,7 +64,7 @@ These necessities imply several things developers should be careful about. These
 
 
 Implications
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------------------------
 
 TorchScript flag and tied weights
 ------------------------------------------------
@@ -62,12 +101,12 @@ It is recommended to be careful of the total number of operations done on each i
 when exporting varying sequence-length models.
 
 Using TorchScript in Python
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-------------------------------------------------
 
 Below are examples of using the Python to save, load models as well as how to use the trace for inference.
 
 Saving a model
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
 according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
@@ -113,7 +152,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
     torch.jit.save(traced_model, "traced_bert.pt")
 
 Loading a model
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
 We are re-using the previously initialised ``dummy_input``.
@@ -126,7 +165,7 @@ We are re-using the previously initialised ``dummy_input``.
     all_encoder_layers, pooled_output = loaded_model(dummy_input)
 
 Using a traced model for inference
-------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Using the traced model for inference is as simple as using its ``__call__`` dunder method: