From b105f2c6b3986e7170be353b1684861a3f70991b Mon Sep 17 00:00:00 2001
From: Morgan Funtowicz <funtowiczmo@gmail.com>
Date: Fri, 21 Aug 2020 10:37:09 +0200
Subject: [PATCH] Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
---
 docs/source/serialization.rst | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/source/serialization.rst b/docs/source/serialization.rst
index 5026d2b7a0..7cde5830ee 100644
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation
 
+Some of the optimizations performed by ONNX Runtime are hardware specific, so an optimized model may perform
+differently on a machine whose hardware configuration differs from the one used to export it.
+For this reason, ``convert_graph_to_onnx.py`` does not enable optimizations when exporting,
+which keeps the exported model portable across hardware.
+Optimizations can then be enabled when loading the model through ONNX Runtime for inference.
 
-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.
 
-Example:
-
-.. code-block:: bash
-
-    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx
+.. note::
+    When quantization is enabled (see below), the ``convert_graph_to_onnx.py`` script enables optimizations on the
+    model, because quantization modifies the underlying graph, making it impossible for ONNX Runtime to apply them
+    afterwards.
 
 .. note::
     For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
     above command will contain the original ONNX model storing `float32` weights.
     The second one, with ``-quantized`` suffix, will hold the quantized parameters.
 
-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.
 
 TorchScript
 =======================================