Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-21 10:37:09 +02:00
parent e5f452275b
commit b105f2c6b3
1 changed files with 9 additions and 9 deletions
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation

+Some of the optimizations performed by ONNX Runtime are hardware specific, so a model optimized at export time may
+perform differently on a machine whose hardware configuration differs from the one used for the export.
+For this reason, ``convert_graph_to_onnx.py`` does not enable optimizations by default,
+which keeps the exported model portable across hardware.
+Optimizations can then be enabled when loading the model through ONNX Runtime for inference.

-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.

-Example:
-
-.. code-block:: bash
-
-    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx
+.. note::
+    When quantization is enabled (see below), the ``convert_graph_to_onnx.py`` script also enables optimizations on
+    the model, because quantization modifies the underlying graph and makes it impossible for ONNX Runtime to apply
+    them afterwards.

 .. note::
    For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
    above command will contain the original ONNX model storing `float32` weights.
    The second one, with ``-quantized`` suffix, will hold the quantized parameters.

-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.

 TorchScript
 =======================================