Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz <funtowiczmo@gmail.com>
2020-08-21 10:37:09 +02:00
parent e5f452275b
commit b105f2c6b3
1 changed files with 9 additions and 9 deletions
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation
-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.
-Example:
-
-.. code-block:: bash
-
-    python convert_graph_to_onnx.py --framework <pt, tf> --model bert-base-cased --optimize bert-base-cased.onnx
+Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performances
+if used on another machine with a different hardware configuration than the one used for exporting the model.
+For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled,
+ensuring the model can be easily exported to various hardware.
+Optimizations can then be enabled when loading the model through ONNX runtime for inference.
+.. note::
+    When quantization is enabled (see below), ``convert_graph_to_onnx.py`` script will enable optimizations on the model
+    because quantization would modify the underlying graph making it impossible for ONNX runtime to do the optimizations
+    afterwards.
 .. note::
    For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
    above command will contain the original ONNX model storing `float32` weights.
    The second one, with ``-quantized`` suffix, will hold the quantized parameters.
-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.
 TorchScript
 =======================================