From b105f2c6b3986e7170be353b1684861a3f70991b Mon Sep 17 00:00:00 2001
From: Morgan Funtowicz
Date: Fri, 21 Aug 2020 10:37:09 +0200
Subject: [PATCH] Update ONNX doc to match the removal of --optimize argument.

Signed-off-by: Morgan Funtowicz
---
 docs/source/serialization.rst | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/docs/source/serialization.rst b/docs/source/serialization.rst
index 5026d2b7a0..7cde5830ee 100644
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -52,15 +52,17 @@ Below are some of the operators which can be enabled to speed up inference throu
 * Skip connection LayerNormalization fusing
 * FastGeLU approximation
 
+Some of the optimizations performed by ONNX runtime can be hardware specific and thus lead to different performances
+if used on another machine with a different hardware configuration than the one used for exporting the model.
+For this reason, when using ``convert_graph_to_onnx.py`` optimizations are not enabled,
+ensuring the model can be easily exported to various hardware.
+Optimizations can then be enabled when loading the model through ONNX runtime for inference.
 
-Fortunately, you can let ONNXRuntime find all the possible optimized operators for you. Simply add ``--optimize``
-when exporting your model through ``convert_graph_to_onnx.py``.
-Example:
-
-.. code-block:: bash
-
-    python convert_graph_to_onnx.py --framework --model bert-base-cased --optimize bert-base-cased.onnx
+.. note::
+    When quantization is enabled (see below), ``convert_graph_to_onnx.py`` script will enable optimizations on the model
+    because quantization would modify the underlying graph making it impossible for ONNX runtime to do the optimizations
+    afterwards.
 
 .. note::
     For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github `_)
@@ -112,8 +114,6 @@ Example of quantized BERT model export:
     above command will contain the original ONNX model storing `float32` weights.
     The second one, with ``-quantized`` suffix, will hold the quantized parameters.
 
-.. note::
-    The quantization export gives the best performances when used in combination with ``--optimize``.
 
 TorchScript
 =======================================
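
The added paragraph says optimizations can be enabled when the model is loaded through ONNX runtime, but the patch does not show how. A minimal sketch of one way this might look with the ``onnxruntime`` Python package follows. The helper names ``resolve_level_name`` and ``load_optimized_session`` are hypothetical and not part of ``convert_graph_to_onnx.py``; the model path in the usage comment is only an illustration, and the CPU execution provider is an assumption.

```python
# Hypothetical helpers (not part of transformers or convert_graph_to_onnx.py).
# They only wrap onnxruntime's SessionOptions.graph_optimization_level setting.

# Friendly names mapped to members of onnxruntime.GraphOptimizationLevel.
OPT_LEVELS = {
    "disabled": "ORT_DISABLE_ALL",
    "basic": "ORT_ENABLE_BASIC",
    "extended": "ORT_ENABLE_EXTENDED",
    "all": "ORT_ENABLE_ALL",
}


def resolve_level_name(level):
    """Translate a friendly level name into the GraphOptimizationLevel member name."""
    key = level.lower()
    if key not in OPT_LEVELS:
        raise ValueError(
            "Unknown optimization level %r, expected one of %s" % (level, sorted(OPT_LEVELS))
        )
    return OPT_LEVELS[key]


def load_optimized_session(model_path, level="all", optimized_path=None):
    """Create an InferenceSession with graph optimizations applied at load time."""
    # Imported here so the pure-Python helper above works without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    options.graph_optimization_level = getattr(
        ort.GraphOptimizationLevel, resolve_level_name(level)
    )
    if optimized_path is not None:
        # Optionally save the optimized graph. It may be specific to this machine.
        options.optimized_model_filepath = optimized_path

    # Assumption: CPU inference. Other execution providers can be listed instead.
    return ort.InferenceSession(model_path, options, providers=["CPUExecutionProvider"])


# Usage (assumes a model previously exported to this illustrative path):
#   session = load_optimized_session("bert-base-cased.onnx", level="all")
```

Because the optimizations are applied on the machine that loads the model, the exported ``.onnx`` file itself stays portable, which is the trade-off the patch describes.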