Models doc (#7345)

* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FaluBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
This commit is contained in:
Sylvain Gugger
2020-09-23 13:20:45 -04:00
committed by GitHub
parent 58405a527b
commit 3323146e90
165 changed files with 6907 additions and 5803 deletions

View File

@@ -1,9 +1,9 @@
**********************************************
***********************************************************************************************************************
Exporting transformers models
**********************************************
***********************************************************************************************************************
ONNX / ONNXRuntime
==============================================
=======================================================================================================================
Projects `ONNX (Open Neural Network eXchange) <http://onnx.ai>`_ and `ONNXRuntime (ORT) <https://microsoft.github.io/onnxruntime/>`_ are part of an effort from leading industries in the AI field
to provide a unified and community-driven format to store and, by extension, efficiently execute neural network leveraging a variety
@@ -42,7 +42,7 @@ Also, the conversion tool supports different options which let you tune the beha
Optimizations
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
ONNXRuntime includes some transformers-specific transformations to leverage optimized operations in the graph.
Below are some of the operators which can be enabled to speed up inference through ONNXRuntime (*see note below*):
@@ -68,7 +68,7 @@ Optimizations can then be enabled when loading the model through ONNX runtime fo
For more information about the optimizations enabled by ONNXRuntime, please have a look at the (`ONNXRuntime Github <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_)
Quantization
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
ONNX exporter supports generating a quantized version of the model to allow efficient inference.
@@ -116,7 +116,7 @@ Example of quantized BERT model export:
TorchScript
=======================================
=======================================================================================================================
.. note::
This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -141,10 +141,10 @@ These necessities imply several things developers should be careful about. These
Implications
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
TorchScript flag and tied weights
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
This flag is necessary because most of the language models in this repository have tied weights between their
``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied weights, therefore
it is necessary to untie and clone the weights beforehand.
@@ -157,7 +157,7 @@ This is not the case for models that do not have a Language Model head, as those
can be safely exported without the ``torchscript`` flag.
Dummy inputs and standard lengths
------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
Pytorch keeps track of the different operations executed on each tensor. These recorded operations are then used
@@ -178,12 +178,12 @@ It is recommended to be careful of the total number of operations done on each i
when exporting varying sequence-length models.
Using TorchScript in Python
-------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------
Below is an example, showing how to save, load models as well as how to use the trace for inference.
Saving a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``
@@ -229,7 +229,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
torch.jit.save(traced_model, "traced_bert.pt")
Loading a model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
We are re-using the previously initialised ``dummy_input``.
@@ -242,7 +242,7 @@ We are re-using the previously initialised ``dummy_input``.
all_encoder_layers, pooled_output = loaded_model(*dummy_input)
Using a traced model for inference
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Using the traced model for inference is as simple as using its ``__call__`` dunder method: