Models doc (#7345)

* Clean up model documentation
* Formatting
* Preparation work
* Long lines
* Main work on rst files
* Cleanup all config files
* Syntax fix
* Clean all tokenizers
* Work on first models
* Models beginning
* FaluBERT
* All PyTorch models
* All models
* Long lines again
* Fixes
* More fixes
* Update docs/source/model_doc/bert.rst
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update docs/source/model_doc/electra.rst
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Last fixes
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
parent 58405a527b
commit 3323146e90
165 changed files with 6907 additions and 5803 deletions
--- a/docs/source/serialization.rst
+++ b/docs/source/serialization.rst
@@ -1,9 +1,9 @@
-**********************************************
+***********************************************************************************************************************
 Exporting transformers models
-**********************************************
+***********************************************************************************************************************

 ONNX / ONNXRuntime
-==============================================
+=======================================================================================================================

 Projects `ONNX (Open Neural Network eXchange) <http://onnx.ai>`_ and `ONNXRuntime (ORT) <https://microsoft.github.io/onnxruntime/>`_ are part of an effort from industry leaders in the AI field
 to provide a unified and community-driven format to store and, by extension, efficiently execute neural networks leveraging a variety
@@ -42,7 +42,7 @@ Also, the conversion tool supports different options which let you tune the beha


 Optimizations
------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------

 ONNXRuntime includes some transformers-specific transformations to leverage optimized operations in the graph.
 Below are some of the operators which can be enabled to speed up inference through ONNXRuntime (*see note below*):
@@ -68,7 +68,7 @@ Optimizations can then be enabled when loading the model through ONNX runtime fo
    For more information about the optimizations enabled by ONNXRuntime, please have a look at the `ONNXRuntime GitHub <https://github.com/microsoft/onnxruntime/tree/master/onnxruntime/python/tools/transformers>`_.

 Quantization
------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------

 The ONNX exporter supports generating a quantized version of the model to allow efficient inference.

@@ -116,7 +116,7 @@ Example of quantized BERT model export:


 TorchScript
-=======================================
+=======================================================================================================================

 .. note::
    This is the very beginning of our experiments with TorchScript and we are still exploring its capabilities
@@ -141,10 +141,10 @@ These necessities imply several things developers should be careful about. These


 Implications
------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------

 TorchScript flag and tied weights
------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------
 This flag is necessary because most of the language models in this repository have tied weights between their
 ``Embedding`` layer and their ``Decoding`` layer. TorchScript does not allow the export of models that have tied weights, so
 it is necessary to untie and clone the weights beforehand.
@@ -157,7 +157,7 @@ This is not the case for models that do not have a Language Model head, as those
 can be safely exported without the ``torchscript`` flag.
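
To make the idea of tied versus untied weights concrete, here is a toy sketch in plain Python. It does not use PyTorch or transformers: ``ToyLayer`` and ``untie`` are hypothetical names introduced only for illustration, and the real library performs the equivalent step on tensors when the ``torchscript`` flag is set.

```python
import copy


class ToyLayer:
    """Hypothetical stand-in for a layer that holds a weight matrix."""

    def __init__(self, weight):
        self.weight = weight


# Tied weights: both layers reference the very same weight object.
shared_weight = [[0.1, 0.2], [0.3, 0.4]]
embedding = ToyLayer(shared_weight)
decoder = ToyLayer(shared_weight)
tied = embedding.weight is decoder.weight


def untie(layer):
    """Replace the layer's weight with an independent copy (a clone)."""
    layer.weight = copy.deepcopy(layer.weight)


# Untied weights: the decoder now owns a separate copy with equal values.
untie(decoder)
untied = embedding.weight is not decoder.weight
same_values = embedding.weight == decoder.weight
```

After ``untie``, changing one layer's weight no longer affects the other, which is the property an exporter that rejects shared parameters relies on.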

 Dummy inputs and standard lengths
------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------

 The dummy inputs are used to do a model forward pass. While the inputs' values are propagating through the layers,
 PyTorch keeps track of the different operations executed on each tensor. These recorded operations are then used
@@ -178,12 +178,12 @@ It is recommended to be careful of the total number of operations done on each i
 when exporting varying sequence-length models.
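
One way to handle varying sequence lengths is to pad every input up to the length used when tracing. The sketch below is a minimal, framework-free illustration of that idea, assuming the trace is tied to the dummy input's dimensions. The helper ``pad_to_traced_length`` is hypothetical, and the default ``pad_id=0`` is an assumption that should be replaced with the tokenizer's actual padding id.

```python
def pad_to_traced_length(input_ids, traced_length, pad_id=0):
    """Pad a list of token ids to the sequence length used at trace time.

    Returns the padded ids and an attention mask (1 = real token, 0 = padding).
    pad_id=0 is an assumption; use the tokenizer's real padding id.
    """
    if len(input_ids) > traced_length:
        raise ValueError(
            f"Input has {len(input_ids)} tokens, but the model was traced "
            f"with length {traced_length}."
        )
    num_padding = traced_length - len(input_ids)
    padded_ids = list(input_ids) + [pad_id] * num_padding
    attention_mask = [1] * len(input_ids) + [0] * num_padding
    return padded_ids, attention_mask


# Example: a 3-token input padded to a traced length of 5.
padded_ids, attention_mask = pad_to_traced_length([101, 7592, 102], traced_length=5)
```

Raising an error for inputs longer than the traced length, rather than truncating silently, makes a mismatch visible before it reaches the traced model.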

 Using TorchScript in Python
-------------------------------------------------
+-----------------------------------------------------------------------------------------------------------------------

 Below is an example showing how to save and load models, as well as how to use the trace for inference.

 Saving a model
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This snippet shows how to use TorchScript to export a ``BertModel``. Here the ``BertModel`` is instantiated
 according to a ``BertConfig`` class and then saved to disk under the filename ``traced_bert.pt``.
@@ -229,7 +229,7 @@ according to a ``BertConfig`` class and then saved to disk under the filename ``
    torch.jit.save(traced_model, "traced_bert.pt")

 Loading a model
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 This snippet shows how to load the ``BertModel`` that was previously saved to disk under the name ``traced_bert.pt``.
 We are re-using the previously initialised ``dummy_input``.
@@ -242,7 +242,7 @@ We are re-using the previously initialised ``dummy_input``.
    all_encoder_layers, pooled_output = loaded_model(*dummy_input)

 Using a traced model for inference
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Using the traced model for inference is as simple as using its ``__call__`` dunder method: