Models doc (#7345)

* Clean up model documentation * Formatting * Preparation work * Long lines * Main work on rst files * Cleanup all config files * Syntax fix * Clean all tokenizers * Work on first models * Models beginning * FaluBERT * All PyTorch models * All models * Long lines again * Fixes * More fixes * Update docs/source/model_doc/bert.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Update docs/source/model_doc/electra.rst Co-authored-by: Lysandre Debut <lysandre@huggingface.co> * Last fixes Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2020-09-23 13:20:45 -04:00
parent 58405a527b
commit 3323146e90
165 changed files with 6907 additions and 5803 deletions
--- a/docs/source/perplexity.rst
+++ b/docs/source/perplexity.rst
@@ -1,5 +1,5 @@
 Perplexity of fixed-length models
-=================================
+=======================================================================================================================

 Perplexity (PPL) is one of the most common metrics for evaluating language
 models. Before diving in, we should note that the metric applies specifically
@@ -31,7 +31,7 @@ relationship to Bits Per Character (BPC) and data compression, check out this
 <https://thegradient.pub/understanding-evaluation-metrics-for-language-models/>`_.

 Calculating PPL with fixed-length models
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 If we weren't limited by a model's context size, we would evaluate the
 model's perplexity by autoregressively factorizing a sequence and
@@ -83,7 +83,7 @@ time. This allows computation to procede much faster while still giving the
 model a large context to make predictions at each step.

 Example: Calculating perplexity with GPT-2 in 🤗 Transformers
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 Let's demonstrate this process with GPT-2.