Models doc (#7345)

* Clean up model documentation

* Formatting

* Preparation work

* Long lines

* Main work on rst files

* Cleanup all config files

* Syntax fix

* Clean all tokenizers

* Work on first models

* Models beginning

* FaluBERT

* All PyTorch models

* All models

* Long lines again

* Fixes

* More fixes

* Update docs/source/model_doc/bert.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/electra.rst

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Last fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
This commit is contained in:
Sylvain Gugger
2020-09-23 13:20:45 -04:00
committed by GitHub
parent 58405a527b
commit 3323146e90
165 changed files with 6907 additions and 5803 deletions

View File

@@ -1,5 +1,5 @@
Perplexity of fixed-length models
=================================
=======================================================================================================================
Perplexity (PPL) is one of the most common metrics for evaluating language
models. Before diving in, we should note that the metric applies specifically
@@ -31,7 +31,7 @@ relationship to Bits Per Character (BPC) and data compression, check out this
<https://thegradient.pub/understanding-evaluation-metrics-for-language-models/>`_.
Calculating PPL with fixed-length models
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If we weren't limited by a model's context size, we would evaluate the
model's perplexity by autoregressively factorizing a sequence and
@@ -83,7 +83,7 @@ time. This allows computation to procede much faster while still giving the
model a large context to make predictions at each step.
Example: Calculating perplexity with GPT-2 in 🤗 Transformers
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Let's demonstrate this process with GPT-2.