Minor documentation revisions from copyediting (#9266)

* typo: Revise "checkout" to "check out"

* typo: Change "seemlessly" to "seamlessly"

* typo: Close parentheses in "Using the tokenizer"

* typo: Add closing parenthesis to supported models aside

* docs: Treat ``position_ids`` as plural

Alternatively, the word "argument" could be added to make the subject singular.

* docs: Remove comma, making subordinate clause

* docs: Remove comma separating verb and direct object

* docs: Fix typo ("next" -> "text")

* docs: Reverse phrase order to simplify sentence

* docs: "quicktour" -> "quick tour"

* docs: "to throw" -> "from throwing"

* docs: Remove disruptive newline in padding/truncation section

* docs: "show exemplary" -> "show examples of"

* docs: "much harder as" -> "much harder than"

* docs: Fix typo "seach" -> "search"

* docs: Fix subject-verb disagreement in WordPiece description

* docs: Fix style in preprocessing.rst
Connor Brinton
2020-12-23 10:15:49 -05:00
committed by GitHub
parent d5db6c37d4
commit bcc87c639f
8 changed files with 19 additions and 20 deletions


@@ -16,7 +16,7 @@ Summary of the models
This is a summary of the models available in 🤗 Transformers. It assumes you're familiar with the original `transformer
model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction check the `annotated transformer
<http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
-models. You can check them more in detail in their respective documentation. Also checkout the :doc:`pretrained model
+models. You can check them more in detail in their respective documentation. Also check out the :doc:`pretrained model
page </pretrained_models>` to see the checkpoints available for each type of model and all `the community models
<https://huggingface.co/models>`_.
@@ -30,7 +30,7 @@ Each one of the models in the library falls into one of the following categories
Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
-sentence so that the attention heads can only see what was before in the next, and not what's after. Although those
+sentence so that the attention heads can only see what was before in the text, and not what's after. Although those
models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation. A
typical example of such models is GPT.
@@ -512,8 +512,8 @@ BART
<https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.
Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
-fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). For the encoder
-, on the pretraining tasks, a composition of the following transformations are applied:
+fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of
+the following transformations are applied on the pretraining tasks for the encoder:
* mask random tokens (like in BERT)
* delete random tokens