Minor documentation revisions from copyediting (#9266)

* typo: Revise "checkout" to "check out"
* typo: Change "seemlessly" to "seamlessly"
* typo: Close parentheses in "Using the tokenizer"
* typo: Add closing parenthesis to supported models aside
* docs: Treat ``position_ids`` as plural

  Alternatively, the word "argument" could be added to make the subject singular.
* docs: Remove comma, making subordinate clause
* docs: Remove comma separating verb and direct object
* docs: Fix typo ("next" -> "text")
* docs: Reverse phrase order to simplify sentence
* docs: "quicktour" -> "quick tour"
* docs: "to throw" -> "from throwing"
* docs: Remove disruptive newline in padding/truncation section
* docs: "show exemplary" -> "show examples of"
* docs: "much harder as" -> "much harder than"
* docs: Fix typo "seach" -> "search"
* docs: Fix subject-verb disagreement in WordPiece description
* docs: Fix style in preprocessing.rst
2020-12-23 10:15:49 -05:00
parent d5db6c37d4
commit bcc87c639f
8 changed files with 19 additions and 20 deletions
--- a/docs/source/model_summary.rst
+++ b/docs/source/model_summary.rst
@@ -16,7 +16,7 @@ Summary of the models
 This is a summary of the models available in 🤗 Transformers. It assumes you’re familiar with the original `transformer
 model <https://arxiv.org/abs/1706.03762>`_. For a gentle introduction check the `annotated transformer
 <http://nlp.seas.harvard.edu/2018/04/03/attention.html>`_. Here we focus on the high-level differences between the
-models. You can check them more in detail in their respective documentation. Also checkout the :doc:`pretrained model
+models. You can check them more in detail in their respective documentation. Also check out the :doc:`pretrained model
 page </pretrained_models>` to see the checkpoints available for each type of model and all `the community models
 <https://huggingface.co/models>`_.

@@ -30,7 +30,7 @@ Each one of the models in the library falls into one of the following categories

 Autoregressive models are pretrained on the classic language modeling task: guess the next token having read all the
 previous ones. They correspond to the decoder of the original transformer model, and a mask is used on top of the full
-sentence so that the attention heads can only see what was before in the next, and not what’s after. Although those
+sentence so that the attention heads can only see what was before in the text, and not what’s after. Although those
 models can be fine-tuned and achieve great results on many tasks, the most natural application is text generation. A
 typical example of such models is GPT.

@@ -512,8 +512,8 @@ BART
 <https://arxiv.org/abs/1910.13461>`_, Mike Lewis et al.

 Sequence-to-sequence model with an encoder and a decoder. Encoder is fed a corrupted version of the tokens, decoder is
-fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). For the encoder
-, on the pretraining tasks, a composition of the following transformations are applied:
+fed the original tokens (but has a mask to hide the future words like a regular transformers decoder). A composition of
+the following transformations are applied on the pretraining tasks for the encoder:

  * mask random tokens (like in BERT)
  * delete random tokens