Minor documentation revisions from copyediting (#9266)

* typo: Revise "checkout" to "check out" * typo: Change "seemlessly" to "seamlessly" * typo: Close parentheses in "Using the tokenizer" * typo: Add closing parenthesis to supported models aside * docs: Treat ``position_ids`` as plural Alternatively, the word "argument" could be added to make the subject singular. * docs: Remove comma, making subordinate clause * docs: Remove comma separating verb and direct object * docs: Fix typo ("next" -> "text") * docs: Reverse phrase order to simplify sentence * docs: "quicktour" -> "quick tour" * docs: "to throw" -> "from throwing" * docs: Remove disruptive newline in padding/truncation section * docs: "show exemplary" -> "show examples of" * docs: "much harder as" -> "much harder than" * docs: Fix typo "seach" -> "search" * docs: Fix subject-verb disagreement in WordPiece description * docs: Fix style in preprocessing.rst
2020-12-23 10:15:49 -05:00
parent d5db6c37d4
commit bcc87c639f
8 changed files with 19 additions and 20 deletions
--- a/docs/source/preprocessing.rst
+++ b/docs/source/preprocessing.rst
@@ -17,10 +17,10 @@ In this tutorial, we'll explore how to preprocess your data using 🤗 Transform
 call a :doc:`tokenizer <main_classes/tokenizer>`. You can build one using the tokenizer class associated to the model
 you would like to use, or directly with the :class:`~transformers.AutoTokenizer` class.

-As we saw in the :doc:`quicktour </quicktour>`, the tokenizer will first split a given text in words (or part of words,
-punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able to
-build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect to
-work properly.
+As we saw in the :doc:`quick tour </quicktour>`, the tokenizer will first split a given text in words (or part of
+words, punctuation symbols, etc.) usually called `tokens`. Then it will convert those `tokens` into numbers, to be able
+to build a tensor out of them and feed them to the model. It will also add any additional inputs the model might expect
+to work properly.

 .. note::

@@ -131,7 +131,7 @@ ones it should not (because they represent padding in this case).


 Note that if your model does not have a maximum length associated to it, the command above will throw a warning. You
-can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer to throw those kinds of warnings.
+can safely ignore it. You can also pass ``verbose=False`` to stop the tokenizer from throwing those kinds of warnings.

 .. _sentence-pairs:

@@ -216,7 +216,6 @@ Everything you always wanted to know about padding and truncation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 We have seen the commands that will work for most cases (pad your batch to the length of the maximum sentence and
-
 truncate to the maximum length the mode can accept). However, the API supports more strategies if you need them. The
 three arguments you need to know for this are :obj:`padding`, :obj:`truncation` and :obj:`max_length`.