Minor documentation revisions from copyediting (#9266)
* typo: Revise "checkout" to "check out"
* typo: Change "seemlessly" to "seamlessly"
* typo: Close parentheses in "Using the tokenizer"
* typo: Add closing parenthesis to supported models aside
* docs: Treat ``position_ids`` as plural
Alternatively, the word "argument" could be added to make the subject singular.
* docs: Remove comma, making subordinate clause
* docs: Remove comma separating verb and direct object
* docs: Fix typo ("next" -> "text")
* docs: Reverse phrase order to simplify sentence
* docs: "quicktour" -> "quick tour"
* docs: "to throw" -> "from throwing"
* docs: Remove disruptive newline in padding/truncation section
* docs: "show exemplary" -> "show examples of"
* docs: "much harder as" -> "much harder than"
* docs: Fix typo "seach" -> "search"
* docs: Fix subject-verb disagreement in WordPiece description
* docs: Fix style in preprocessing.rst
@@ -158,7 +158,7 @@ Using the tokenizer
 
 We mentioned the tokenizer is responsible for the preprocessing of your texts. First, it will split a given text in
 words (or part of words, punctuation symbols, etc.) usually called `tokens`. There are multiple rules that can govern
-that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`, which is why we need
+that process (you can learn more about them in the :doc:`tokenizer summary <tokenizer_summary>`), which is why we need
 to instantiate the tokenizer using the name of the model, to make sure we use the same rules as when the model was
 pretrained.
 