[Docs] Fix spelling and grammar mistakes (#28825)
* Fix typos and grammar mistakes in docs and examples * Fix typos in docstrings and comments * Fix spelling of `tokenizer` in model tests * Remove erroneous spaces in decorators * Remove extra spaces in Markdown link texts
This commit is contained in:
@@ -143,7 +143,7 @@ Byte-Pair Encoding (BPE) was introduced in [Neural Machine Translation of Rare W
|
||||
al., 2015)](https://arxiv.org/abs/1508.07909). BPE relies on a pre-tokenizer that splits the training data into
|
||||
words. Pretokenization can be as simple as space tokenization, e.g. [GPT-2](model_doc/gpt2), [RoBERTa](model_doc/roberta). More advanced pre-tokenization include rule-based tokenization, e.g. [XLM](model_doc/xlm),
|
||||
[FlauBERT](model_doc/flaubert) which uses Moses for most languages, or [GPT](model_doc/gpt) which uses
|
||||
Spacy and ftfy, to count the frequency of each word in the training corpus.
|
||||
spaCy and ftfy, to count the frequency of each word in the training corpus.
|
||||
|
||||
After pre-tokenization, a set of unique words has been created and the frequency with which each word occurred in the
|
||||
training data has been determined. Next, BPE creates a base vocabulary consisting of all symbols that occur in the set
|
||||
|
||||
Reference in New Issue
Block a user