Fix many typos (#8708)
This commit is contained in:
@@ -44,7 +44,7 @@ The documentation is organized in five parts:
|
||||
and a glossary.
|
||||
- **USING 🤗 TRANSFORMERS** contains general tutorials on how to use the library.
|
||||
- **ADVANCED GUIDES** contains more advanced guides that are more specific to a given script or part of the library.
|
||||
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general resarch in
|
||||
- **RESEARCH** focuses on tutorials that have less to do with how to use the library but more about general research in
|
||||
transformers model
|
||||
- The three last section contain the documentation of each public class and function, grouped in:
|
||||
|
||||
|
||||
@@ -5,7 +5,7 @@ Overview
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Dense Passage Retrieval (DPR) is a set of tools and models for state-of-the-art open-domain Q&A research. It was
|
||||
intorduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
|
||||
introduced in `Dense Passage Retrieval for Open-Domain Question Answering <https://arxiv.org/abs/2004.04906>`__ by
|
||||
Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, Wen-tau Yih.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
@@ -530,7 +530,7 @@ Sequence-to-sequence model with the same encoder-decoder model architecture as B
|
||||
two self-supervised objective functions: Masked Language Modeling (MLM) and a novel summarization specific pre-training
|
||||
objective, called Gap Sentence Generation (GSG).
|
||||
|
||||
* MLM: encoder input tokens are randomely replaced by a mask tokens and have to be predicted by the encoder (like in
|
||||
* MLM: encoder input tokens are randomly replaced by a mask tokens and have to be predicted by the encoder (like in
|
||||
BERT)
|
||||
* GSG: whole encoder input sentences are replaced by a second mask token and fed to the decoder, but which has a
|
||||
causal mask to hide the future words like a regular auto-regressive transformer decoder.
|
||||
|
||||
@@ -109,7 +109,7 @@ XLM-RoBERTa
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong gains
|
||||
over previously released multi-lingual models like mBERT or XLM on downstream taks like classification, sequence
|
||||
over previously released multi-lingual models like mBERT or XLM on downstream tasks like classification, sequence
|
||||
labeling and question answering.
|
||||
|
||||
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
|
||||
|
||||
@@ -62,7 +62,7 @@ sliding the context window so that the model has more context when making each p
|
||||
This is a closer approximation to the true decomposition of the sequence probability and will typically yield a more
|
||||
favorable score. The downside is that it requires a separate forward pass for each token in the corpus. A good
|
||||
practical compromise is to employ a strided sliding window, moving the context by larger strides rather than sliding by
|
||||
1 token a time. This allows computation to procede much faster while still giving the model a large context to make
|
||||
1 token a time. This allows computation to proceed much faster while still giving the model a large context to make
|
||||
predictions at each step.
|
||||
|
||||
Example: Calculating perplexity with GPT-2 in 🤗 Transformers
|
||||
|
||||
Reference in New Issue
Block a user