Doc styling (#8067)
* Important files * Styling them all * Revert "Styling them all" This reverts commit 7d029395fdae8513b8281cbc2a6c239f8093503e. * Syling them for realsies * Fix syntax error * Fix benchmark_utils * More fixes * Fix modeling auto and script * Remove new line * Fixes * More fixes * Fix more files * Style * Add FSMT * More fixes * More fixes * More fixes * More fixes * Fixes * More fixes * More fixes * Last fixes * Make sphinx happy
This commit is contained in:
@@ -12,25 +12,25 @@ data.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for
|
||||
a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
|
||||
*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a
|
||||
wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
|
||||
languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly
|
||||
outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy
|
||||
on XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
|
||||
low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model.
|
||||
We also present a detailed empirical evaluation of the key factors that are required to achieve these gains,
|
||||
including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and
|
||||
low resource languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling
|
||||
without sacrificing per-language performance; XLM-Ris very competitive with strong monolingual models on the GLUE
|
||||
and XNLI benchmarks. We will make XLM-R code, data, and models publicly available.*
|
||||
outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on
|
||||
XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
|
||||
low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We
|
||||
also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the
|
||||
trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource
|
||||
languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing
|
||||
per-language performance; XLM-Ris very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We
|
||||
will make XLM-R code, data, and models publicly available.*
|
||||
|
||||
Tips:
|
||||
|
||||
- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
|
||||
not require :obj:`lang` tensors to understand which language is used, and should be able to determine the correct
|
||||
language from the input ids.
|
||||
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage
|
||||
examples as well as the information relative to the inputs and outputs.
|
||||
- This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
|
||||
as well as the information relative to the inputs and outputs.
|
||||
|
||||
The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user