docs: add xlm-roberta section to multi-lingual section (#4101)

This commit is contained in:
Stefan Schweter
2020-05-01 17:06:58 +02:00
committed by GitHub
parent 18db92dd9a
commit e80be7f1d0

View File

@@ -104,4 +104,16 @@ BERT has two checkpoints that can be used for multi-lingual tasks:
- ``bert-base-multilingual-cased`` (Masked language modeling + Next sentence prediction, 104 languages)
These checkpoints do not require language embeddings at inference time. They should identify the language
used in the context and infer accordingly.
used in the context and infer accordingly.
XLM-RoBERTa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
XLM-RoBERTa was trained on 2.5TB of newly created clean CommonCrawl data in 100 languages. It provides strong
gains over previously released multi-lingual models like mBERT or XLM on downstream taks like classification,
sequence labeling and question answering.
Two XLM-RoBERTa checkpoints can be used for multi-lingual tasks:
- ``xlm-roberta-base`` (Masked language modeling, 100 languages)
- ``xlm-roberta-large`` (Masked language modeling, 100 languages)