Map model_type and doc pages names (#14944)
* Map model_type and doc pages names
* Add script
* Fix typo
* Quality
* Manual check for Auto

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
docs/source/model_doc/xlm-roberta.mdx
@@ -0,0 +1,126 @@
<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# XLM-RoBERTa

## Overview

The XLM-RoBERTa model was proposed in [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume
Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov. It is based on Facebook's
RoBERTa model released in 2019. It is a large multilingual language model, trained on 2.5TB of filtered CommonCrawl
data in 100 languages.

The abstract from the paper is the following:

*This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a
wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred
languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly
outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +13.8% average accuracy on
XNLI, +12.3% average F1 score on MLQA, and +2.1% average F1 score on NER. XLM-R performs particularly well on
low-resource languages, improving 11.8% in XNLI accuracy for Swahili and 9.2% for Urdu over the previous XLM model. We
also present a detailed empirical evaluation of the key factors that are required to achieve these gains, including the
trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource
languages at scale. Finally, we show, for the first time, the possibility of multilingual modeling without sacrificing
per-language performance; XLM-R is very competitive with strong monolingual models on the GLUE and XNLI benchmarks. We
will make XLM-R code, data, and models publicly available.*

Tips:

- XLM-RoBERTa is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
not require `lang` tensors to indicate which language is used; it should be able to determine the correct
language from the input ids alone.
- This implementation is the same as RoBERTa. Refer to the [documentation of RoBERTa](roberta) for usage examples
as well as information about the inputs and outputs.

This model was contributed by [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).

## XLMRobertaConfig

[[autodoc]] XLMRobertaConfig

## XLMRobertaTokenizer

[[autodoc]] XLMRobertaTokenizer
- build_inputs_with_special_tokens
- get_special_tokens_mask
- create_token_type_ids_from_sequences
- save_vocabulary
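The special-token methods listed above lay out inputs the same way RoBERTa does. The following dependency-free Python sketch mirrors what `build_inputs_with_special_tokens`, `get_special_tokens_mask` and `create_token_type_ids_from_sequences` are expected to return for one or two sequences. It is not the library code: the helper names are hypothetical, and it assumes the released XLM-R vocabulary maps `<s>` to id 0 and `</s>` to id 2.

```python
from typing import List, Optional

# Assumed special-token ids (released XLM-R vocabulary): <s> acts as CLS, </s> acts as SEP.
CLS_ID = 0
SEP_ID = 2


def build_inputs(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Hypothetical helper. Single sequence: <s> A </s>; pair: <s> A </s></s> B </s>."""
    if ids_b is None:
        return [CLS_ID] + ids_a + [SEP_ID]
    return [CLS_ID] + ids_a + [SEP_ID, SEP_ID] + ids_b + [SEP_ID]


def special_tokens_mask(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Hypothetical helper. 1 marks an added special token, 0 marks a regular token."""
    if ids_b is None:
        return [1] + [0] * len(ids_a) + [1]
    return [1] + [0] * len(ids_a) + [1, 1] + [0] * len(ids_b) + [1]


def token_type_ids(ids_a: List[int], ids_b: Optional[List[int]] = None) -> List[int]:
    """Hypothetical helper. XLM-R does not use token types, so every position is 0."""
    return [0] * len(build_inputs(ids_a, ids_b))


if __name__ == "__main__":
    a, b = [35378, 4], [2086]  # hypothetical subword ids for two segments
    print(build_inputs(a, b))         # [0, 35378, 4, 2, 2, 2086, 2]
    print(special_tokens_mask(a, b))  # [1, 0, 0, 1, 1, 0, 1]
    print(token_type_ids(a, b))       # [0, 0, 0, 0, 0, 0, 0]
```

The doubled `</s></s>` between the two segments and the all-zero token type ids are the main differences from BERT's `[CLS] A [SEP] B [SEP]` layout with 0/1 segment ids. For real inputs, call these methods on an actual `XLMRobertaTokenizer` instance.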

## XLMRobertaTokenizerFast

[[autodoc]] XLMRobertaTokenizerFast

## XLMRobertaModel

[[autodoc]] XLMRobertaModel
- forward

## XLMRobertaForCausalLM

[[autodoc]] XLMRobertaForCausalLM
- forward

## XLMRobertaForMaskedLM

[[autodoc]] XLMRobertaForMaskedLM
- forward

## XLMRobertaForSequenceClassification

[[autodoc]] XLMRobertaForSequenceClassification
- forward

## XLMRobertaForMultipleChoice

[[autodoc]] XLMRobertaForMultipleChoice
- forward

## XLMRobertaForTokenClassification

[[autodoc]] XLMRobertaForTokenClassification
- forward

## XLMRobertaForQuestionAnswering

[[autodoc]] XLMRobertaForQuestionAnswering
- forward

## TFXLMRobertaModel

[[autodoc]] TFXLMRobertaModel
- call

## TFXLMRobertaForMaskedLM

[[autodoc]] TFXLMRobertaForMaskedLM
- call

## TFXLMRobertaForSequenceClassification

[[autodoc]] TFXLMRobertaForSequenceClassification
- call

## TFXLMRobertaForMultipleChoice

[[autodoc]] TFXLMRobertaForMultipleChoice
- call

## TFXLMRobertaForTokenClassification

[[autodoc]] TFXLMRobertaForTokenClassification
- call

## TFXLMRobertaForQuestionAnswering

[[autodoc]] TFXLMRobertaForQuestionAnswering
- call