Add support for XLM-R XL and XXL models by modeling_xlm_roberta_xl.py (#13727)
* add xlm roberta xl * add convert xlm xl fairseq checkpoint to pytorch * fix init and documents for xlm-roberta-xl * fix indention * add test for XLM-R xl,xxl * fix model hub name * fix some stuff * up * correct init * fix more * fix as suggestions * add torch_device * fix default values of doc strings * fix leftovers * merge to master * up * correct hub names * fix docs * fix model * up * finalize * last fix * Apply suggestions from code review Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com> * add copied from * make style Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
This commit is contained in:
@@ -312,6 +312,8 @@
|
||||
title: XLM-ProphetNet
|
||||
- local: model_doc/xlm-roberta
|
||||
title: XLM-RoBERTa
|
||||
- local: model_doc/xlm-roberta-xl
|
||||
title: XLM-RoBERTa-XL
|
||||
- local: model_doc/xlnet
|
||||
title: XLNet
|
||||
- local: model_doc/xlsr_wav2vec2
|
||||
|
||||
@@ -146,6 +146,7 @@ conversion utilities for the following models.
|
||||
1. **[XLM](model_doc/xlm)** (from Facebook) released together with the paper [Cross-lingual Language Model Pretraining](https://arxiv.org/abs/1901.07291) by Guillaume Lample and Alexis Conneau.
|
||||
1. **[XLM-ProphetNet](model_doc/xlm-prophetnet)** (from Microsoft Research) released with the paper [ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training](https://arxiv.org/abs/2001.04063) by Yu Yan, Weizhen Qi, Yeyun Gong, Dayiheng Liu, Nan Duan, Jiusheng Chen, Ruofei Zhang and Ming Zhou.
|
||||
1. **[XLM-RoBERTa](model_doc/xlm-roberta)** (from Facebook AI), released together with the paper [Unsupervised Cross-lingual Representation Learning at Scale](https://arxiv.org/abs/1911.02116) by Alexis Conneau*, Kartikay Khandelwal*, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer and Veselin Stoyanov.
|
||||
1. **[XLM-RoBERTa-XL](model_doc/xlm-roberta-xl)** (from Facebook AI), released together with the paper [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
|
||||
1. **[XLNet](model_doc/xlnet)** (from Google/CMU) released with the paper [XLNet: Generalized Autoregressive Pretraining for Language Understanding](https://arxiv.org/abs/1906.08237) by Zhilin Yang*, Zihang Dai*, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le.
|
||||
1. **[XLSR-Wav2Vec2](model_doc/xlsr_wav2vec2)** (from Facebook AI) released with the paper [Unsupervised Cross-Lingual Representation Learning For Speech Recognition](https://arxiv.org/abs/2006.13979) by Alexis Conneau, Alexei Baevski, Ronan Collobert, Abdelrahman Mohamed, Michael Auli.
|
||||
1. **[XLS-R](https://huggingface.co/docs/master/transformers/model_doc/xls_r)** (from Facebook AI) released with the paper [XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale](https://arxiv.org/abs/2111.09296) by Arun Babu, Changhan Wang, Andros Tjandra, Kushal Lakhotia, Qiantong Xu, Naman Goyal, Kritika Singh, Patrick von Platen, Yatharth Saraf, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli.
|
||||
@@ -246,6 +247,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
| XGLM | ✅ | ✅ | ✅ | ❌ | ✅ |
|
||||
| XLM | ✅ | ❌ | ✅ | ✅ | ❌ |
|
||||
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| XLM-RoBERTa-XL | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
| XLMProphetNet | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| XLNet | ✅ | ✅ | ✅ | ✅ | ❌ |
|
||||
| YOSO | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
|
||||
69
docs/source/model_doc/xlm-roberta-xl.mdx
Normal file
69
docs/source/model_doc/xlm-roberta-xl.mdx
Normal file
@@ -0,0 +1,69 @@
|
||||
<!--Copyright 2022 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# XLM-RoBERTa-XL
|
||||
|
||||
## Overview
|
||||
|
||||
The XLM-RoBERTa-XL model was proposed in [Larger-Scale Transformers for Multilingual Masked Language Modeling](https://arxiv.org/abs/2105.00572) by Naman Goyal, Jingfei Du, Myle Ott, Giri Anantharaman, Alexis Conneau.
|
||||
|
||||
The abstract from the paper is the following:
|
||||
|
||||
*Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while handling 99 more languages. This suggests pretrained models with larger capacity may obtain both strong performance on high-resource languages while greatly improving low-resource languages. We make our code and models publicly available.*
|
||||
|
||||
Tips:
|
||||
|
||||
- XLM-RoBERTa-XL is a multilingual model trained on 100 different languages. Unlike some XLM multilingual models, it does
|
||||
not require `lang` tensors to understand which language is used, and should be able to determine the correct
|
||||
language from the input ids.
|
||||
|
||||
This model was contributed by [Soonhwan-Kwon](https://github.com/Soonhwan-Kwon) and [stefan-it](https://huggingface.co/stefan-it). The original code can be found [here](https://github.com/pytorch/fairseq/tree/master/examples/xlmr).
|
||||
|
||||
|
||||
## XLMRobertaXLConfig
|
||||
|
||||
[[autodoc]] XLMRobertaXLConfig
|
||||
|
||||
## XLMRobertaXLModel
|
||||
|
||||
[[autodoc]] XLMRobertaXLModel
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForCausalLM
|
||||
|
||||
[[autodoc]] XLMRobertaXLForCausalLM
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForMaskedLM
|
||||
|
||||
[[autodoc]] XLMRobertaXLForMaskedLM
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForSequenceClassification
|
||||
|
||||
[[autodoc]] XLMRobertaXLForSequenceClassification
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForMultipleChoice
|
||||
|
||||
[[autodoc]] XLMRobertaXLForMultipleChoice
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForTokenClassification
|
||||
|
||||
[[autodoc]] XLMRobertaXLForTokenClassification
|
||||
- forward
|
||||
|
||||
## XLMRobertaXLForQuestionAnswering
|
||||
|
||||
[[autodoc]] XLMRobertaXLForQuestionAnswering
|
||||
- forward
|
||||
@@ -60,6 +60,7 @@ Ready-made configurations include the following architectures:
|
||||
- RoBERTa
|
||||
- T5
|
||||
- XLM-RoBERTa
|
||||
- XLM-RoBERTa-XL
|
||||
|
||||
The ONNX conversion is supported for the PyTorch versions of the models. If you
|
||||
would like to be able to convert a TensorFlow model, please let us know by
|
||||
|
||||
Reference in New Issue
Block a user