Honor contributors to models (#11329)
* Honor contributors to models
* Fix typo
* Address review comments
* Add more authors
@@ -43,7 +43,8 @@ Tips:
 similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same
 number of (repeating) layers.
 
-The original code can be found `here <https://github.com/google-research/ALBERT>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/ALBERT>`__.
 
 AlbertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -35,7 +35,8 @@ According to the abstract,
 state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains
 of up to 6 ROUGE.
 
-The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/bart>`__.
 
 
 Examples
@@ -16,7 +16,7 @@ BARThez
 Overview
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model`
+The BARThez model was proposed in `BARThez: a Skilled Pretrained French Sequence-to-Sequence Model
 <https://arxiv.org/abs/2010.12321>`__ by Moussa Kamal Eddine, Antoine J.-P. Tixier, Michalis Vazirgiannis on 23 Oct,
 2020.
 
@@ -35,7 +35,8 @@ summarization dataset, OrangeSum, that we release with this paper. We also conti
 pretrained multilingual BART on BARThez's corpus, and we show that the resulting model, which we call mBARTHez,
 provides a significant boost over vanilla BARThez, and is on par with or outperforms CamemBERT and FlauBERT.*
 
-The Authors' code can be found `here <https://github.com/moussaKam/BARThez>`__.
+This model was contributed by `moussakam <https://huggingface.co/moussakam>`__. The Authors' code can be found `here
+<https://github.com/moussaKam/BARThez>`__.
 
 
 Examples
@@ -42,7 +42,8 @@ Tips:
 - BERT was trained with the masked language modeling (MLM) and next sentence prediction (NSP) objectives. It is
 efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation.
 
-The original code can be found `here <https://github.com/google-research/bert>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/google-research/bert>`__.
 
 BertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -71,6 +71,8 @@ Tips:
 - This implementation is the same as BERT, except for tokenization method. Refer to the :doc:`documentation of BERT
 <bert>` for more usage examples.
 
+This model was contributed by `cl-tohoku <https://huggingface.co/cl-tohoku>`__.
+
 BertJapaneseTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -79,7 +79,8 @@ Tips:
 - For summarization, sentence splitting, sentence fusion and translation, no special tokens are required for the input.
 Therefore, no EOS token should be added to the end of the input.
 
-The original code can be found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://tfhub.dev/s?module-type=text-generation&subtype=module,placeholder>`__.
 
 BertGenerationConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -54,8 +54,8 @@ Example of use:
 >>> # from transformers import TFAutoModel
 >>> # bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
 
-
-The original code can be found `here <https://github.com/VinAIResearch/BERTweet>`__.
+This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here
+<https://github.com/VinAIResearch/BERTweet>`__.
 
 BertweetTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -50,7 +50,8 @@ Tips:
 - Current implementation supports only **ITC**.
 - Current implementation doesn't support **num_random_blocks = 0**
 
-The original code can be found `here <https://github.com/google-research/bigbird>`__.
+This model was contributed by `vasudevgupta <https://huggingface.co/vasudevgupta>`__. The original code can be found
+`here <https://github.com/google-research/bigbird>`__.
 
 BigBirdConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -36,7 +36,8 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The authors' code can be found `here
+<https://github.com/facebookresearch/ParlAI>`__ .
 
 
 Implementation Notes
@@ -39,7 +39,8 @@ and code publicly available. Human evaluations show our best models are superior
 dialogue in terms of engagingness and humanness measurements. We then discuss the limitations of this work by analyzing
 failure cases of our models.*
 
-The authors' code can be found `here <https://github.com/facebookresearch/ParlAI>`__ .
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The authors' code can be
+found `here <https://github.com/facebookresearch/ParlAI>`__ .
 
 BlenderbotSmallConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,4 +43,5 @@ Tips:
 that is sadly not open-sourced yet. It would be very useful for the community, if someone tries to implement the
 algorithm to make BORT fine-tuning work.
 
-The original code can be found `here <https://github.com/alexa/bort/>`__.
+This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
+<https://github.com/alexa/bort/>`__.
@@ -37,7 +37,8 @@ Tips:
 - This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
 as well as the information relative to the inputs and outputs.
 
-The original code can be found `here <https://camembert-model.fr/>`__.
+This model was contributed by `camembert <https://huggingface.co/camembert>`__. The original code can be found `here
+<https://camembert-model.fr/>`__.
 
 CamembertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -34,8 +34,10 @@ ConvBERT significantly outperforms BERT and its variants in various downstream t
 fewer model parameters. Remarkably, ConvBERTbase model achieves 86.4 GLUE score, 0.7 higher than ELECTRAbase, while
 using less than 1/4 training cost. Code and pre-trained models will be released.*
 
-ConvBERT training tips are similar to those of BERT. The original implementation can be found here:
-https://github.com/yitu-opensource/ConvBert
+ConvBERT training tips are similar to those of BERT.
+
+This model was contributed by `abhishek <https://huggingface.co/abhishek>`__. The original implementation can be found
+here: https://github.com/yitu-opensource/ConvBert
 
 ConvBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -33,7 +33,8 @@ language model, which could facilitate several downstream Chinese NLP tasks, suc
 cloze test, and language understanding. Extensive experiments demonstrate that CPM achieves strong performance on many
 NLP tasks in the settings of few-shot (even zero-shot) learning.*
 
-The original implementation can be found here: https://github.com/TsinghuaAI/CPM-Generate
+This model was contributed by `canwenxu <https://huggingface.co/canwenxu>`__. The original implementation can be found
+here: https://github.com/TsinghuaAI/CPM-Generate
 
 Note: We only have a tokenizer here, since the model architecture is the same as GPT-2.
 
@@ -46,7 +46,8 @@ Tips:
 `reusing the past in generative models <../quickstart.html#using-the-past>`__ for more information on the usage of
 this argument.
 
-The original code can be found `here <https://github.com/salesforce/ctrl>`__.
+This model was contributed by `keskarnitishr <https://huggingface.co/keskarnitishr>`__. The original code can be found
+`here <https://github.com/salesforce/ctrl>`__.
 
 
 CTRLConfig
@@ -38,7 +38,8 @@ the training data performs consistently better on a wide range of NLP tasks, ach
 pre-trained models will be made publicly available at https://github.com/microsoft/DeBERTa.*
 
 
-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.
 
 
 DebertaConfig
@@ -58,7 +58,8 @@ New in v2:
 - **900M model & 1.5B model** Two additional model sizes are available: 900M and 1.5B, which significantly improves the
 performance of downstream tasks.
 
-The original code can be found `here <https://github.com/microsoft/DeBERTa>`__.
+This model was contributed by `DeBERTa <https://huggingface.co/DeBERTa>`__. The original code can be found `here
+<https://github.com/microsoft/DeBERTa>`__.
 
 
 DebertaV2Config
@@ -73,6 +73,8 @@ Tips:
 `facebook/deit-base-patch16-384`. Note that one should use :class:`~transformers.DeiTFeatureExtractor` in order to
 prepare images for the model.
 
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__.
+
 
 DeiTConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -44,7 +44,7 @@ Tips:
 - DistilBERT doesn't have options to select the input positions (:obj:`position_ids` input). This could be added if
 necessary though, just let us know if you need this option.
 
-The original code can be found `here
+This model was contributed by `victorsanh <https://huggingface.co/victorsanh>`__. The original code can be found `here
 <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
 
 
@@ -30,7 +30,8 @@ our dense retriever outperforms a strong Lucene-BM25 system largely by 9%-19% ab
 retrieval accuracy, and helps our end-to-end QA system establish new state-of-the-art on multiple open-domain QA
 benchmarks.*
 
-The original code can be found `here <https://github.com/facebookresearch/DPR>`__.
+This model was contributed by `lhoestq <https://huggingface.co/lhoestq>`__. The original code can be found `here
+<https://github.com/facebookresearch/DPR>`__.
 
 
 DPRConfig
@@ -54,7 +54,8 @@ Tips:
 :class:`~transformers.ElectraForPreTraining` model (the classification head will be randomly initialized as it
 doesn't exist in the generator).
 
-The original code can be found `here <https://github.com/google-research/electra>`__.
+This model was contributed by `lysandre <https://huggingface.co/lysandre>`__. The original code can be found `here
+<https://github.com/google-research/electra>`__.
 
 
 ElectraConfig
@@ -35,7 +35,8 @@ time they outperform other pretraining approaches. Different versions of FlauBER
 protocol for the downstream tasks, called FLUE (French Language Understanding Evaluation), are shared to the research
 community for further reproducible experiments in French NLP.*
 
-The original code can be found `here <https://github.com/getalp/Flaubert>`__.
+This model was contributed by `formiel <https://huggingface.co/formiel>`__. The original code can be found `here
+<https://github.com/getalp/Flaubert>`__.
 
 
 FlaubertConfig
@@ -34,7 +34,8 @@ data, then decode using noisy channel model reranking. Our submissions are ranke
 human evaluation campaign. On En->De, our system significantly outperforms other systems as well as human translations.
 This system improves upon our WMT'18 submission by 4.5 BLEU points.*
 
-The original code can be found here <https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
+This model was contributed by `stas <https://huggingface.co/stas>`__. The original code can be found here
+<https://github.com/pytorch/fairseq/tree/master/examples/wmt19>__.
 
 Implementation Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -49,7 +49,8 @@ Tips:
 :class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
 :class:`~transformers.FunnelForMultipleChoice`.
 
-The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`__.
+This model was contributed by `sgugger <https://huggingface.co/sgugger>`__. The original code can be found `here
+<https://github.com/laiguokun/Funnel-Transformer>`__.
 
 
 FunnelConfig
@@ -45,7 +45,8 @@ Tips:
 `Write With Transformer <https://transformer.huggingface.co/doc/gpt>`__ is a webapp created and hosted by Hugging Face
 showcasing the generative capabilities of several models. GPT is one of them.
 
-The original code can be found `here <https://github.com/openai/finetune-transformer-lm>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/openai/finetune-transformer-lm>`__.
 
 Note:
 
@@ -45,7 +45,8 @@ Tips:
 Hugging Face showcasing the generative capabilities of several models. GPT-2 is one of them and is available in five
 different sizes: small, medium, large, xl and a distilled version of the small checkpoint: `distilgpt-2`.
 
-The original code can be found `here <https://openai.com/blog/better-language-models/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://openai.com/blog/better-language-models/>`__.
 
 
 GPT2Config
@@ -23,6 +23,8 @@ Black, Stella Biderman, Leo Gao, Phil Wang and Connor Leahy. It is a GPT2 like c
 The architecture is similar to GPT2 except that GPT Neo uses local attention in every other layer with a window size of
 256 tokens.
 
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
+
 Generation
 _______________________________________________________________________________________________________________________
 
@@ -56,7 +56,9 @@ Examples of use:
 >>> model = AutoModel.from_pretrained("allegro/herbert-klej-cased-v1")
 
 
-The original code can be found `here <https://github.com/allegro/HerBERT>`__.
+This model was contributed by `rmroczkowski <https://huggingface.co/rmroczkowski>`__. The original code can be found
+`here <https://github.com/allegro/HerBERT>`__.
+
 
 HerbertTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -36,8 +36,9 @@ the full-precision baseline. Furthermore, our preliminary implementation of I-BE
 INT8 inference on a T4 GPU system as compared to FP32 inference. The framework has been developed in PyTorch and has
 been open-sourced.*
 
+This model was contributed by `kssteven <https://huggingface.co/kssteven>`__. The original code can be found `here
+<https://github.com/kssteven418/I-BERT>`__.
 
-The original code can be found `here <https://github.com/kssteven418/I-BERT>`__.
 
 IBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -80,7 +80,8 @@ occurs. Those can be obtained using the Python Image Library (PIL) library for e
 <https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForTokenClassification_on_FUNSD.ipynb>`__.
 It includes an inference part, which shows how to use Google's Tesseract on a new document.
 
-The original code can be found `here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
+This model was contributed by `liminghao1630 <https://huggingface.co/liminghao1630>`__. The original code can be found
+`here <https://github.com/microsoft/unilm/tree/master/layoutlm>`_.
 
 
 LayoutLMConfig
@@ -53,6 +53,8 @@ Tips:
 - A notebook showing how to fine-tune LED, can be accessed `here
 <https://colab.research.google.com/drive/12LjJazBl7Gam0XBPy_y0CTOJZeZ34c2v?usp=sharing>`__.
 
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
+
 
 LEDConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -40,7 +40,8 @@ Tips:
 token belongs to which segment. Just separate your segments with the separation token :obj:`tokenizer.sep_token` (or
 :obj:`</s>`).
 
-The Authors' code can be found `here <https://github.com/allenai/longformer>`__.
+This model was contributed by `beltagy <https://huggingface.co/beltagy>`__. The Authors' code can be found `here
+<https://github.com/allenai/longformer>`__.
 
 Longformer Self Attention
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -52,7 +52,8 @@ Tips:
 contains self-attention for each respective modality and cross-attention, only the cross attention is returned and
 both self attention outputs are disregarded.
 
-The original code can be found `here <https://github.com/airsplay/lxmert>`__.
+This model was contributed by `eltoto1219 <https://huggingface.co/eltoto1219>`__. The original code can be found `here
+<https://github.com/airsplay/lxmert>`__.
 
 
 LxmertConfig
@@ -34,6 +34,8 @@ to create high quality models. Our focus on non-English-Centric models brings ga
 translating between non-English directions while performing competitively to the best single systems of WMT. We
 open-source our scripts so that others may reproduce the data, evaluation, and final M2M-100 model.*
 
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__.
+
 
 Training and Generation
 _______________________________________________________________________________________________________________________
@@ -37,6 +37,7 @@ Implementation Notes
 - the model starts generating with :obj:`pad_token_id` (which has 0 as a token_embedding) as the prefix (Bart uses
 :obj:`<s/>`),
 - Code to bulk convert models can be found in ``convert_marian_to_pytorch.py``.
+- This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__.
 
 Naming
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -29,7 +29,8 @@ corpora in many languages using the BART objective. mBART is one of the first me
|
|||||||
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
|
sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only
|
||||||
on the encoder, decoder, or reconstructing parts of the text.
|
on the encoder, decoder, or reconstructing parts of the text.
|
||||||
|
|
||||||
The Authors' code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
|
This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The Authors' code can be found `here
|
||||||
|
<https://github.com/pytorch/fairseq/tree/master/examples/mbart>`__
|
||||||
|
|
||||||
Training of MBart
|
Training of MBart
|
||||||
_______________________________________________________________________________________________________________________
|
_______________________________________________________________________________________________________________________
|
||||||
@@ -77,9 +77,10 @@ The following commands allow you to do the conversion. We assume that the folder
 
 python3 $PATH_TO_TRANSFORMERS/models/megatron_bert/convert_megatron_bert_checkpoint.py megatron_bert_345m_v0_1_cased.zip
 
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
 
 MegatronBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -64,7 +64,8 @@ The following command allows you to do the conversion. We assume that the folder
 
 python3 $PATH_TO_TRANSFORMERS/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py megatron_gpt2_345m_v0_0.zip
 
-The original code can be found `here <https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU
-and multi-node implementation of the Megatron Language models. In particular, it contains a hybrid model parallel
-approach using "tensor parallel" and "pipeline parallel" techniques.
+This model was contributed by `jdemouth <https://huggingface.co/jdemouth>`__. The original code can be found `here
+<https://github.com/NVIDIA/Megatron-LM>`__. That repository contains a multi-GPU and multi-node implementation of the
+Megatron Language models. In particular, it contains a hybrid model parallel approach using "tensor parallel" and
+"pipeline parallel" techniques.
 
@@ -44,7 +44,8 @@ Tips:
 efficient at predicting masked tokens and at NLU in general, but is not optimal for text generation. Models trained
 with a causal language modeling (CLM) objective are better in that regard.
 
-The original code can be found `here <https://github.com/google-research/mobilebert>`__.
+This model was contributed by `vshampor <https://huggingface.co/vshampor>`__. The original code can be found `here
+<https://github.com/google-research/mobilebert>`__.
 
 MobileBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -28,7 +28,8 @@ multilingual variant of T5 that was pre-trained on a new Common Crawl-based data
 the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
 benchmarks. All of the code and model checkpoints*
 
-The original code can be found `here <https://github.com/google-research/multilingual-t5>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The original code can be
+found `here <https://github.com/google-research/multilingual-t5>`__.
 
 MT5Config
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -31,7 +31,8 @@ According to the abstract,
 extractive summary.
 - Pegasus achieves SOTA summarization performance on all 12 downstream tasks, as measured by ROUGE and human eval.
 
-The Authors' code can be found `here <https://github.com/google-research/pegasus>`__.
+This model was contributed by `sshleifer <https://huggingface.co/sshleifer>`__. The Authors' code can be found `here
+<https://github.com/google-research/pegasus>`__.
 
 
 Checkpoints
@@ -50,7 +50,7 @@ Example of use:
 >>> # phobert = TFAutoModel.from_pretrained("vinai/phobert-base")
 
 
-The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.
+This model was contributed by `dqnguyen <https://huggingface.co/dqnguyen>`__. The original code can be found `here <https://github.com/VinAIResearch/PhoBERT>`__.
 
 PhobertTokenizer
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,6 +43,7 @@ outperforming parametric seq2seq models and task-specific retrieve-and-extract a
 tasks, we find that RAG models generate more specific, diverse and factual language than a state-of-the-art
 parametric-only seq2seq baseline.*
 
+This model was contributed by `ola13 <https://huggingface.co/ola13>`__.
 
 
 RagConfig
@@ -32,7 +32,8 @@ layers instead of the standard residuals, which allows storing activations only
 N times, where N is the number of layers. The resulting model, the Reformer, performs on par with Transformer models
 while being much more memory-efficient and much faster on long sequences.*
 
-The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__. The Authors' code can be
+found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`__.
 
 Axial Positional Encodings
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -20,8 +20,8 @@ The RetriBERT model was proposed in the blog post `Explain Anything Like I'm Fiv
 Question Answering <https://yjernite.github.io/lfqa.html>`__. RetriBERT is a small model that uses either a single or
 pair of BERT encoders with lower-dimension projection for dense semantic indexing of text.
 
-Code to train and use the model can be found `here
-<https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
+This model was contributed by `yjernite <https://huggingface.co/yjernite>`__. Code to train and use the model can be
+found `here <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__.
 
 
 RetriBertConfig
@@ -44,7 +44,8 @@ Tips:
 separate your segments with the separation token :obj:`tokenizer.sep_token` (or :obj:`</s>`)
 - :doc:`CamemBERT <camembert>` is a wrapper around RoBERTa. Refer to this page for usage examples.
 
-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
+This model was contributed by `julien-c <https://huggingface.co/julien-c>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/roberta>`_.
 
 
 RobertaConfig
@@ -25,7 +25,8 @@ transcripts/translations autoregressively. Speech2Text has been fine-tuned on se
 `LibriSpeech <http://www.openslr.org/12>`__, `CoVoST 2 <https://github.com/facebookresearch/covost>`__, `MuST-C
 <https://ict.fbk.eu/must-c/>`__.
 
-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text>`__.
+This model was contributed by `valhalla <https://huggingface.co/valhalla>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/speech_to_text>`__.
 
 
 Inference
@@ -47,6 +47,9 @@ Tips:
 - For best results when finetuning on sequence classification tasks, it is recommended to start with the
 `squeezebert/squeezebert-mnli-headless` checkpoint.
 
+This model was contributed by `forresti <https://huggingface.co/forresti>`__.
+
+
 SqueezeBertConfig
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -48,7 +48,8 @@ Tips:
 layers to the decoder and auto-regressively generates the decoder output. - T5 uses relative scalar embeddings.
 Encoder input padding can be done on the left and on the right.
 
-The original code can be found `here <https://github.com/google-research/text-to-text-transfer-transformer>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/google-research/text-to-text-transfer-transformer>`__.
 
 Training
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -49,7 +49,8 @@ entailment (a binary classification task). For more details, see their follow-up
 intermediate pre-training <https://www.aclweb.org/anthology/2020.findings-emnlp.27/>`__ by Julian Martin Eisenschlos,
 Syrine Krichene and Thomas Müller.
 
-The original code can be found `here <https://github.com/google-research/tapas>`__.
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code can be found `here
+<https://github.com/google-research/tapas>`__.
 
 Tips:
 
@@ -41,7 +41,8 @@ Tips:
 original implementation trains on SQuAD with padding on the left, therefore the padding defaults are set to left.
 - Transformer-XL is one of the few models that has no sequence length limit.
 
-The original code can be found `here <https://github.com/kimiyoung/transformer-xl>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/kimiyoung/transformer-xl>`__.
 
 
 TransfoXLConfig
@@ -67,7 +67,8 @@ Tips:
 improvement of 2% to training from scratch, but still 4% behind supervised pre-training.
 
 
-The original code (written in JAX) can be found `here <https://github.com/google-research/vision_transformer>`__.
+This model was contributed by `nielsr <https://huggingface.co/nielsr>`__. The original code (written in JAX) can be
+found `here <https://github.com/google-research/vision_transformer>`__.
 
 Note that we converted the weights from Ross Wightman's `timm library
 <https://github.com/rwightman/pytorch-image-models>`__, who already converted the weights from JAX to PyTorch. Credits
@@ -36,6 +36,8 @@ Tips:
 - Wav2Vec2 model was trained using connectionist temporal classification (CTC) so the model output has to be decoded
 using :class:`~transformers.Wav2Vec2CTCTokenizer`.
 
+This model was contributed by `patrickvonplaten <https://huggingface.co/patrickvonplaten>`__.
+
 
 Wav2Vec2Config
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -42,7 +42,8 @@ Tips:
 - XLM has multilingual checkpoints which leverage a specific :obj:`lang` parameter. Check out the :doc:`multi-lingual
 <../multilingual>` page for more information.
 
-The original code can be found `here <https://github.com/facebookresearch/XLM/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/facebookresearch/XLM/>`__.
 
 
 XLMConfig
@@ -44,7 +44,8 @@ Tips:
 - This implementation is the same as RoBERTa. Refer to the :doc:`documentation of RoBERTa <roberta>` for usage examples
 as well as the information relative to the inputs and outputs.
 
-The original code can be found `here <https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
+This model was contributed by `stefan-it <https://huggingface.co/stefan-it>`__. The original code can be found `here
+<https://github.com/pytorch/fairseq/tree/master/examples/xlmr>`__.
 
 
 XLMRobertaConfig
@@ -44,7 +44,8 @@ Tips:
 `examples/text-generation/run_generation.py`)
 - XLNet is one of the few models that has no sequence length limit.
 
-The original code can be found `here <https://github.com/zihangdai/xlnet/>`__.
+This model was contributed by `thomwolf <https://huggingface.co/thomwolf>`__. The original code can be found `here
+<https://github.com/zihangdai/xlnet/>`__.
 
 
 XLNetConfig
@@ -27,6 +27,10 @@ Tips:
 
 <INSERT TIPS ABOUT MODEL HERE>
 
+This model was contributed by `<INSERT YOUR HF USERNAME HERE>
+<https://huggingface.co/<INSERT YOUR HF USERNAME HERE>>`__. The original code can be found `here
+<<INSERT LINK TO GITHUB REPO HERE>>`__.
+
 {{cookiecutter.camelcase_modelname}}Config
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 