Funnel transformer (#6908)
* Initial model
* Fix upsampling
* Add special cls token id and test
* Formatting
* Test and fist FunnelTokenizerFast
* Common tests
* Fix the check_repo script and document Funnel
* Doc fixes
* Add all models
* Write doc
* Fix test
* Fix copyright
* Forgot some layers can be repeated
* Apply suggestions from code review
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Update src/transformers/modeling_funnel.py
  Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments
* Update src/transformers/modeling_funnel.py
  Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
* Address review comments
* Update src/transformers/modeling_funnel.py
  Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* Slow integration test
* Make small integration test
* Formatting
* Add checkpoint and separate classification head
* Formatting
* Expand list, fix link and add in pretrained models
* Styling
* Add the model in all summaries
* Typo fixes

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
@@ -173,8 +173,9 @@ Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih.
|
|||||||
23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
|
23. **[Pegasus](https://github.com/google-research/pegasus)** (from Google) released with the paper [PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization](https://arxiv.org/abs/1912.08777) by Jingqing Zhang, Yao Zhao, Mohammad Saleh and Peter J. Liu.
|
||||||
24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
24. **[MBart](https://github.com/pytorch/fairseq/tree/master/examples/mbart)** (from Facebook) released with the paper [Multilingual Denoising Pre-training for Neural Machine Translation](https://arxiv.org/abs/2001.08210) by Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer.
|
||||||
25. **[LXMERT](https://github.com/airsplay/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
|
25. **[LXMERT](https://github.com/airsplay/lxmert)** (from UNC Chapel Hill) released with the paper [LXMERT: Learning Cross-Modality Encoder Representations from Transformers for Open-Domain Question Answering](https://arxiv.org/abs/1908.07490) by Hao Tan and Mohit Bansal.
|
||||||
26. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
|
26. **[Funnel Transformer](https://github.com/laiguokun/Funnel-Transformer)** (from CMU/Google Brain) released with the paper [Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing](https://arxiv.org/abs/2006.03236) by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
|
||||||
27. Want to contribute a new model? We have added a **detailed guide and templates** to help you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
|
27. **[Other community models](https://huggingface.co/models)**, contributed by the [community](https://huggingface.co/users).
|
||||||
|
28. Want to contribute a new model? We have added a **detailed guide and templates** to help you through the process of adding a new model. You can find them in the [`templates`](./templates) folder of the repository. Be sure to check the [contributing guidelines](./CONTRIBUTING.md) and contact the maintainers or open an issue to collect feedback before starting your PR.
|
||||||
|
|
||||||
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
|
These implementations have been tested on several datasets (see the example scripts) and should match the performance of the original implementations (e.g. ~93 F1 on SQuAD for BERT Whole-Word-Masking, ~88 F1 on RocStories for OpenAI GPT, ~18.3 perplexity on WikiText 103 for Transformer-XL, ~0.916 Pearson R coefficient on STS-B for XLNet). You can find more details on performance in the Examples section of the [documentation](https://huggingface.co/transformers/examples.html).
|
||||||
|
|
||||||
|
|||||||
@@ -131,7 +131,10 @@ conversion utilities for the following models:
|
|||||||
25. `LXMERT <https://github.com/airsplay/lxmert>`_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning
|
25. `LXMERT <https://github.com/airsplay/lxmert>`_ (from UNC Chapel Hill) released with the paper `LXMERT: Learning
|
||||||
Cross-Modality Encoder Representations from Transformers for Open-Domain Question
|
Cross-Modality Encoder Representations from Transformers for Open-Domain Question
|
||||||
Answering <https://arxiv.org/abs/1908.07490>`_ by Hao Tan and Mohit Bansal.
|
Answering <https://arxiv.org/abs/1908.07490>`_ by Hao Tan and Mohit Bansal.
|
||||||
26. `Other community models <https://huggingface.co/models>`_, contributed by the `community
|
26. `Funnel Transformer <https://github.com/laiguokun/Funnel-Transformer>`_ (from CMU/Google Brain) released with the paper
|
||||||
|
`Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
|
||||||
|
<https://arxiv.org/abs/2006.03236>`_ by Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le.
|
||||||
|
27. `Other community models <https://huggingface.co/models>`_, contributed by the `community
|
||||||
<https://huggingface.co/users>`_.
|
<https://huggingface.co/users>`_.
|
||||||
|
|
||||||
.. toctree::
|
.. toctree::
|
||||||
@@ -216,6 +219,7 @@ conversion utilities for the following models:
|
|||||||
model_doc/dpr
|
model_doc/dpr
|
||||||
model_doc/pegasus
|
model_doc/pegasus
|
||||||
model_doc/mbart
|
model_doc/mbart
|
||||||
|
model_doc/funnel
|
||||||
model_doc/lxmert
|
model_doc/lxmert
|
||||||
internal/modeling_utils
|
internal/modeling_utils
|
||||||
internal/tokenization_utils
|
internal/tokenization_utils
|
||||||
|
|||||||
126
docs/source/model_doc/funnel.rst
Normal file
@@ -0,0 +1,126 @@
|
|||||||
|
Funnel Transformer
|
||||||
|
------------------
|
||||||
|
|
||||||
|
Overview
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
The Funnel Transformer model was proposed in the paper
|
||||||
|
`Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
|
||||||
|
<https://arxiv.org/abs/2006.03236>`__.
|
||||||
|
It is a bidirectional transformer model, like BERT, but with a pooling operation after each block of layers, a bit
|
||||||
|
like in traditional convolutional neural networks (CNN) in computer vision.
|
||||||
|
|
||||||
|
The abstract from the paper is the following:
|
||||||
|
|
||||||
|
*With the success of language pretraining, it is highly desirable to develop more efficient architectures of good
|
||||||
|
scalability that can exploit the abundant unlabeled data at a lower cost. To improve the efficiency, we examine the
|
||||||
|
much-overlooked redundancy in maintaining a full-length token-level presentation, especially for tasks that only
|
||||||
|
require a single-vector presentation of the sequence. With this intuition, we propose Funnel-Transformer which
|
||||||
|
gradually compresses the sequence of hidden states to a shorter one and hence reduces the computation cost. More
|
||||||
|
importantly, by re-investing the saved FLOPs from length reduction in constructing a deeper or wider model, we further
|
||||||
|
improve the model capacity. In addition, to perform token-level predictions as required by common pretraining
|
||||||
|
objectives, Funnel-Transformer is able to recover a deep representation for each token from the reduced hidden sequence
|
||||||
|
via a decoder. Empirically, with comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on
|
||||||
|
a wide variety of sequence-level prediction tasks, including text classification, language understanding, and reading
|
||||||
|
comprehension.*
|
||||||
|
|
||||||
|
Tips:
|
||||||
|
|
||||||
|
- Since Funnel Transformer uses pooling, the sequence length of the hidden states changes after each block of layers.
|
||||||
|
The base model therefore has a final sequence length that is a quarter of the original one. This model can be used
|
||||||
|
directly for tasks that just require a sentence summary (like sequence classification or multiple choice). For other
|
||||||
|
tasks, the full model is used; this full model has a decoder that upsamples the final hidden states to the same
|
||||||
|
sequence length as the input.
|
||||||
|
- The Funnel Transformer checkpoints are all available with a full version and a base version. The first ones should
|
||||||
|
be used for :class:`~transformers.FunnelModel`, :class:`~transformers.FunnelForPreTraining`,
|
||||||
|
:class:`~transformers.FunnelForMaskedLM`, :class:`~transformers.FunnelForTokenClassification` and
|
||||||
|
:class:`~transformers.FunnelForQuestionAnswering`. The second ones should be used for
|
||||||
|
:class:`~transformers.FunnelBaseModel`, :class:`~transformers.FunnelForSequenceClassification` and
|
||||||
|
:class:`~transformers.FunnelForMultipleChoice`.
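
The pairing between classes and checkpoint versions described in the tips above can be written down as a small lookup. This is only an illustrative sketch: the class names are taken from the text, while the helper ``checkpoint_variant`` and the two set names are hypothetical and not part of the library.

```python
# Classes that, per the tips above, expect a full checkpoint
# (the three blocks plus the upsampling decoder).
FULL_CHECKPOINT_CLASSES = {
    "FunnelModel",
    "FunnelForPreTraining",
    "FunnelForMaskedLM",
    "FunnelForTokenClassification",
    "FunnelForQuestionAnswering",
}

# Classes that, per the tips above, expect a base checkpoint
# (the three blocks only, no decoder).
BASE_CHECKPOINT_CLASSES = {
    "FunnelBaseModel",
    "FunnelForSequenceClassification",
    "FunnelForMultipleChoice",
}


def checkpoint_variant(class_name: str) -> str:
    """Return "full" or "base" for a Funnel class name (hypothetical helper)."""
    if class_name in FULL_CHECKPOINT_CLASSES:
        return "full"
    if class_name in BASE_CHECKPOINT_CLASSES:
        return "base"
    raise ValueError(f"Unknown Funnel class: {class_name}")


print(checkpoint_variant("FunnelForMaskedLM"))
print(checkpoint_variant("FunnelForSequenceClassification"))
```

A lookup like this makes it easy to spot a mismatch, such as a sequence-classification class being paired with a full checkpoint, before any weights are loaded.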
|
||||||
|
|
||||||
|
The original code can be found `here <https://github.com/laiguokun/Funnel-Transformer>`_.
|
||||||
|
|
||||||
|
|
||||||
|
FunnelConfig
|
||||||
|
~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelConfig
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelTokenizer
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelTokenizer
|
||||||
|
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||||
|
create_token_type_ids_from_sequences, save_vocabulary
|
||||||
|
|
||||||
|
|
||||||
|
FunnelTokenizerFast
|
||||||
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelTokenizerFast
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
Funnel specific outputs
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.modeling_funnel.FunnelForPreTrainingOutput
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelBaseModel
|
||||||
|
~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelBaseModel
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelModel
|
||||||
|
~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelModel
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForPreTraining
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForPreTraining
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForMaskedLM
|
||||||
|
~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForMaskedLM
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForSequenceClassification
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForSequenceClassification
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForMultipleChoice
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForMultipleChoice
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForTokenClassification
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForTokenClassification
|
||||||
|
:members:
|
||||||
|
|
||||||
|
|
||||||
|
FunnelForQuestionAnswering
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
.. autoclass:: transformers.FunnelForQuestionAnswering
|
||||||
|
:members:
|
||||||
@@ -416,6 +416,38 @@ traditional GAN setting) then the ELECTRA model is trained for a few steps.
|
|||||||
The library provides a version of the model for masked language modeling, token classification and sentence
|
The library provides a version of the model for masked language modeling, token classification and sentence
|
||||||
classification.
|
classification.
|
||||||
|
|
||||||
|
Funnel Transformer
|
||||||
|
----------------------------------------------
|
||||||
|
|
||||||
|
.. raw:: html
|
||||||
|
|
||||||
|
<a href="https://huggingface.co/models?filter=funnel">
|
||||||
|
<img alt="Models" src="https://img.shields.io/badge/All_model_pages-funnel-blueviolet">
|
||||||
|
</a>
|
||||||
|
<a href="model_doc/funnel.html">
|
||||||
|
<img alt="Doc" src="https://img.shields.io/badge/Model_documentation-funnel-blueviolet">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
`Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
|
||||||
|
<https://arxiv.org/abs/2006.03236>`_, Zihang Dai et al.
|
||||||
|
|
||||||
|
Funnel Transformer is a transformer model using pooling, a bit like a ResNet model: layers are grouped in blocks, and
|
||||||
|
at the beginning of each block (except the first one), the hidden states are pooled along the sequence dimension. This
|
||||||
|
way, their length is divided by 2, which speeds up the computation of the next hidden states. All pretrained models
|
||||||
|
have three blocks, which means the final hidden state has a sequence length that is one fourth of the original sequence
|
||||||
|
length.
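
The length arithmetic in the paragraph above can be checked with a few lines of Python. This is a rough sketch rather than the library's implementation: ``block_lengths`` is a hypothetical helper, and rounding odd lengths up is an assumption made here for illustration (the text only says the length is divided by 2).

```python
from typing import List


def block_lengths(seq_len: int, num_blocks: int = 3) -> List[int]:
    """Sequence length seen by each block (hypothetical helper).

    The first block works on the full input; every later block starts by
    pooling, which halves the length. Odd lengths are rounded up here,
    which is an assumption of this sketch.
    """
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        lengths.append((lengths[-1] + 1) // 2)
    return lengths


# With three blocks, the last block sees a quarter of the input length.
print(block_lengths(128))  # [128, 64, 32]
```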
|
||||||
|
|
||||||
|
For tasks such as classification, this is not a problem, but for tasks like masked language modeling or token
|
||||||
|
classification, we need a hidden state with the same sequence length as the original input. In those cases, the final
|
||||||
|
hidden states are upsampled to the input sequence length and go through two additional layers. That's why there are two
|
||||||
|
versions of each checkpoint. The version suffixed with "-base" contains only the three blocks, while the version
|
||||||
|
without that suffix contains the three blocks and the upsampling head with its additional layers.
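
To make the pooling and upsampling steps more concrete, here is a toy sketch on a list of scalars standing in for hidden states. It assumes mean pooling over adjacent pairs and upsampling by simple repetition; both choices, and the helper names ``mean_pool`` and ``upsample``, are assumptions for illustration. The sketch also leaves out the transformer layers themselves, including the two additional layers applied after upsampling.

```python
from typing import List


def mean_pool(states: List[float], stride: int = 2) -> List[float]:
    """Average each group of `stride` neighbours (assumed pooling scheme)."""
    pooled = []
    for start in range(0, len(states), stride):
        window = states[start:start + stride]
        pooled.append(sum(window) / len(window))
    return pooled


def upsample(states: List[float], target_len: int, factor: int) -> List[float]:
    """Repeat each state `factor` times, then trim to the input length."""
    repeated = [state for state in states for _ in range(factor)]
    return repeated[:target_len]


inputs = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]

# Blocks two and three each start with one pooling step.
after_block_2 = mean_pool(inputs)
after_block_3 = mean_pool(after_block_2)

# Two halvings mean each final state covers four input positions.
restored = upsample(after_block_3, target_len=len(inputs), factor=4)

print(after_block_3)  # [4.0, 12.0]
print(len(restored))  # 8
```

A "-base" checkpoint would stop at ``after_block_3``; the version without the suffix corresponds to also running the upsampling step so that every input position gets a hidden state again.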
|
||||||
|
|
||||||
|
The pretrained models available use the same pretraining objective as ELECTRA.
|
||||||
|
|
||||||
|
The library provides a version of the model for masked language modeling, token classification, sentence
|
||||||
|
classification, multiple choice classification and question answering.
|
||||||
|
|
||||||
.. _longformer:
|
.. _longformer:
|
||||||
|
|
||||||
Longformer
|
Longformer
|
||||||
|
|||||||
@@ -5,366 +5,406 @@ Here is the full list of the currently provided pretrained models together with
|
|||||||
|
|
||||||
For a list that includes community-uploaded models, refer to `https://huggingface.co/models <https://huggingface.co/models>`__.
|
For a list that includes community-uploaded models, refer to `https://huggingface.co/models <https://huggingface.co/models>`__.
|
||||||
|
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Architecture | Shortcut name | Details of the model |
|
| Architecture | Shortcut name | Details of the model |
|
||||||
+===================+============================================================+=======================================================================================================================================+
|
+====================+============================================================+=======================================================================================================================================+
|
||||||
| BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| BERT | ``bert-base-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on lower-cased English text. |
|
| | | | Trained on lower-cased English text. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on lower-cased English text. |
|
| | | | Trained on lower-cased English text. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased English text. |
|
| | | | Trained on cased English text. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on cased English text. |
|
| | | | Trained on cased English text. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
|
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
|
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased Chinese Simplified and Traditional text. |
|
| | | | Trained on cased Chinese Simplified and Traditional text. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased German text by Deepset.ai |
|
| | | | Trained on cased German text by Deepset.ai |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
|
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on lower-cased English text using Whole-Word-Masking |
|
| | | | Trained on lower-cased English text using Whole-Word-Masking |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on cased English text using Whole-Word-Masking |
|
| | | | Trained on cased English text using Whole-Word-Masking |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
|
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
||||||
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased German text by DBMDZ |
|
| | | | Trained on cased German text by DBMDZ |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on uncased German text by DBMDZ |
|
| | | | Trained on uncased German text by DBMDZ |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
|
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
|
||||||
| | | | `fugashi <https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__. |
|
| | | | `fugashi <https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__. |
|
||||||
| | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
|
| | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece and this requires some extra dependencies, |
| | | | `fugashi <https://github.com/polm/fugashi>`__ which is a wrapper around `MeCab <https://taku910.github.io/mecab/>`__. |
| | | | Use ``pip install transformers["ja"]`` (or ``pip install -e .["ja"]`` if you install from source) to install them. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
| | | |
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on uncased Finnish text. |
| | | |
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | Trained on cased Dutch text. |
| | | |
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | OpenAI GPT English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| GPT-2 | ``gpt2`` | | 12-layer, 768-hidden, 12-heads, 117M parameters. |
| | | | OpenAI GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-medium`` | | 24-layer, 1024-hidden, 16-heads, 345M parameters. |
| | | | OpenAI's Medium-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters. |
| | | | OpenAI's Large-sized GPT-2 English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``gpt2-xl`` | | 48-layer, 1600-hidden, 25-heads, 1558M parameters. |
| | | | OpenAI's XL-sized GPT-2 English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| Transformer-XL | ``transfo-xl-wt103`` | | 18-layer, 1024-hidden, 16-heads, 257M parameters. |
| | | | English model trained on wikitext-103 |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLNet | ``xlnet-base-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
| | | | XLNet English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlnet-large-cased`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
| | | | XLNet Large English model |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| XLM | ``xlm-mlm-en-2048`` | | 12-layer, 2048-hidden, 16-heads |
| | | | XLM English model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained on the concatenation of English and German Wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained on the concatenation of English and French Wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-enro-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-Romanian Multi-language model |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-tlm-xnli15-1024`` | | 12-layer, 1024-hidden, 8-heads |
| | | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages <https://github.com/facebookresearch/XNLI>`__. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-enfr-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-French model trained with CLM (Causal Language Modeling) on the concatenation of English and French Wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-clm-ende-1024`` | | 6-layer, 1024-hidden, 8-heads |
| | | | XLM English-German model trained with CLM (Causal Language Modeling) on the concatenation of English and German Wikipedia |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-17-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 17 languages. |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``xlm-mlm-100-1280`` | | 16-layer, 1280-hidden, 16-heads |
| | | | XLM model trained with MLM (Masked Language Modeling) on 100 languages. |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | RoBERTa using the BERT-base architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | RoBERTa using the BERT-large architecture |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
| | | |
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilRoBERTa model distilled from the RoBERTa model ``roberta-base`` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
| | | |
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model ``bert-base-uncased`` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The DistilBERT model distilled from the BERT model ``bert-base-uncased`` checkpoint, with an additional linear layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model ``bert-base-cased`` checkpoint |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
| | | | The DistilBERT model distilled from the BERT model ``bert-base-cased`` checkpoint, with an additional question answering layer. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
| | | | The DistilGPT2 model distilled from the GPT2 model ``gpt2`` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model ``bert-base-german-dbmdz-cased`` checkpoint. |
| | | |
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
|
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
|
||||||
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
|
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
|
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
|
||||||
| | | | Salesforce's Large-sized CTRL English model |
|
| | | | Salesforce's Large-sized CTRL English model |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
|
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
|
||||||
| | | | CamemBERT using the BERT-base architecture |
|
| | | | CamemBERT using the BERT-base architecture |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
||||||
| | | | ALBERT base model |
|
| | | | ALBERT base model |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
||||||
| | | | ALBERT large model |
|
| | | | ALBERT large model |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
||||||
| | | | ALBERT xlarge model |
|
| | | | ALBERT xlarge model |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xxlarge-v1`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
| | ``albert-xxlarge-v1`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
||||||
| | | | ALBERT xxlarge model |
|
| | | | ALBERT xxlarge model |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
||||||
| | | | ALBERT base model with no dropout, additional training data and longer training |
|
| | | | ALBERT base model with no dropout, additional training data and longer training |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
||||||
| | | | ALBERT large model with no dropout, additional training data and longer training |
|
| | | | ALBERT large model with no dropout, additional training data and longer training |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
||||||
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
|
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xxlarge-v2`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
| | ``albert-xxlarge-v2`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
||||||
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
|
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
|
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
|
||||||
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
|
| | ``t5-base`` | | ~220M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 12-heads, |
|
||||||
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
|
| | ``t5-large`` | | ~770M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
|
||||||
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``t5-3B`` | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads, |
|
| | ``t5-3B`` | | ~2.8B parameters with 24-layers, 1024-hidden-state, 16384 feed-forward hidden-state, 32-heads, |
|
||||||
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``t5-11B`` | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads, |
|
| | ``t5-11B`` | | ~11B parameters with 24-layers, 1024-hidden-state, 65536 feed-forward hidden-state, 128-heads, |
|
||||||
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
| | | | Trained on English text: the Colossal Clean Crawled Corpus (C4) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| XLM-RoBERTa | ``xlm-roberta-base`` | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads, |
|
| XLM-RoBERTa | ``xlm-roberta-base`` | | ~125M parameters with 12-layers, 768-hidden-state, 3072 feed-forward hidden-state, 8-heads, |
|
||||||
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
|
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``xlm-roberta-large`` | | ~355M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
|
| | ``xlm-roberta-large`` | | ~355M parameters with 24-layers, 1024-hidden-state, 4096 feed-forward hidden-state, 16-heads, |
|
||||||
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
|
| | | | Trained on 2.5 TB of newly created clean CommonCrawl data in 100 languages |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
|
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
|
||||||
| | | | FlauBERT small architecture |
|
| | | | FlauBERT small architecture |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
|
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
|
||||||
| | | | FlauBERT base architecture with uncased vocabulary |
|
| | | | FlauBERT base architecture with uncased vocabulary |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
|
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
|
||||||
| | | | FlauBERT base architecture with cased vocabulary |
|
| | | | FlauBERT base architecture with cased vocabulary |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
|
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
|
||||||
| | | | FlauBERT large architecture |
|
| | | | FlauBERT large architecture |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
|
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
|
||||||
| | | |
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`_) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
|
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``facebook/bart-large-mnli`` | | Adds a 2 layer classification head with 1 million parameters |
|
| | ``facebook/bart-large-mnli`` | | Adds a 2 layer classification head with 1 million parameters |
|
||||||
| | | | bart-large base architecture with a classification head, finetuned on MNLI |
|
| | | | bart-large base architecture with a classification head, finetuned on MNLI |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
|
| | ``facebook/bart-large-cnn`` | | 12-layer, 1024-hidden, 16-heads, 406M parameters (same as base) |
|
||||||
| | | | bart-large base architecture finetuned on cnn summarization task |
|
| | | | bart-large base architecture finetuned on cnn summarization task |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
|
| DialoGPT | ``DialoGPT-small`` | | 12-layer, 768-hidden, 12-heads, 124M parameters |
|
||||||
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``DialoGPT-medium`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
| | ``DialoGPT-medium`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
||||||
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``DialoGPT-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters |
|
| | ``DialoGPT-large`` | | 36-layer, 1280-hidden, 20-heads, 774M parameters |
|
||||||
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
| | | | Trained on English text: 147M conversation-like exchanges extracted from Reddit. |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Reformer | ``reformer-enwik8`` | | 12-layer, 1024-hidden, 8-heads, 149M parameters |
|
| Reformer | ``reformer-enwik8`` | | 12-layer, 1024-hidden, 8-heads, 149M parameters |
|
||||||
| | | | Trained on English Wikipedia data - enwik8. |
|
| | | | Trained on English Wikipedia data - enwik8. |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``reformer-crime-and-punishment`` | | 6-layer, 256-hidden, 2-heads, 3M parameters |
|
| | ``reformer-crime-and-punishment`` | | 6-layer, 256-hidden, 2-heads, 3M parameters |
|
||||||
| | | | Trained on English text: Crime and Punishment novel by Fyodor Dostoyevsky. |
|
| | | | Trained on English text: Crime and Punishment novel by Fyodor Dostoyevsky. |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| MarianMT | ``Helsinki-NLP/opus-mt-{src}-{tgt}`` | | 12-layer, 512-hidden, 8-heads, ~74M parameter Machine translation models. Parameter counts vary depending on vocab size. |
|
| MarianMT | ``Helsinki-NLP/opus-mt-{src}-{tgt}`` | | 12-layer, 512-hidden, 8-heads, ~74M parameter Machine translation models. Parameter counts vary depending on vocab size. |
|
||||||
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
|
| | | | (see `model list <https://huggingface.co/Helsinki-NLP>`_) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Pegasus | ``google/pegasus-{dataset}`` | | 16-layer, 1024-hidden, 16-heads, ~568M parameter, 2.2 GB for summary. `model list <https://huggingface.co/models?search=pegasus>`__ |
|
| Pegasus | ``google/pegasus-{dataset}`` | | 16-layer, 1024-hidden, 16-heads, ~568M parameter, 2.2 GB for summary. `model list <https://huggingface.co/models?search=pegasus>`__ |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Longformer | ``allenai/longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
|
| Longformer | ``allenai/longformer-base-4096`` | | 12-layer, 768-hidden, 12-heads, ~149M parameters |
|
||||||
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
|
| | | | Starting from RoBERTa-base checkpoint, trained on documents of max length 4,096 |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
|
| | ``allenai/longformer-large-4096`` | | 24-layer, 1024-hidden, 16-heads, ~435M parameters |
|
||||||
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
|
| | | | Starting from RoBERTa-large checkpoint, trained on documents of max length 4,096 |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| MBart | ``facebook/mbart-large-cc25`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
|
| MBart | ``facebook/mbart-large-cc25`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
|
||||||
| | | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus |
|
| | | | mBART (bart-large architecture) model trained on 25 languages' monolingual corpus |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``facebook/mbart-large-en-ro`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
|
| | ``facebook/mbart-large-en-ro`` | | 24-layer, 1024-hidden, 16-heads, 610M parameters |
|
||||||
| | | | mbart-large-cc25 model finetuned on WMT English-Romanian translation. |
|
| | | | mbart-large-cc25 model finetuned on WMT English-Romanian translation. |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Lxmert | ``lxmert-base-uncased`` | | 9-language layers, 9-relationship layers, and 12-cross-modality layers |
|
| Lxmert | ``lxmert-base-uncased`` | | 9-language layers, 9-relationship layers, and 12-cross-modality layers |
|
||||||
| | | | 768-hidden, 12-heads (for each layer) ~ 228M parameters |
|
| | | | 768-hidden, 12-heads (for each layer) ~ 228M parameters |
|
||||||
| | | | Starting from lxmert-base checkpoint, trained on over 9 million image-text couplets from COCO, VisualGenome, GQA, VQA |
|
| | | | Starting from lxmert-base checkpoint, trained on over 9 million image-text couplets from COCO, VisualGenome, GQA, VQA |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| Funnel Transformer | ``funnel-transformer/small`` | | 14 layers: 3 blocks of 4 layers then 2 layers decoder, 768-hidden, 12-heads, 130M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/small-base`` | | 12 layers: 3 blocks of 4 layers (no decoder), 768-hidden, 12-heads, 115M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/medium`` | | 14 layers: 3 blocks 6, 3x2, 3x2 layers then 2 layers decoder, 768-hidden, 12-heads, 130M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/medium-base`` | | 12 layers: 3 blocks 6, 3x2, 3x2 layers (no decoder), 768-hidden, 12-heads, 115M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/intermediate`` | | 20 layers: 3 blocks of 6 layers then 2 layers decoder, 768-hidden, 12-heads, 177M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/intermediate-base`` | | 18 layers: 3 blocks of 6 layers (no decoder), 768-hidden, 12-heads, 161M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/large`` | | 26 layers: 3 blocks of 8 layers then 2 layers decoder, 1024-hidden, 12-heads, 386M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/large-base`` | | 24 layers: 3 blocks of 8 layers (no decoder), 1024-hidden, 12-heads, 358M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/xlarge`` | | 32 layers: 3 blocks of 10 layers then 2 layers decoder, 1024-hidden, 12-heads, 468M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
|
| | ``funnel-transformer/xlarge-base`` | | 30 layers: 3 blocks of 10 layers (no decoder), 1024-hidden, 12-heads, 440M parameters |
|
||||||
|
| | | |
|
||||||
|
| | | (see `details <https://github.com/laiguokun/Funnel-Transformer>`__) |
|
||||||
|
+--------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
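The layer counts quoted in the Funnel Transformer rows can be checked with a little arithmetic. The sketch below is illustrative only: `count_layers` is a hypothetical helper, and it assumes that "3x2" means a block of 3 distinct layers applied twice, so that such a block contributes 3 layers to the total.

```python
# Sketch: reproduce the "N layers" figures from the table rows above.
# Assumption: a "3x2" block holds 3 distinct layers, each applied twice,
# so it adds 3 (not 6) to the layer count.


def count_layers(block_sizes, num_decoder_layers=2, with_decoder=True):
    """Return the number of distinct layers for a given block layout."""
    encoder_layers = sum(block_sizes)
    decoder_layers = num_decoder_layers if with_decoder else 0
    return encoder_layers + decoder_layers


# (block sizes, has decoder) as read from the table descriptions.
checkpoints = {
    "funnel-transformer/small": ([4, 4, 4], True),
    "funnel-transformer/small-base": ([4, 4, 4], False),
    "funnel-transformer/medium": ([6, 3, 3], True),
    "funnel-transformer/intermediate": ([6, 6, 6], True),
    "funnel-transformer/large": ([8, 8, 8], True),
    "funnel-transformer/xlarge-base": ([10, 10, 10], False),
}

for name, (sizes, has_decoder) in checkpoints.items():
    print(name, count_layers(sizes, with_decoder=has_decoder))
```

Under that reading, the `-base` checkpoints differ from their full counterparts only by the two decoder layers.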
|
||||||
|
|||||||
@@ -29,6 +29,7 @@ from .configuration_dpr import DPR_PRETRAINED_CONFIG_ARCHIVE_MAP, DPRConfig
|
|||||||
from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig
|
from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig
|
||||||
from .configuration_encoder_decoder import EncoderDecoderConfig
|
from .configuration_encoder_decoder import EncoderDecoderConfig
|
||||||
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
|
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
|
||||||
|
from .configuration_funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig
|
||||||
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
|
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
|
||||||
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
|
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
|
||||||
from .configuration_lxmert import LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP, LxmertConfig
|
from .configuration_lxmert import LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP, LxmertConfig
|
||||||
@@ -155,6 +156,7 @@ from .tokenization_dpr import (
|
|||||||
)
|
)
|
||||||
from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
|
from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
|
||||||
from .tokenization_flaubert import FlaubertTokenizer
|
from .tokenization_flaubert import FlaubertTokenizer
|
||||||
|
from .tokenization_funnel import FunnelTokenizer, FunnelTokenizerFast
|
||||||
from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
|
from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
|
||||||
from .tokenization_longformer import LongformerTokenizer, LongformerTokenizerFast
|
from .tokenization_longformer import LongformerTokenizer, LongformerTokenizerFast
|
||||||
from .tokenization_lxmert import LxmertTokenizer, LxmertTokenizerFast
|
from .tokenization_lxmert import LxmertTokenizer, LxmertTokenizerFast
|
||||||
@@ -327,6 +329,18 @@ if is_torch_available():
|
|||||||
FlaubertModel,
|
FlaubertModel,
|
||||||
FlaubertWithLMHeadModel,
|
FlaubertWithLMHeadModel,
|
||||||
)
|
)
|
||||||
|
from .modeling_funnel import (
|
||||||
|
FUNNEL_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||||
|
FunnelBaseModel,
|
||||||
|
FunnelForMaskedLM,
|
||||||
|
FunnelForMultipleChoice,
|
||||||
|
FunnelForPreTraining,
|
||||||
|
FunnelForQuestionAnswering,
|
||||||
|
FunnelForSequenceClassification,
|
||||||
|
FunnelForTokenClassification,
|
||||||
|
FunnelModel,
|
||||||
|
load_tf_weights_in_funnel,
|
||||||
|
)
|
||||||
from .modeling_gpt2 import (
|
from .modeling_gpt2 import (
|
||||||
GPT2_PRETRAINED_MODEL_ARCHIVE_LIST,
|
GPT2_PRETRAINED_MODEL_ARCHIVE_LIST,
|
||||||
GPT2DoubleHeadsModel,
|
GPT2DoubleHeadsModel,
|
||||||
|
|||||||
@@ -15,6 +15,12 @@ def convert_command_factory(args: Namespace):
|
|||||||
)
|
)
|
||||||
|
|
||||||
|
|
||||||
|
IMPORT_ERROR_MESSAGE = """transformers can only be used from the command line to convert TensorFlow models to PyTorch.
|
||||||
|
In that case, it requires TensorFlow to be installed. Please see
|
||||||
|
https://www.tensorflow.org/install/ for installation instructions.
|
||||||
|
"""
|
||||||
|
|
||||||
|
|
||||||
class ConvertCommand(BaseTransformersCLICommand):
|
class ConvertCommand(BaseTransformersCLICommand):
|
||||||
@staticmethod
|
@staticmethod
|
||||||
def register_subcommand(parser: ArgumentParser):
|
def register_subcommand(parser: ArgumentParser):
|
||||||
@@ -69,12 +75,7 @@ class ConvertCommand(BaseTransformersCLICommand):
|
|||||||
convert_tf_checkpoint_to_pytorch,
|
convert_tf_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
msg = (
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
"transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
|
||||||
"https://www.tensorflow.org/install/ for installation instructions."
|
|
||||||
)
|
|
||||||
raise ImportError(msg)
|
|
||||||
|
|
||||||
convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
||||||
elif self._model_type == "bert":
|
elif self._model_type == "bert":
|
||||||
@@ -83,12 +84,16 @@ class ConvertCommand(BaseTransformersCLICommand):
|
|||||||
convert_tf_checkpoint_to_pytorch,
|
convert_tf_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
msg = (
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
"transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
||||||
"https://www.tensorflow.org/install/ for installation instructions."
|
elif self._model_type == "funnel":
|
||||||
|
try:
|
||||||
|
from transformers.convert_funnel_original_tf_checkpoint_to_pytorch import (
|
||||||
|
convert_tf_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
raise ImportError(msg)
|
except ImportError:
|
||||||
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
|
|
||||||
convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
||||||
elif self._model_type == "gpt":
|
elif self._model_type == "gpt":
|
||||||
@@ -103,12 +108,7 @@ class ConvertCommand(BaseTransformersCLICommand):
|
|||||||
convert_transfo_xl_checkpoint_to_pytorch,
|
convert_transfo_xl_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
msg = (
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
"transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
|
||||||
"https://www.tensorflow.org/install/ for installation instructions."
|
|
||||||
)
|
|
||||||
raise ImportError(msg)
|
|
||||||
|
|
||||||
if "ckpt" in self._tf_checkpoint.lower():
|
if "ckpt" in self._tf_checkpoint.lower():
|
||||||
TF_CHECKPOINT = self._tf_checkpoint
|
TF_CHECKPOINT = self._tf_checkpoint
|
||||||
@@ -125,12 +125,7 @@ class ConvertCommand(BaseTransformersCLICommand):
|
|||||||
convert_gpt2_checkpoint_to_pytorch,
|
convert_gpt2_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
msg = (
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
"transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
|
||||||
"https://www.tensorflow.org/install/ for installation instructions."
|
|
||||||
)
|
|
||||||
raise ImportError(msg)
|
|
||||||
|
|
||||||
convert_gpt2_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
convert_gpt2_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
|
||||||
elif self._model_type == "xlnet":
|
elif self._model_type == "xlnet":
|
||||||
@@ -139,12 +134,7 @@ class ConvertCommand(BaseTransformersCLICommand):
|
|||||||
convert_xlnet_checkpoint_to_pytorch,
|
convert_xlnet_checkpoint_to_pytorch,
|
||||||
)
|
)
|
||||||
except ImportError:
|
except ImportError:
|
||||||
msg = (
|
raise ImportError(IMPORT_ERROR_MESSAGE)
|
||||||
"transformers can only be used from the commandline to convert TensorFlow models in PyTorch, "
|
|
||||||
"In that case, it requires TensorFlow to be installed. Please see "
|
|
||||||
"https://www.tensorflow.org/install/ for installation instructions."
|
|
||||||
)
|
|
||||||
raise ImportError(msg)
|
|
||||||
|
|
||||||
convert_xlnet_checkpoint_to_pytorch(
|
convert_xlnet_checkpoint_to_pytorch(
|
||||||
self._tf_checkpoint, self._config, self._pytorch_dump_output, self._finetuning_task_name
|
self._tf_checkpoint, self._config, self._pytorch_dump_output, self._finetuning_task_name
|
||||||
|
|||||||
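The change above replaces five copies of the same error string with a single module-level constant. A minimal sketch of that pattern, outside of the `transformers` code base, might look as follows; `MISSING_BACKEND_MESSAGE` and `load_converter` are hypothetical names chosen for illustration and are not part of the commit.

```python
import importlib

# Hypothetical stand-in for the shared IMPORT_ERROR_MESSAGE constant.
MISSING_BACKEND_MESSAGE = (
    "This command needs an optional backend that is not installed. "
    "Please install it and try again."
)


def load_converter(module_name, function_name):
    """Import a conversion function lazily, with one shared error message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Every branch raises the same message instead of rebuilding it.
        raise ImportError(MISSING_BACKEND_MESSAGE)
    return getattr(module, function_name)


# Usage with a standard-library module standing in for a converter module.
dumps = load_converter("json", "dumps")
print(dumps({"model_type": "funnel"}))
```

Keeping the message in one place means a wording fix only has to be made once, which is the main benefit of the refactoring shown in the diff.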
@@ -26,6 +26,7 @@ from .configuration_distilbert import DISTILBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
|||||||
from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig
|
from .configuration_electra import ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP, ElectraConfig
|
||||||
from .configuration_encoder_decoder import EncoderDecoderConfig
|
from .configuration_encoder_decoder import EncoderDecoderConfig
|
||||||
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
|
from .configuration_flaubert import FLAUBERT_PRETRAINED_CONFIG_ARCHIVE_MAP, FlaubertConfig
|
||||||
|
from .configuration_funnel import FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP, FunnelConfig
|
||||||
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
|
from .configuration_gpt2 import GPT2_PRETRAINED_CONFIG_ARCHIVE_MAP, GPT2Config
|
||||||
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
|
from .configuration_longformer import LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP, LongformerConfig
|
||||||
from .configuration_lxmert import LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP, LxmertConfig
|
from .configuration_lxmert import LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP, LxmertConfig
|
||||||
@@ -67,6 +68,7 @@ ALL_PRETRAINED_CONFIG_ARCHIVE_MAP = dict(
|
|||||||
ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
ELECTRA_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
LONGFORMER_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
RETRIBERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
|
FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
LXMERT_PRETRAINED_CONFIG_ARCHIVE_MAP,
|
||||||
]
|
]
|
||||||
for key, value, in pretrained_map.items()
|
for key, value, in pretrained_map.items()
|
||||||
@@ -168,6 +170,10 @@ CONFIG_MAPPING = OrderedDict(
|
|||||||
"encoder-decoder",
|
"encoder-decoder",
|
||||||
EncoderDecoderConfig,
|
EncoderDecoderConfig,
|
||||||
),
|
),
|
||||||
|
(
|
||||||
|
"funnel",
|
||||||
|
FunnelConfig,
|
||||||
|
),
|
||||||
(
|
(
|
||||||
"lxmert",
|
"lxmert",
|
||||||
LxmertConfig,
|
LxmertConfig,
|
||||||
@@ -230,6 +236,7 @@ class AutoConfig:
|
|||||||
- `ctrl` : :class:`~transformers.CTRLConfig` (CTRL model)
|
- `ctrl` : :class:`~transformers.CTRLConfig` (CTRL model)
|
||||||
- `flaubert` : :class:`~transformers.FlaubertConfig` (Flaubert model)
|
- `flaubert` : :class:`~transformers.FlaubertConfig` (Flaubert model)
|
||||||
- `electra` : :class:`~transformers.ElectraConfig` (ELECTRA model)
|
- `electra` : :class:`~transformers.ElectraConfig` (ELECTRA model)
|
||||||
|
- `funnel`: :class:`~transformers.FunnelConfig` (Funnel Transformer model)
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
pretrained_model_name_or_path (:obj:`string`):
|
pretrained_model_name_or_path (:obj:`string`):
|
||||||
|
|||||||
183
src/transformers/configuration_funnel.py
Normal file
@@ -0,0 +1,183 @@
|
|||||||
|
# coding=utf-8
|
||||||
|
# Copyright 2020, Hugging Face
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
""" Funnel Transformer model configuration """
|
||||||
|
|
||||||
|
from .configuration_utils import PretrainedConfig
|
||||||
|
from .utils import logging
|
||||||
|
|
||||||
|
|
||||||
|
logger = logging.get_logger(__name__)
|
||||||
|
|
||||||
|
FUNNEL_PRETRAINED_CONFIG_ARCHIVE_MAP = {
|
||||||
|
"funnel-transformer/small": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/small/config.json",
|
||||||
|
"funnel-transformer/small-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/small-base/config.json",
|
||||||
|
"funnel-transformer/medium": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/medium/config.json",
|
||||||
|
"funnel-transformer/medium-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/medium-base/config.json",
|
||||||
|
"funnel-transformer/intermediate": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/intermediate/config.json",
|
||||||
|
"funnel-transformer/intermediate-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/intermediate-base/config.json",
|
||||||
|
"funnel-transformer/large": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/large/config.json",
|
||||||
|
"funnel-transformer/large-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/large-base/config.json",
|
||||||
|
"funnel-transformer/xlarge": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/xlarge/config.json",
|
||||||
|
"funnel-transformer/xlarge-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/xlarge-base/config.json",
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
class FunnelConfig(PretrainedConfig):
|
||||||
|
r"""
|
||||||
|
This is the configuration class to store the configuration of a :class:`~transformers.FunnelModel`.
|
||||||
|
It is used to instantiate a Funnel Transformer model according to the specified arguments, defining the model
|
||||||
|
architecture. Instantiating a configuration with the defaults will yield a similar configuration to that of
|
||||||
|
the Funnel Transformer `funnel-transformer/small <https://huggingface.co/funnel-transformer/small>`__ architecture.
|
||||||
|
|
||||||
|
Configuration objects inherit from :class:`~transformers.PretrainedConfig` and can be used
|
||||||
|
to control the model outputs. Read the documentation from :class:`~transformers.PretrainedConfig`
|
||||||
|
for more information.
|
||||||
|
|
||||||
|
|
||||||
|
Args:
|
||||||
|
vocab_size (:obj:`int`, `optional`, defaults to 30522):
|
||||||
|
Vocabulary size of the Funnel transformer. Defines the different tokens that
|
||||||
|
can be represented by the `input_ids` passed to the forward method of :class:`~transformers.FunnelModel`.
|
||||||
|
block_sizes (:obj:`List[int]`, `optional`, defaults to :obj:`[4, 4, 4]`):
|
||||||
|
The sizes of the blocks used in the model.
|
||||||
|
block_repeats (:obj:`List[int]`, `optional`):
|
||||||
|
If passed along, each layer of each block is repeated the number of times indicated.
|
||||||
|
num_decoder_layers (:obj:`int`, `optional`, defaults to 2):
|
||||||
|
The number of layers in the decoder (when not using the base model).
|
||||||
|
d_model (:obj:`int`, `optional`, defaults to 768):
|
||||||
|
Dimensionality of the model's hidden states.
|
||||||
|
n_head (:obj:`int`, `optional`, defaults to 12):
|
||||||
|
Number of attention heads for each attention layer in the Transformer encoder.
|
||||||
|
d_head (:obj:`int`, `optional`, defaults to 64):
|
||||||
|
Dimensionality of the model's heads.
|
||||||
|
d_inner (:obj:`int`, `optional`, defaults to 3072):
|
||||||
|
Inner dimension in the feed-forward blocks.
|
||||||
|
hidden_act (:obj:`str` or :obj:`callable`, `optional`, defaults to :obj:`"gelu_new"`):
|
||||||
|
The non-linear activation function (function or string) in the encoder and pooler.
|
||||||
|
If string, :obj:`"gelu"`, :obj:`"relu"`, :obj:`"swish"` and :obj:`"gelu_new"` are supported.
|
||||||
|
hidden_dropout (:obj:`float`, `optional`, defaults to 0.1):
|
||||||
|
The dropout probability for all fully connected layers in the embeddings, encoder, and pooler.
|
||||||
|
attention_dropout (:obj:`float`, `optional`, defaults to 0.1):
|
||||||
|
The dropout probability for the attention probabilities.
|
||||||
|
activation_dropout (:obj:`float`, `optional`, defaults to 0.0):
|
||||||
|
The dropout probability used between the two layers of the feed-forward blocks.
|
||||||
|
max_position_embeddings (:obj:`int`, `optional`, defaults to 512):
|
||||||
|
The maximum sequence length that this model might ever be used with.
|
||||||
|
Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
|
||||||
|
type_vocab_size (:obj:`int`, `optional`, defaults to 3):
|
||||||
|
The vocabulary size of the `token_type_ids` passed into :class:`~transformers.FunnelModel`.
|
||||||
|
initializer_range (:obj:`float`, `optional`, defaults to 0.1):
|
||||||
|
The upper bound of the `uniform initializer` for initializing all weight matrices in attention
|
||||||
|
layers.
|
||||||
|
initializer_std (:obj:`float`, `optional`):
|
||||||
|
The standard deviation of the `normal initializer` for initializing the embedding matrix and the weight of
|
||||||
|
linear layers. Will default to 1 for the embedding matrix and the value given by Xavier initialization for
|
||||||
|
linear layers.
|
||||||
|
layer_norm_eps (:obj:`float`, `optional`, defaults to 1e-9):
|
||||||
|
The epsilon used by the layer normalization layers.
|
||||||
|
pooling_type (:obj:`str`, `optional`, defaults to :obj:`"mean"`):
|
||||||
|
Possible values are ``"mean"`` or ``"max"``. The way pooling is performed at the beginning of each
|
||||||
|
block.
|
||||||
|
attention_type (:obj:`str`, `optional`, defaults to :obj:`"relative_shift"`):
|
||||||
|
Possible values are ``"relative_shift"`` or ``"factorized"``. The former is faster on CPU/GPU while
|
||||||
|
the latter is faster on TPU.
|
||||||
|
separate_cls (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
|
Whether or not to separate the cls token when applying pooling.
|
||||||
|
truncate_seq (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
|
When using ``separate_cls``, whether or not to truncate the last token when pooling, to avoid getting
|
||||||
|
a sequence length that is not a multiple of 2.
|
||||||
|
pool_q_only (:obj:`bool`, `optional`, defaults to :obj:`True`):
|
||||||
|
Whether or not to apply the pooling only to the query or to query, key and values for the attention
|
||||||
|
layers.
|
||||||
|
"""
|
||||||
|
model_type = "funnel"
|
||||||
|
|
||||||
|
def __init__(
|
||||||
|
self,
|
||||||
|
vocab_size=30522,
|
||||||
|
block_sizes=[4, 4, 4],
|
||||||
|
block_repeats=None,
|
||||||
|
num_decoder_layers=2,
|
||||||
|
d_model=768,
|
||||||
|
n_head=12,
|
||||||
|
d_head=64,
|
||||||
|
d_inner=3072,
|
||||||
|
hidden_act="gelu_new",
|
||||||
|
hidden_dropout=0.1,
|
||||||
|
attention_dropout=0.1,
|
||||||
|
activation_dropout=0.0,
|
||||||
|
max_position_embeddings=512,
|
||||||
|
type_vocab_size=3,
|
||||||
|
initializer_range=0.1,
|
||||||
|
initializer_std=None,
|
||||||
|
layer_norm_eps=1e-9,
|
||||||
|
pooling_type="mean",
|
||||||
|
attention_type="relative_shift",
|
||||||
|
separate_cls=True,
|
||||||
|
truncate_seq=True,
|
||||||
|
pool_q_only=True,
|
||||||
|
**kwargs
|
||||||
|
):
|
||||||
|
super().__init__(**kwargs)
|
||||||
|
|
||||||
|
self.vocab_size = vocab_size
|
||||||
|
self.block_sizes = block_sizes
|
||||||
|
self.block_repeats = [1] * len(block_sizes) if block_repeats is None else block_repeats
|
||||||
|
assert len(block_sizes) == len(
|
||||||
|
self.block_repeats
|
||||||
|
), "`block_sizes` and `block_repeats` should have the same length."
|
||||||
|
self.num_decoder_layers = num_decoder_layers
|
||||||
|
self.d_model = d_model
|
||||||
|
self.n_head = n_head
|
||||||
|
self.d_head = d_head
|
||||||
|
self.d_inner = d_inner
|
||||||
|
self.hidden_act = hidden_act
|
||||||
|
self.hidden_dropout = hidden_dropout
|
||||||
|
self.attention_dropout = attention_dropout
|
||||||
|
self.activation_dropout = activation_dropout
|
||||||
|
self.max_position_embeddings = max_position_embeddings
|
||||||
|
self.type_vocab_size = type_vocab_size
|
||||||
|
self.initializer_range = initializer_range
|
||||||
|
self.initializer_std = initializer_std
|
||||||
|
self.layer_norm_eps = layer_norm_eps
|
||||||
|
assert pooling_type in [
|
||||||
|
"mean",
|
||||||
|
"max",
|
||||||
|
], f"Got {pooling_type} for `pooling_type` but only 'mean' and 'max' are supported."
|
||||||
|
self.pooling_type = pooling_type
|
||||||
|
assert attention_type in [
|
||||||
|
"relative_shift",
|
||||||
|
"factorized",
|
||||||
|
], f"Got {attention_type} for `attention_type` but only 'relative_shift' and 'factorized' are supported."
|
||||||
|
self.attention_type = attention_type
|
||||||
|
self.separate_cls = separate_cls
|
||||||
|
self.truncate_seq = truncate_seq
|
||||||
|
self.pool_q_only = pool_q_only
|
||||||
|
|
||||||
|
@property
|
||||||
|
def hidden_size(self):
|
||||||
|
return self.d_model
|
||||||
|
|
||||||
|
@property
|
||||||
|
def num_attention_heads(self):
|
||||||
|
return self.n_head
|
||||||
|
|
||||||
|
@property
|
||||||
|
def num_hidden_layers(self):
|
||||||
|
return sum(self.block_sizes)
|
||||||
|
|
||||||
|
@property
|
||||||
|
def num_blocks(self):
|
||||||
|
return len(self.block_sizes)
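The defaulting and validation logic in `FunnelConfig.__init__` can be tried in isolation. The class below is a hypothetical, dependency-free stand-in (`FunnelLayout` is not part of the library) that mirrors only the `block_sizes`/`block_repeats` handling, the `pooling_type` check, and two of the derived properties. As a deliberate difference from the diff, it raises `ValueError` instead of using `assert` and takes a tuple default rather than a list.

```python
class FunnelLayout:
    """Hypothetical sketch of a few checks performed by FunnelConfig."""

    def __init__(self, block_sizes=(4, 4, 4), block_repeats=None, pooling_type="mean"):
        self.block_sizes = list(block_sizes)
        # One repeat per block unless the caller says otherwise.
        if block_repeats is None:
            self.block_repeats = [1] * len(self.block_sizes)
        else:
            self.block_repeats = list(block_repeats)
        if len(self.block_sizes) != len(self.block_repeats):
            raise ValueError("`block_sizes` and `block_repeats` should have the same length.")
        if pooling_type not in ("mean", "max"):
            raise ValueError(f"Got {pooling_type} for `pooling_type` but only 'mean' and 'max' are supported.")
        self.pooling_type = pooling_type

    @property
    def num_hidden_layers(self):
        # Encoder layers only; decoder layers are counted separately.
        return sum(self.block_sizes)

    @property
    def num_blocks(self):
        return len(self.block_sizes)


layout = FunnelLayout(block_sizes=[6, 3, 3], block_repeats=[1, 2, 2])
print(layout.num_blocks, layout.num_hidden_layers, layout.block_repeats)
```

Raising an exception keeps the checks active even when Python runs with `-O`, which strips `assert` statements; whether that matters for the real configuration class is a judgment call left to the authors.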
|
||||||
61
src/transformers/convert_funnel_original_tf_checkpoint_to_pytorch.py
Executable file
@@ -0,0 +1,61 @@
|
|||||||
|
# coding=utf-8
|
||||||
|
# Copyright 2020 The HuggingFace Inc. team.
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
"""Convert Funnel checkpoint."""
|
||||||
|
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import logging
|
||||||
|
|
||||||
|
import torch
|
||||||
|
|
||||||
|
from transformers import FunnelConfig, FunnelForPreTraining, load_tf_weights_in_funnel
|
||||||
|
|
||||||
|
|
||||||
|
logging.basicConfig(level=logging.INFO)
|
||||||
|
|
||||||
|
|
||||||
|
def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, config_file, pytorch_dump_path):
|
||||||
|
# Initialise PyTorch model
|
||||||
|
config = FunnelConfig.from_json_file(config_file)
|
||||||
|
print("Building PyTorch model from configuration: {}".format(str(config)))
|
||||||
|
model = FunnelForPreTraining(config)
|
||||||
|
|
||||||
|
# Load weights from tf checkpoint
|
||||||
|
load_tf_weights_in_funnel(model, config, tf_checkpoint_path)
|
||||||
|
|
||||||
|
# Save pytorch-model
|
||||||
|
print("Save PyTorch model to {}".format(pytorch_dump_path))
|
||||||
|
torch.save(model.state_dict(), pytorch_dump_path)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
parser = argparse.ArgumentParser()
|
||||||
|
# Required parameters
|
||||||
|
parser.add_argument(
"--tf_checkpoint_path", default=None, type=str, required=True, help="Path to the TensorFlow checkpoint."
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--config_file",
|
||||||
|
default=None,
|
||||||
|
type=str,
|
||||||
|
required=True,
|
||||||
|
help="The config json file corresponding to the pre-trained model. \n"
|
||||||
|
"This specifies the model architecture.",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path)
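The command-line interface of the conversion script can be exercised without TensorFlow or PyTorch by rebuilding just the argument parser. This is a sketch under assumptions: `build_parser` is a hypothetical helper, and the file paths are placeholders rather than real checkpoint locations.

```python
import argparse


def build_parser():
    """Rebuild the three required flags used by the conversion script."""
    parser = argparse.ArgumentParser(description="Convert Funnel checkpoint.")
    parser.add_argument(
        "--tf_checkpoint_path", type=str, required=True, help="Path to the TensorFlow checkpoint."
    )
    parser.add_argument(
        "--config_file",
        type=str,
        required=True,
        help="The config json file corresponding to the pre-trained model.",
    )
    parser.add_argument(
        "--pytorch_dump_path", type=str, required=True, help="Path to the output PyTorch model."
    )
    return parser


# Placeholder paths, for illustration only.
args = build_parser().parse_args(
    [
        "--tf_checkpoint_path", "funnel_ckpt/model.ckpt",
        "--config_file", "funnel_ckpt/config.json",
        "--pytorch_dump_path", "funnel_pt/pytorch_model.bin",
    ]
)
print(args.tf_checkpoint_path, args.config_file, args.pytorch_dump_path)
```

In the real script the three parsed values are passed straight to `convert_tf_checkpoint_to_pytorch`, in the same order as above.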
|
||||||
@@ -29,6 +29,7 @@ from .configuration_auto import (
|
|||||||
ElectraConfig,
|
ElectraConfig,
|
||||||
EncoderDecoderConfig,
|
EncoderDecoderConfig,
|
||||||
FlaubertConfig,
|
FlaubertConfig,
|
||||||
|
FunnelConfig,
|
||||||
GPT2Config,
|
GPT2Config,
|
||||||
LongformerConfig,
|
LongformerConfig,
|
||||||
LxmertConfig,
|
LxmertConfig,
|
||||||
@@ -108,6 +109,14 @@ from .modeling_flaubert import (
|
|||||||
FlaubertModel,
|
FlaubertModel,
|
||||||
FlaubertWithLMHeadModel,
|
FlaubertWithLMHeadModel,
|
||||||
)
|
)
|
||||||
|
from .modeling_funnel import (
|
||||||
|
FunnelForMaskedLM,
|
||||||
|
FunnelForMultipleChoice,
|
||||||
|
FunnelForQuestionAnswering,
|
||||||
|
FunnelForSequenceClassification,
|
||||||
|
FunnelForTokenClassification,
|
||||||
|
FunnelModel,
|
||||||
|
)
|
||||||
from .modeling_gpt2 import GPT2LMHeadModel, GPT2Model
|
from .modeling_gpt2 import GPT2LMHeadModel, GPT2Model
|
||||||
from .modeling_longformer import (
|
from .modeling_longformer import (
|
||||||
LongformerForMaskedLM,
|
LongformerForMaskedLM,
|
||||||
@@ -202,6 +211,7 @@ MODEL_MAPPING = OrderedDict(
         (CTRLConfig, CTRLModel),
         (ElectraConfig, ElectraModel),
         (ReformerConfig, ReformerModel),
+        (FunnelConfig, FunnelModel),
         (LxmertConfig, LxmertModel),
     ]
 )
@@ -254,6 +264,7 @@ MODEL_WITH_LM_HEAD_MAPPING = OrderedDict(
         (ElectraConfig, ElectraForMaskedLM),
         (EncoderDecoderConfig, EncoderDecoderModel),
         (ReformerConfig, ReformerModelWithLMHead),
+        (FunnelConfig, FunnelForMaskedLM),
     ]
 )

@@ -291,6 +302,7 @@ MODEL_FOR_MASKED_LM_MAPPING = OrderedDict(
         (XLMConfig, XLMWithLMHeadModel),
         (ElectraConfig, ElectraForMaskedLM),
         (ReformerConfig, ReformerForMaskedLM),
+        (FunnelConfig, FunnelForMaskedLM),
     ]
 )

@@ -320,6 +332,7 @@ MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING = OrderedDict(
         (FlaubertConfig, FlaubertForSequenceClassification),
         (XLMConfig, XLMForSequenceClassification),
         (ElectraConfig, ElectraForSequenceClassification),
+        (FunnelConfig, FunnelForSequenceClassification),
     ]
 )

@@ -339,6 +352,7 @@ MODEL_FOR_QUESTION_ANSWERING_MAPPING = OrderedDict(
         (XLMConfig, XLMForQuestionAnsweringSimple),
         (ElectraConfig, ElectraForQuestionAnswering),
         (ReformerConfig, ReformerForQuestionAnswering),
+        (FunnelConfig, FunnelForQuestionAnswering),
     ]
 )

@@ -357,6 +371,7 @@ MODEL_FOR_TOKEN_CLASSIFICATION_MAPPING = OrderedDict(
         (AlbertConfig, AlbertForTokenClassification),
         (ElectraConfig, ElectraForTokenClassification),
         (FlaubertConfig, FlaubertForTokenClassification),
+        (FunnelConfig, FunnelForTokenClassification),
     ]
 )

@@ -374,6 +389,7 @@ MODEL_FOR_MULTIPLE_CHOICE_MAPPING = OrderedDict(
         (AlbertConfig, AlbertForMultipleChoice),
         (XLMConfig, XLMForMultipleChoice),
         (FlaubertConfig, FlaubertForMultipleChoice),
+        (FunnelConfig, FunnelForMultipleChoice),
     ]
 )

@@ -421,6 +437,7 @@ class AutoModel:
                 - isInstance of `xlm` configuration class: :class:`~transformers.XLMModel` (XLM model)
                 - isInstance of `flaubert` configuration class: :class:`~transformers.FlaubertModel` (Flaubert model)
                 - isInstance of `electra` configuration class: :class:`~transformers.ElectraModel` (Electra model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelModel` (Funnel Transformer model)

         Examples::

@@ -462,6 +479,7 @@ class AutoModel:
             - `ctrl`: :class:`~transformers.CTRLModel` (Salesforce CTRL model)
             - `flaubert`: :class:`~transformers.FlaubertModel` (Flaubert model)
             - `electra`: :class:`~transformers.ElectraModel` (Electra model)
+            - `funnel`: :class:`~transformers.FunnelModel` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
@@ -729,6 +747,7 @@ class AutoModelWithLMHead:
                 - isInstance of `xlm` configuration class: :class:`~transformers.XLMWithLMHeadModel` (XLM model)
                 - isInstance of `flaubert` configuration class: :class:`~transformers.FlaubertWithLMHeadModel` (Flaubert model)
                 - isInstance of `electra` configuration class: :class:`~transformers.ElectraForMaskedLM` (Electra model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelForMaskedLM` (Funnel Transformer model)

         Examples::

@@ -774,6 +793,7 @@ class AutoModelWithLMHead:
             - `ctrl`: :class:`~transformers.CTRLLMHeadModel` (Salesforce CTRL model)
             - `flaubert`: :class:`~transformers.FlaubertWithLMHeadModel` (Flaubert model)
             - `electra`: :class:`~transformers.ElectraForMaskedLM` (Electra model)
+            - `funnel`: :class:`~transformers.FunnelForMaskedLM` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
@@ -1024,6 +1044,7 @@ class AutoModelForMaskedLM:
                 - isInstance of `electra` configuration class: :class:`~transformers.ElectraForMaskedLM` (Electra model)
                 - isInstance of `camembert` configuration class: :class:`~transformers.CamembertForMaskedLM` (Camembert model)
                 - isInstance of `albert` configuration class: :class:`~transformers.AlbertForMaskedLM` (Albert model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelForMaskedLM` (Funnel Transformer model)


         Examples::
@@ -1060,6 +1081,7 @@ class AutoModelForMaskedLM:
             - `flaubert`: :class:`~transformers.FlaubertWithLMHeadModel` (Flaubert model)
             - `electra`: :class:`~transformers.ElectraForMaskedLM` (Electra model)
             - `bert`: :class:`~transformers.BertLMHeadModel` (Bert model)
+            - `funnel`: :class:`~transformers.FunnelForMaskedLM` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
@@ -1304,7 +1326,7 @@ class AutoModelForSequenceClassification:
                 - isInstance of `xlnet` configuration class: :class:`~transformers.XLNetForSequenceClassification` (XLNet model)
                 - isInstance of `xlm` configuration class: :class:`~transformers.XLMForSequenceClassification` (XLM model)
                 - isInstance of `flaubert` configuration class: :class:`~transformers.FlaubertForSequenceClassification` (Flaubert model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelForSequenceClassification` (Funnel Transformer model)

         Examples::

@@ -1340,6 +1362,7 @@ class AutoModelForSequenceClassification:
             - `bert`: :class:`~transformers.BertForSequenceClassification` (Bert model)
             - `xlnet`: :class:`~transformers.XLNetForSequenceClassification` (XLNet model)
             - `flaubert`: :class:`~transformers.FlaubertForSequenceClassification` (Flaubert model)
+            - `funnel`: :class:`~transformers.FunnelForSequenceClassification` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
@@ -1454,6 +1477,7 @@ class AutoModelForQuestionAnswering:
                 - isInstance of `xlnet` configuration class: :class:`~transformers.XLNetForQuestionAnswering` (XLNet model)
                 - isInstance of `xlm` configuration class: :class:`~transformers.XLMForQuestionAnswering` (XLM model)
                 - isInstance of `flaubert` configuration class: :class:`~transformers.FlaubertForQuestionAnswering` (XLM model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelForQuestionAnswering` (Funnel Transformer model)

         Examples::

@@ -1488,6 +1512,7 @@ class AutoModelForQuestionAnswering:
             - `xlnet`: :class:`~transformers.XLNetForQuestionAnswering` (XLNet model)
             - `xlm`: :class:`~transformers.XLMForQuestionAnswering` (XLM model)
             - `flaubert`: :class:`~transformers.FlaubertForQuestionAnswering` (XLM model)
+            - `funnel`: :class:`~transformers.FunnelForQuestionAnswering` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
@@ -1604,6 +1629,7 @@ class AutoModelForTokenClassification:
                 - isInstance of `camembert` configuration class: :class:`~transformers.CamembertModelForTokenClassification` (Camembert model)
                 - isInstance of `roberta` configuration class: :class:`~transformers.RobertaModelForTokenClassification` (Roberta model)
                 - isInstance of `electra` configuration class: :class:`~transformers.ElectraForTokenClassification` (Electra model)
+                - isInstance of `funnel` configuration class: :class:`~transformers.FunnelForTokenClassification` (Funnel Transformer model)

         Examples::

@@ -1641,6 +1667,7 @@ class AutoModelForTokenClassification:
             - `flaubert`: :class:`~transformers.FlaubertForTokenClassification` (Flaubert model)
             - `roberta`: :class:`~transformers.RobertaForTokenClassification` (Roberta model)
             - `electra`: :class:`~transformers.ElectraForTokenClassification` (Electra model)
+            - `funnel`: :class:`~transformers.FunnelForTokenClassification` (Funnel Transformer model)

         The model is set in evaluation mode by default using `model.eval()` (Dropout modules are deactivated)
         To train the model, you should first set it back in training mode with `model.train()`
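The hunks above register `FunnelConfig` in several ordered config-to-model mappings, and the docstrings describe selection by `isInstance` of the configuration class. A minimal sketch of that dispatch pattern follows. Every class name in it (`ToyConfig`, `ToyElectraConfig`, `ToyFunnelConfig`, `ToyElectraModel`, `ToyFunnelModel`) and the helper `model_from_config` are hypothetical stand-ins, not the library's classes.

```python
from collections import OrderedDict


class ToyConfig:
    """Hypothetical base configuration."""


class ToyElectraConfig(ToyConfig):
    pass


class ToyFunnelConfig(ToyConfig):
    pass


class ToyModel:
    def __init__(self, config):
        self.config = config


class ToyElectraModel(ToyModel):
    pass


class ToyFunnelModel(ToyModel):
    pass


# Insertion order matters: the first entry whose config class matches wins.
TOY_MODEL_MAPPING = OrderedDict(
    [
        (ToyElectraConfig, ToyElectraModel),
        (ToyFunnelConfig, ToyFunnelModel),
    ]
)


def model_from_config(config):
    """Return an instance of the first model class registered for this config type."""
    for config_class, model_class in TOY_MODEL_MAPPING.items():
        if isinstance(config, config_class):
            return model_class(config)
    raise ValueError(f"Unrecognized configuration class {type(config).__name__}")


model = model_from_config(ToyFunnelConfig())
print(type(model).__name__)
```

With this layout, supporting a new architecture only requires one extra tuple per mapping, which matches the one-line additions in each hunk.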
1544
src/transformers/modeling_funnel.py
Normal file
File diff suppressed because it is too large
@@ -27,6 +27,7 @@ from .configuration_auto import (
     DistilBertConfig,
     ElectraConfig,
     FlaubertConfig,
+    FunnelConfig,
     GPT2Config,
     LongformerConfig,
     LxmertConfig,
@@ -54,6 +55,7 @@ from .tokenization_ctrl import CTRLTokenizer
 from .tokenization_distilbert import DistilBertTokenizer, DistilBertTokenizerFast
 from .tokenization_electra import ElectraTokenizer, ElectraTokenizerFast
 from .tokenization_flaubert import FlaubertTokenizer
+from .tokenization_funnel import FunnelTokenizer, FunnelTokenizerFast
 from .tokenization_gpt2 import GPT2Tokenizer, GPT2TokenizerFast
 from .tokenization_longformer import LongformerTokenizer, LongformerTokenizerFast
 from .tokenization_lxmert import LxmertTokenizer, LxmertTokenizerFast
@@ -93,6 +95,7 @@ TOKENIZER_MAPPING = OrderedDict(
         (RobertaConfig, (RobertaTokenizer, RobertaTokenizerFast)),
         (ReformerConfig, (ReformerTokenizer, None)),
         (ElectraConfig, (ElectraTokenizer, ElectraTokenizerFast)),
+        (FunnelConfig, (FunnelTokenizer, FunnelTokenizerFast)),
         (LxmertConfig, (LxmertTokenizer, LxmertTokenizerFast)),
         (BertConfig, (BertTokenizer, BertTokenizerFast)),
         (OpenAIGPTConfig, (OpenAIGPTTokenizer, OpenAIGPTTokenizerFast)),
@@ -131,6 +134,7 @@ class AutoTokenizer:
         - `xlm`: XLMTokenizer (XLM model)
         - `ctrl`: CTRLTokenizer (Salesforce CTRL model)
         - `electra`: ElectraTokenizer (Google ELECTRA model)
+        - `funnel`: FunnelTokenizer (Funnel Transformer model)
         - `lxmert`: LxmertTokenizer (Lxmert model)

     This class cannot be instantiated using `__init__()` (throw an error).
@@ -167,6 +171,7 @@ class AutoTokenizer:
             - `xlm`: XLMTokenizer (XLM model)
             - `ctrl`: CTRLTokenizer (Salesforce CTRL model)
             - `electra`: ElectraTokenizer (Google ELECTRA model)
+            - `funnel`: FunnelTokenizer (Funnel Transformer model)
             - `lxmert`: LxmertTokenizer (Lxmert model)

         Params:
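The tokenizer mapping above stores a `(slow, fast)` pair per configuration, with `None` where no fast tokenizer exists (as for Reformer). The sketch below illustrates one way such a pair could be resolved. It is an assumption-laden toy: the classes, the `use_fast` flag as written here, and the name-pattern lookup in `pick_tokenizer_class` are hypothetical and are not the library's actual resolution logic.

```python
from collections import OrderedDict


class ToySlowReformerTokenizer:
    pass


class ToySlowFunnelTokenizer:
    pass


class ToyFastFunnelTokenizer:
    pass


# pattern -> (slow tokenizer class, fast tokenizer class or None)
TOY_TOKENIZER_MAPPING = OrderedDict(
    [
        ("reformer", (ToySlowReformerTokenizer, None)),
        ("funnel", (ToySlowFunnelTokenizer, ToyFastFunnelTokenizer)),
    ]
)


def pick_tokenizer_class(model_name: str, use_fast: bool = False):
    """Return the first matching tokenizer class, preferring the fast one on request."""
    for pattern, (slow_class, fast_class) in TOY_TOKENIZER_MAPPING.items():
        if pattern in model_name:
            if use_fast and fast_class is not None:
                return fast_class
            # Fall back to the slow class when no fast implementation is registered.
            return slow_class
    raise ValueError(f"No tokenizer registered for {model_name!r}")


print(pick_tokenizer_class("funnel-transformer/small", use_fast=True).__name__)
```

Keeping `None` in the fast slot lets callers request a fast tokenizer everywhere and still get a working slow one for models that lack it.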
232
src/transformers/tokenization_funnel.py
Normal file
@@ -0,0 +1,232 @@
+# coding=utf-8
+# Copyright 2020 The HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+""" Tokenization class for Funnel Transformer."""
+
+from typing import List, Optional
+
+from .tokenization_bert import BertTokenizer, BertTokenizerFast
+from .utils import logging
+
+
+logger = logging.get_logger(__name__)
+
+VOCAB_FILES_NAMES = {"vocab_file": "vocab.txt"}
+
+_model_names = [
+    "small",
+    "small-base",
+    "medium",
+    "medium-base",
+    "intermediate",
+    "intermediate-base",
+    "large",
+    "large-base",
+    "xlarge",
+    "xlarge-base",
+]
+
+PRETRAINED_VOCAB_FILES_MAP = {
+    "vocab_file": {
+        "funnel-transformer/small": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/small/vocab.txt",
+        "funnel-transformer/small-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/small-base/vocab.txt",
+        "funnel-transformer/medium": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/medium/vocab.txt",
+        "funnel-transformer/medium-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/medium-base/vocab.txt",
+        "funnel-transformer/intermediate": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/intermediate/vocab.txt",
+        "funnel-transformer/intermediate-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/intermediate-base/vocab.txt",
+        "funnel-transformer/large": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/large/vocab.txt",
+        "funnel-transformer/large-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/large-base/vocab.txt",
+        "funnel-transformer/xlarge": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/xlarge/vocab.txt",
+        "funnel-transformer/xlarge-base": "https://s3.amazonaws.com/models.huggingface.co/bert/funnel-transformer/xlarge-base/vocab.txt",
+    }
+}
+PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {f"funnel-transformer/{name}": 512 for name in _model_names}
+PRETRAINED_INIT_CONFIGURATION = {f"funnel-transformer/{name}": {"do_lower_case": True} for name in _model_names}
+
+
+class FunnelTokenizer(BertTokenizer):
+    r"""
+    Tokenizer for the Funnel Transformer models.
+
+    :class:`~transformers.FunnelTokenizer` is identical to :class:`~transformers.BertTokenizer` and runs end-to-end
+    tokenization: punctuation splitting + wordpiece.
+
+    Refer to superclass :class:`~transformers.BertTokenizer` for usage examples and documentation concerning
+    parameters.
+    """
+
+    vocab_files_names = VOCAB_FILES_NAMES
+    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
+    cls_token_type_id: int = 2
+
+    def __init__(
+        self,
+        vocab_file,
+        do_lower_case=True,
+        do_basic_tokenize=True,
+        never_split=None,
+        unk_token="<unk>",
+        sep_token="<sep>",
+        pad_token="<pad>",
+        cls_token="<cls>",
+        mask_token="<mask>",
+        bos_token="<s>",
+        eos_token="</s>",
+        tokenize_chinese_chars=True,
+        strip_accents=None,
+        **kwargs
+    ):
+        super().__init__(
+            vocab_file,
+            do_lower_case=do_lower_case,
+            do_basic_tokenize=do_basic_tokenize,
+            never_split=never_split,
+            unk_token=unk_token,
+            sep_token=sep_token,
+            pad_token=pad_token,
+            cls_token=cls_token,
+            mask_token=mask_token,
+            bos_token=bos_token,
+            eos_token=eos_token,
+            tokenize_chinese_chars=tokenize_chinese_chars,
+            strip_accents=strip_accents,
+            **kwargs,
+        )
+
+    def create_token_type_ids_from_sequences(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Creates a mask from the two sequences passed to be used in a sequence-pair classification task.
+        Funnel Transformer expects a sequence pair mask that has the following format:
+
+        ::
+
+            2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
+            | first sequence    | second sequence |
+
+        If token_ids_1 is None, only returns the first portion of the mask (the 2 for the <cls> token followed by 0's).
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of ids.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
+            sequence(s).
+        """
+        sep = [self.sep_token_id]
+        cls = [self.cls_token_id]
+        if token_ids_1 is None:
+            return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
+        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
+
+
+class FunnelTokenizerFast(BertTokenizerFast):
+    r"""
+    "Fast" tokenizer for the Funnel Transformer models (backed by HuggingFace's :obj:`tokenizers` library).
+
+    :class:`~transformers.FunnelTokenizerFast` is identical to :class:`~transformers.BertTokenizerFast` and runs
+    end-to-end tokenization: punctuation splitting + wordpiece.
+
+    Refer to superclass :class:`~transformers.BertTokenizerFast` for usage examples and documentation concerning
+    parameters.
+    """
+
+    vocab_files_names = VOCAB_FILES_NAMES
+    pretrained_vocab_files_map = PRETRAINED_VOCAB_FILES_MAP
+    max_model_input_sizes = PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES
+    pretrained_init_configuration = PRETRAINED_INIT_CONFIGURATION
+    cls_token_type_id: int = 2
+
+    def __init__(
+        self,
+        vocab_file,
+        do_lower_case=True,
+        unk_token="<unk>",
+        sep_token="<sep>",
+        pad_token="<pad>",
+        cls_token="<cls>",
+        mask_token="<mask>",
+        bos_token="<s>",
+        eos_token="</s>",
+        clean_text=True,
+        tokenize_chinese_chars=True,
+        strip_accents=None,
+        wordpieces_prefix="##",
+        **kwargs
+    ):
+        super().__init__(
+            vocab_file,
+            do_lower_case=do_lower_case,
+            unk_token=unk_token,
+            sep_token=sep_token,
+            pad_token=pad_token,
+            cls_token=cls_token,
+            mask_token=mask_token,
+            bos_token=bos_token,
+            eos_token=eos_token,
+            clean_text=clean_text,
+            tokenize_chinese_chars=tokenize_chinese_chars,
+            strip_accents=strip_accents,
+            wordpieces_prefix=wordpieces_prefix,
+            **kwargs,
+        )
+
+    def create_token_type_ids_from_sequences(
+        self, token_ids_0: List[int], token_ids_1: Optional[List[int]] = None
+    ) -> List[int]:
+        """
+        Creates a mask from the two sequences passed to be used in a sequence-pair classification task.
+        Funnel Transformer expects a sequence pair mask that has the following format:
+
+        ::
+
+            2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
+            | first sequence    | second sequence |
+
+        If token_ids_1 is None, only returns the first portion of the mask (the 2 for the <cls> token followed by 0's).
+
+        Args:
+            token_ids_0 (:obj:`List[int]`):
+                List of ids.
+            token_ids_1 (:obj:`List[int]`, `optional`):
+                Optional second list of IDs for sequence pairs.
+
+        Returns:
+            :obj:`List[int]`: List of `token type IDs <../glossary.html#token-type-ids>`_ according to the given
+            sequence(s).
+        """
+        sep = [self.sep_token_id]
+        cls = [self.cls_token_id]
+        if token_ids_1 is None:
+            return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0]
+        return len(cls) * [self.cls_token_type_id] + len(token_ids_0 + sep) * [0] + len(token_ids_1 + sep) * [1]
+
+    def _convert_encoding(self, encoding, **kwargs):
+        # The fast tokenizer doesn't use the function above so we fix the cls token type id when decoding the fast
+        # tokenizer output.
+        encoding_dict = super()._convert_encoding(encoding, **kwargs)
+        if "token_type_ids" in encoding_dict:
+            # Note: we can't assume the <cls> token is in first position because left padding is a thing, hence the
+            # double list comprehension.
+            encoding_dict["token_type_ids"] = [
+                [self.cls_token_type_id if i == self.cls_token_id else t for i, t in zip(input_ids, type_ids)]
+                for input_ids, type_ids in zip(encoding_dict["input_ids"], encoding_dict["token_type_ids"])
+            ]
+        return encoding_dict
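The token type id layout used by both tokenizers above (2 for `<cls>`, 0 for the first sequence and its separator, 1 for the second sequence and its separator) can be checked with plain lists. This is a standalone sketch under stated assumptions: `funnel_token_type_ids` and `patch_cls_type_ids` are hypothetical helper names, and the token ids in the usage lines (101 for `<cls>`, 0 for padding) are made up for illustration.

```python
from typing import List, Optional

CLS_TOKEN_TYPE_ID = 2  # same value as cls_token_type_id in the diff


def funnel_token_type_ids(token_ids_0: List[int], token_ids_1: Optional[List[int]] = None) -> List[int]:
    """Build type ids for <cls> A <sep> or <cls> A <sep> B <sep>."""
    # One slot for <cls>, then len(A) + 1 zeros (the +1 covers the trailing <sep>).
    type_ids = [CLS_TOKEN_TYPE_ID] + [0] * (len(token_ids_0) + 1)
    if token_ids_1 is not None:
        # The second sequence and its <sep> get type id 1.
        type_ids += [1] * (len(token_ids_1) + 1)
    return type_ids


def patch_cls_type_ids(input_ids: List[int], type_ids: List[int], cls_token_id: int) -> List[int]:
    """Set the type id to 2 wherever the input id is <cls>, regardless of position."""
    return [CLS_TOKEN_TYPE_ID if i == cls_token_id else t for i, t in zip(input_ids, type_ids)]


print(funnel_token_type_ids([7, 8, 9]))       # [2, 0, 0, 0, 0]
print(funnel_token_type_ids([7, 8], [9]))     # [2, 0, 0, 0, 1, 1]
# Left-padded row: <cls> sits at index 2, so a fixed "position 0" rule would miss it.
print(patch_cls_type_ids([0, 0, 101, 7, 102], [0, 0, 0, 0, 0], cls_token_id=101))  # [0, 0, 2, 0, 0]
```

Matching on the token id instead of the position is what keeps the fix-up correct for left-padded batches, which is the reason given in the comment inside `_convert_encoding`.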
@@ -539,7 +539,10 @@ class ModelTesterMixin:
                 outputs = model(**self._prepare_for_class(inputs_dict, model_class))
             hidden_states = outputs[-1]

-            self.assertEqual(len(hidden_states), self.model_tester.num_hidden_layers + 1)
+            expected_num_layers = getattr(
+                self.model_tester, "expected_num_hidden_layers", self.model_tester.num_hidden_layers + 1
+            )
+            self.assertEqual(len(hidden_states), expected_num_layers)
             if hasattr(self.model_tester, "encoder_seq_length"):
                 seq_length = self.model_tester.encoder_seq_length
                 if hasattr(self.model_tester, "chunk_length") and self.model_tester.chunk_length > 1:
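The change above lets a model tester override the expected number of hidden states through an optional attribute, falling back to `num_hidden_layers + 1`. A small sketch of that `getattr` fallback follows. The two tester classes and the value 4 are hypothetical and only illustrate the lookup.

```python
class ToyDefaultTester:
    """Hypothetical tester that relies on the default expectation."""

    num_hidden_layers = 5


class ToyOverridingTester:
    """Hypothetical tester that declares its own expected count."""

    num_hidden_layers = 5
    expected_num_hidden_layers = 4  # made-up value for illustration


def expected_num_layers(tester) -> int:
    # getattr returns the override when the attribute exists, otherwise the default.
    # The default expression is evaluated eagerly, so num_hidden_layers must exist either way.
    return getattr(tester, "expected_num_hidden_layers", tester.num_hidden_layers + 1)


print(expected_num_layers(ToyDefaultTester()))     # 6
print(expected_num_layers(ToyOverridingTester()))  # 4
```

This keeps the shared test unchanged for existing models while letting an architecture whose layer count does not map one-to-one onto hidden states state its own expectation.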
454
tests/test_modeling_funnel.py
Normal file
@@ -0,0 +1,454 @@
+# coding=utf-8
+# Copyright 2020 HuggingFace Inc. team.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+
+import unittest
+
+from transformers import FunnelTokenizer, is_torch_available
+from transformers.testing_utils import require_torch, slow, torch_device
+
+from .test_configuration_common import ConfigTester
+from .test_modeling_common import ModelTesterMixin, ids_tensor
+
+
+if is_torch_available():
+    import torch
+
+    from transformers import (
+        FunnelBaseModel,
+        FunnelConfig,
+        FunnelForMaskedLM,
+        FunnelForMultipleChoice,
+        FunnelForPreTraining,
+        FunnelForQuestionAnswering,
+        FunnelForSequenceClassification,
+        FunnelForTokenClassification,
+        FunnelModel,
+    )
+
+
|
class FunnelModelTester:
    """You can also import this e.g, from .test_modeling_funnel import FunnelModelTester """

    def __init__(
        self,
        parent,
        batch_size=13,
        seq_length=7,
        is_training=True,
        use_input_mask=True,
        use_token_type_ids=True,
        use_labels=True,
        vocab_size=99,
        block_sizes=[1, 1, 2],
        num_decoder_layers=1,
        d_model=32,
        n_head=4,
        d_head=8,
        d_inner=37,
        hidden_act="gelu_new",
        hidden_dropout=0.1,
        attention_dropout=0.1,
        activation_dropout=0.0,
        max_position_embeddings=512,
        type_vocab_size=3,
        num_labels=3,
        num_choices=4,
        scope=None,
        base=False,
    ):
        self.parent = parent
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.is_training = is_training
        self.use_input_mask = use_input_mask
        self.use_token_type_ids = use_token_type_ids
        self.use_labels = use_labels
        self.vocab_size = vocab_size
        self.block_sizes = block_sizes
        self.num_decoder_layers = num_decoder_layers
        self.d_model = d_model
        self.n_head = n_head
        self.d_head = d_head
        self.d_inner = d_inner
        self.hidden_act = hidden_act
        self.hidden_dropout = hidden_dropout
        self.attention_dropout = attention_dropout
        self.activation_dropout = activation_dropout
        self.max_position_embeddings = max_position_embeddings
        self.type_vocab_size = type_vocab_size
        self.type_sequence_label_size = 2
        self.num_labels = num_labels
        self.num_choices = num_choices
        self.scope = scope

        # Used in the tests to check the size of the first attention layer
        self.num_attention_heads = n_head
        # Used in the tests to check the size of the first hidden state
        self.hidden_size = self.d_model
        # Used in the tests to check the number of output hidden states/attentions
        self.num_hidden_layers = sum(self.block_sizes) + (0 if base else self.num_decoder_layers)
        # FunnelModel adds two hidden layers: input embeddings and the sum of the upsampled encoder hidden state with
        # the last hidden state of the first block (which is the first hidden state of the decoder).
        if not base:
            self.expected_num_hidden_layers = self.num_hidden_layers + 2

    def prepare_config_and_inputs(self):
        input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size)

        input_mask = None
        if self.use_input_mask:
            input_mask = ids_tensor([self.batch_size, self.seq_length], vocab_size=2)

        token_type_ids = None
        if self.use_token_type_ids:
            token_type_ids = ids_tensor([self.batch_size, self.seq_length], self.type_vocab_size)

        sequence_labels = None
        token_labels = None
        choice_labels = None
        if self.use_labels:
            sequence_labels = ids_tensor([self.batch_size], self.type_sequence_label_size)
            token_labels = ids_tensor([self.batch_size, self.seq_length], self.num_labels)
            choice_labels = ids_tensor([self.batch_size], self.num_choices)
            fake_token_labels = ids_tensor([self.batch_size, self.seq_length], 1)

        config = FunnelConfig(
            vocab_size=self.vocab_size,
            block_sizes=self.block_sizes,
            num_decoder_layers=self.num_decoder_layers,
            d_model=self.d_model,
            n_head=self.n_head,
            d_head=self.d_head,
            d_inner=self.d_inner,
            hidden_act=self.hidden_act,
            hidden_dropout=self.hidden_dropout,
            attention_dropout=self.attention_dropout,
            activation_dropout=self.activation_dropout,
            max_position_embeddings=self.max_position_embeddings,
            type_vocab_size=self.type_vocab_size,
            return_dict=True,
        )

        return (
            config,
            input_ids,
            token_type_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
            fake_token_labels,
        )

    def create_and_check_model(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        model = FunnelModel(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
        result = model(input_ids, token_type_ids=token_type_ids)
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.d_model))

        model.config.truncate_seq = False
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.d_model))

        model.config.separate_cls = False
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, self.seq_length, self.d_model))

    def create_and_check_base_model(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        model = FunnelBaseModel(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids)
        result = model(input_ids, token_type_ids=token_type_ids)
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, 2, self.d_model))

        model.config.truncate_seq = False
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, 3, self.d_model))

        model.config.separate_cls = False
        result = model(input_ids)
        self.parent.assertEqual(result.last_hidden_state.shape, (self.batch_size, 2, self.d_model))

    def create_and_check_for_pretraining(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        config.num_labels = self.num_labels
        model = FunnelForPreTraining(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=fake_token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length))

    def create_and_check_for_masked_lm(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        model = FunnelForMaskedLM(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.vocab_size))

    def create_and_check_for_sequence_classification(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        config.num_labels = self.num_labels
        model = FunnelForSequenceClassification(config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=sequence_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_labels))

    def create_and_check_for_multiple_choice(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        config.num_choices = self.num_choices
        model = FunnelForMultipleChoice(config=config)
        model.to(torch_device)
        model.eval()
        multiple_choice_inputs_ids = input_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        multiple_choice_token_type_ids = token_type_ids.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        multiple_choice_input_mask = input_mask.unsqueeze(1).expand(-1, self.num_choices, -1).contiguous()
        result = model(
            multiple_choice_inputs_ids,
            attention_mask=multiple_choice_input_mask,
            token_type_ids=multiple_choice_token_type_ids,
            labels=choice_labels,
        )
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.num_choices))

    def create_and_check_for_token_classification(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        config.num_labels = self.num_labels
        model = FunnelForTokenClassification(config=config)
        model.to(torch_device)
        model.eval()
        result = model(input_ids, attention_mask=input_mask, token_type_ids=token_type_ids, labels=token_labels)
        self.parent.assertEqual(result.logits.shape, (self.batch_size, self.seq_length, self.num_labels))

    def create_and_check_for_question_answering(
        self,
        config,
        input_ids,
        token_type_ids,
        input_mask,
        sequence_labels,
        token_labels,
        choice_labels,
        fake_token_labels,
    ):
        model = FunnelForQuestionAnswering(config=config)
        model.to(torch_device)
        model.eval()
        result = model(
            input_ids,
            attention_mask=input_mask,
            token_type_ids=token_type_ids,
            start_positions=sequence_labels,
            end_positions=sequence_labels,
        )
        self.parent.assertEqual(result.start_logits.shape, (self.batch_size, self.seq_length))
        self.parent.assertEqual(result.end_logits.shape, (self.batch_size, self.seq_length))

    def prepare_config_and_inputs_for_common(self):
        config_and_inputs = self.prepare_config_and_inputs()
        (
            config,
            input_ids,
            token_type_ids,
            input_mask,
            sequence_labels,
            token_labels,
            choice_labels,
            fake_token_labels,
        ) = config_and_inputs
        inputs_dict = {"input_ids": input_ids, "token_type_ids": token_type_ids, "attention_mask": input_mask}
        return config, inputs_dict


@require_torch
class FunnelModelTest(ModelTesterMixin, unittest.TestCase):
    test_head_masking = False
    test_pruning = False
    all_model_classes = (
        (
            FunnelModel,
            FunnelForMaskedLM,
            FunnelForPreTraining,
            FunnelForQuestionAnswering,
            FunnelForTokenClassification,
        )
        if is_torch_available()
        else ()
    )

    def setUp(self):
        self.model_tester = FunnelModelTester(self)
        self.config_tester = ConfigTester(self, config_class=FunnelConfig)

    def test_config(self):
        self.config_tester.run_common_tests()

    def test_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_model(*config_and_inputs)

    def test_for_pretraining(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_pretraining(*config_and_inputs)

    def test_for_masked_lm(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_masked_lm(*config_and_inputs)

    def test_for_token_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_token_classification(*config_and_inputs)

    def test_for_question_answering(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_question_answering(*config_and_inputs)


@require_torch
class FunnelBaseModelTest(ModelTesterMixin, unittest.TestCase):
    test_head_masking = False
    test_pruning = False
    all_model_classes = (
        (FunnelBaseModel, FunnelForMultipleChoice, FunnelForSequenceClassification) if is_torch_available() else ()
    )

    def setUp(self):
        self.model_tester = FunnelModelTester(self, base=True)
        self.config_tester = ConfigTester(self, config_class=FunnelConfig)

    def test_config(self):
        self.config_tester.run_common_tests()

    def test_base_model(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_base_model(*config_and_inputs)

    def test_for_sequence_classification(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_sequence_classification(*config_and_inputs)

    def test_for_multiple_choice(self):
        config_and_inputs = self.model_tester.prepare_config_and_inputs()
        self.model_tester.create_and_check_for_multiple_choice(*config_and_inputs)


@require_torch
class FunnelModelIntegrationTest(unittest.TestCase):
    def test_inference_tiny_model(self):
        batch_size = 13
        sequence_length = 7
        input_ids = torch.arange(0, batch_size * sequence_length).long().reshape(batch_size, sequence_length)
        lengths = [0, 1, 2, 3, 4, 5, 6, 4, 1, 3, 5, 0, 1]
        token_type_ids = torch.tensor([[2] + [0] * a + [1] * (sequence_length - a - 1) for a in lengths])

        model = FunnelModel.from_pretrained("sgugger/funnel-random-tiny")
        output = model(input_ids, token_type_ids=token_type_ids)[0].abs()

        expected_output_sum = torch.tensor(2344.9023)
        expected_output_mean = torch.tensor(0.8053)
        self.assertTrue(torch.allclose(output.sum(), expected_output_sum, atol=1e-4))
        self.assertTrue(torch.allclose(output.mean(), expected_output_mean, atol=1e-4))

        attention_mask = torch.tensor([[1] * 7, [1] * 4 + [0] * 3] * 6 + [[0, 1, 1, 0, 0, 1, 1]])
        output = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)[0].abs()

        expected_output_sum = torch.tensor(2363.2178)
        expected_output_mean = torch.tensor(0.8115)
        self.assertTrue(torch.allclose(output.sum(), expected_output_sum, atol=1e-4))
        self.assertTrue(torch.allclose(output.mean(), expected_output_mean, atol=1e-4))

    @slow
    def test_inference_model(self):
        tokenizer = FunnelTokenizer.from_pretrained("huggingface/funnel-small")
        model = FunnelModel.from_pretrained("huggingface/funnel-small")
        inputs = tokenizer("Hello! I am the Funnel Transformer model.", return_tensors="pt")
        output = model(**inputs)[0]

        expected_output_sum = torch.tensor(235.7827)
        expected_output_mean = torch.tensor(0.0256)
        self.assertTrue(torch.allclose(output.sum(), expected_output_sum, atol=1e-4))
        self.assertTrue(torch.allclose(output.mean(), expected_output_mean, atol=1e-4))
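Note on the shapes asserted in `create_and_check_base_model`: for an input of length 7 the test expects an encoder output of length 2, then 3 once `truncate_seq` is disabled, then 2 again once `separate_cls` is disabled as well. The sketch below reproduces those three numbers with plain arithmetic. It rests on assumptions about model code that is not shown in this file: a stride-2, ceil-mode pooling step before every block after the first, and, when `separate_cls` is on, the first position being repeated in front of the sequence before pooling (with `truncate_seq` dropping the last position). The helper names `pooled_length` and `encoder_output_length` are hypothetical.

```python
import math


def pooled_length(seq_len, separate_cls=True, truncate_seq=True):
    """Length after one (assumed) stride-2, ceil-mode pooling step."""
    if separate_cls:
        # Assumed rule: the first position is repeated in front so it gets
        # pooled on its own; truncate_seq drops the last position first.
        seq_len = 1 + (seq_len - 1 if truncate_seq else seq_len)
    return math.ceil(seq_len / 2)


def encoder_output_length(seq_len, num_blocks, separate_cls=True, truncate_seq=True):
    """Apply one pooling step before every block except the first."""
    for _ in range(num_blocks - 1):
        seq_len = pooled_length(seq_len, separate_cls, truncate_seq)
    return seq_len


# seq_length=7 and block_sizes=[1, 1, 2] (three blocks), as in FunnelModelTester.
print(encoder_output_length(7, 3, separate_cls=True, truncate_seq=True))    # 2
print(encoder_output_length(7, 3, separate_cls=True, truncate_seq=False))   # 3
print(encoder_output_length(7, 3, separate_cls=False, truncate_seq=False))  # 2
```

`FunnelModel` upsamples back to the input length in its decoder, which would explain why `create_and_check_model` expects `self.seq_length` in all three configurations.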
78
tests/test_tokenization_funnel.py
Normal file
@@ -0,0 +1,78 @@
# coding=utf-8
# Copyright 2020 HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


import os
import unittest

from transformers.tokenization_funnel import VOCAB_FILES_NAMES, FunnelTokenizer, FunnelTokenizerFast

from .test_tokenization_common import TokenizerTesterMixin


class FunnelTokenizationTest(TokenizerTesterMixin, unittest.TestCase):

    tokenizer_class = FunnelTokenizer
    test_rust_tokenizer = True

    def setUp(self):
        super().setUp()

        vocab_tokens = [
            "<unk>",
            "<cls>",
            "<sep>",
            "want",
            "##want",
            "##ed",
            "wa",
            "un",
            "runn",
            "##ing",
            ",",
            "low",
            "lowest",
        ]
        self.vocab_file = os.path.join(self.tmpdirname, VOCAB_FILES_NAMES["vocab_file"])
        with open(self.vocab_file, "w", encoding="utf-8") as vocab_writer:
            vocab_writer.write("".join([x + "\n" for x in vocab_tokens]))

    def get_tokenizer(self, **kwargs):
        return FunnelTokenizer.from_pretrained(self.tmpdirname, **kwargs)

    def get_rust_tokenizer(self, **kwargs):
        return FunnelTokenizerFast.from_pretrained(self.tmpdirname, **kwargs)

    def get_input_output_texts(self, tokenizer):
        input_text = "UNwant\u00E9d,running"
        output_text = "unwanted, running"
        return input_text, output_text

    def test_full_tokenizer(self):
        tokenizer = self.tokenizer_class(self.vocab_file)

        tokens = tokenizer.tokenize("UNwant\u00E9d,running")
        self.assertListEqual(tokens, ["un", "##want", "##ed", ",", "runn", "##ing"])
        self.assertListEqual(tokenizer.convert_tokens_to_ids(tokens), [7, 4, 5, 10, 8, 9])

    def test_token_type_ids(self):
        tokenizers = self.get_tokenizers(do_lower_case=False)
        for tokenizer in tokenizers:
            inputs = tokenizer("UNwant\u00E9d,running")
            sentence_len = len(inputs["input_ids"]) - 1
            self.assertListEqual(inputs["token_type_ids"], [2] + [0] * sentence_len)

            inputs = tokenizer("UNwant\u00E9d,running", "UNwant\u00E9d,running")
            self.assertListEqual(inputs["token_type_ids"], [2] + [0] * sentence_len + [1] * sentence_len)
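Note on `test_token_type_ids`: the assertions imply a layout in which the leading `<cls>` token gets its own type id (2), the first segment gets 0 and the second segment gets 1. The sketch below only restates that layout in plain Python; it does not call the tokenizer. The helper name `expected_token_type_ids` is hypothetical, and each segment length is assumed to count that segment's trailing `<sep>`, which is how `sentence_len` is derived in the test.

```python
def expected_token_type_ids(len_first, len_second=0, cls_type_id=2):
    """Type ids for one sequence or a pair, following the layout the test asserts.

    len_first / len_second are segment lengths, each including its trailing <sep>.
    """
    return [cls_type_id] + [0] * len_first + [1] * len_second


# One segment of 3 positions (2 tokens + <sep>), preceded by <cls>.
print(expected_token_type_ids(3))     # [2, 0, 0, 0]
# The same segment passed twice as a pair.
print(expected_token_type_ids(3, 3))  # [2, 0, 0, 0, 1, 1, 1]
```

The integration test in `tests/test_modeling_funnel.py` builds its `token_type_ids` tensor with the same pattern: `[2] + [0] * a + [1] * (sequence_length - a - 1)`.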
@@ -141,18 +141,20 @@ def get_model_doc_files():
 # for the all_model_classes variable.
 def find_tested_models(test_file):
     """ Parse the content of test_file to detect what's in all_model_classes"""
+    # This is a bit hacky but I didn't find a way to import the test_file as a module and read inside the class
     with open(os.path.join(PATH_TO_TESTS, test_file)) as f:
         content = f.read()
-    all_models = re.search(r"all_model_classes\s+=\s+\(\s*\(([^\)]*)\)", content)
+    all_models = re.findall(r"all_model_classes\s+=\s+\(\s*\(([^\)]*)\)", content)
     # Check with one less parenthesis
-    if all_models is None:
-        all_models = re.search(r"all_model_classes\s+=\s+\(([^\)]*)\)", content)
-    if all_models is not None:
+    if len(all_models) == 0:
+        all_models = re.findall(r"all_model_classes\s+=\s+\(([^\)]*)\)", content)
+    if len(all_models) > 0:
         model_tested = []
-        for line in all_models.groups()[0].split(","):
-            name = line.strip()
-            if len(name) > 0:
-                model_tested.append(name)
+        for entry in all_models:
+            for line in entry.split(","):
+                name = line.strip()
+                if len(name) > 0:
+                    model_tested.append(name)
         return model_tested
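Note on the switch from `re.search` to `re.findall`: a test file can now define `all_model_classes` in more than one test class (as `tests/test_modeling_funnel.py` does), and `re.search` only returns the first match. The sketch below runs the same pattern over a shortened, made-up stand-in for such a file; the `content` string and the `names` helper are illustrative and not part of the script.

```python
import re

PATTERN = r"all_model_classes\s+=\s+\(\s*\(([^\)]*)\)"

# Shortened stand-in for a test file with two test classes (illustrative only).
content = '''
class FunnelModelTest:
    all_model_classes = (
        (
            FunnelModel,
            FunnelForMaskedLM,
        )
        if is_torch_available()
        else ()
    )

class FunnelBaseModelTest:
    all_model_classes = (
        (FunnelBaseModel, FunnelForMultipleChoice) if is_torch_available() else ()
    )
'''


def names(chunks):
    """Split each captured tuple body on commas and keep the non-empty names."""
    return [part.strip() for chunk in chunks for part in chunk.split(",") if part.strip()]


# Old behaviour: only the first all_model_classes definition is seen.
first_only = names([re.search(PATTERN, content).groups()[0]])
# New behaviour: every definition in the file contributes its models.
all_found = names(re.findall(PATTERN, content))

print(first_only)  # ['FunnelModel', 'FunnelForMaskedLM']
print(all_found)   # ['FunnelModel', 'FunnelForMaskedLM', 'FunnelBaseModel', 'FunnelForMultipleChoice']
```

With a single capture group in the pattern, `re.findall` returns a list of the captured strings, which is why the new loop iterates over `all_models` directly instead of calling `.groups()`.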