Fix all sphynx warnings (#5068)
@@ -17,7 +17,6 @@ The ``.optimization`` module provides:
|
|||||||
~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AdamWeightDecay
|
.. autoclass:: transformers.AdamWeightDecay
|
||||||
:members:
|
|
||||||
|
|
||||||
.. autofunction:: transformers.create_optimizer
|
.. autofunction:: transformers.create_optimizer
|
||||||
|
|
||||||
|
|||||||
@@ -7,7 +7,7 @@ Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction an
|
|||||||
|
|
||||||
There are two categories of pipeline abstractions to be aware of:
|
There are two categories of pipeline abstractions to be aware of:
|
||||||
|
|
||||||
- The :class:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
|
- The :func:`~transformers.pipeline` which is the most powerful object encapsulating all other pipelines
|
||||||
- The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
|
- The other task-specific pipelines, such as :class:`~transformers.TokenClassificationPipeline`
|
||||||
or :class:`~transformers.QuestionAnsweringPipeline`
|
or :class:`~transformers.QuestionAnsweringPipeline`
|
||||||
|
|
||||||
@@ -17,8 +17,7 @@ The pipeline abstraction
|
|||||||
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
|
The `pipeline` abstraction is a wrapper around all the other available pipelines. It is instantiated as any
|
||||||
other pipeline but requires an additional argument which is the `task`.
|
other pipeline but requires an additional argument which is the `task`.
|
||||||
|
|
||||||
.. autoclass:: transformers.pipeline
|
.. autofunction:: transformers.pipeline
|
||||||
:members:
|
|
||||||
|
|
||||||
|
|
||||||
The task specific pipelines
|
The task specific pipelines
|
||||||
|
|||||||
@@ -30,35 +30,35 @@ Instantiating one of ``AutoModel``, ``AutoConfig`` and ``AutoTokenizer`` will di
|
|||||||
|
|
||||||
|
|
||||||
``AutoModelForPreTraining``
|
``AutoModelForPreTraining``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AutoModelForPreTraining
|
.. autoclass:: transformers.AutoModelForPreTraining
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoModelWithLMHead``
|
``AutoModelWithLMHead``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AutoModelWithLMHead
|
.. autoclass:: transformers.AutoModelWithLMHead
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoModelForSequenceClassification``
|
``AutoModelForSequenceClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AutoModelForSequenceClassification
|
.. autoclass:: transformers.AutoModelForSequenceClassification
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoModelForQuestionAnswering``
|
``AutoModelForQuestionAnswering``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AutoModelForQuestionAnswering
|
.. autoclass:: transformers.AutoModelForQuestionAnswering
|
||||||
:members:
|
:members:
|
||||||
|
|
||||||
|
|
||||||
``AutoModelForTokenClassification``
|
``AutoModelForTokenClassification``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.AutoModelForTokenClassification
|
.. autoclass:: transformers.AutoModelForTokenClassification
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -1,5 +1,5 @@
|
|||||||
Encoder Decoder Models
|
Encoder Decoder Models
|
||||||
-----------
|
------------------------
|
||||||
|
|
||||||
This class can wrap an encoder model, such as ``BertModel``, and a decoder model with a language modeling head, such as ``BertForMaskedLM``, into an encoder-decoder model.
|
This class can wrap an encoder model, such as ``BertModel``, and a decoder model with a language modeling head, such as ``BertForMaskedLM``, into an encoder-decoder model.
|
||||||
|
|
||||||
@@ -10,7 +10,7 @@ An application of this architecture could be *summarization* using two pretraine
|
|||||||
|
|
||||||
|
|
||||||
``EncoderDecoderConfig``
|
``EncoderDecoderConfig``
|
||||||
~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
.. autoclass:: transformers.EncoderDecoderConfig
|
.. autoclass:: transformers.EncoderDecoderConfig
|
||||||
:members:
|
:members:
|
||||||
|
|||||||
@@ -4,7 +4,7 @@ Reformer
|
|||||||
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
|
file a `Github Issue <https://github.com/huggingface/transformers/issues/new?assignees=&labels=&template=bug-report.md&title>`_
|
||||||
|
|
||||||
Overview
|
Overview
|
||||||
~~~~~
|
~~~~~~~~~~
|
||||||
The Reformer model was presented in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451.pdf>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
The Reformer model was presented in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451.pdf>`_ by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
||||||
Here the abstract:
|
Here the abstract:
|
||||||
|
|
||||||
@@ -13,7 +13,7 @@ Here the abstract:
|
|||||||
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_ .
|
The Authors' code can be found `here <https://github.com/google/trax/tree/master/trax/models/reformer>`_ .
|
||||||
|
|
||||||
Axial Positional Encodings
|
Axial Positional Encodings
|
||||||
~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that are treating very long input sequences, the conventional position id encodings store an embeddings vector of size :math:`d` being the ``config.hidden_size`` for every position :math:`i, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
|
Axial Positional Encodings were first implemented in Google's `trax library <https://github.com/google/trax/blob/4d99ad4965bab1deba227539758d59f0df0fef48/trax/layers/research/position_encodings.py#L29>`_ and developed by the authors of this model's paper. In models that are treating very long input sequences, the conventional position id encodings store an embeddings vector of size :math:`d` being the ``config.hidden_size`` for every position :math:`i, \ldots, n_s`, with :math:`n_s` being ``config.max_embedding_size``. *E.g.*, having a sequence length of :math:`n_s = 2^{19} \approx 0.5M` and a ``config.hidden_size`` of :math:`d = 2^{10} \approx 1000` would result in a position encoding matrix:
|
||||||
|
|
||||||
.. math::
|
.. math::
|
||||||
|
|||||||
@@ -22,10 +22,12 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
|
|||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-multilingual-uncased`` | | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
|
| | | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-multilingual-cased`` | | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
|
| | | | Trained on cased text in the top 104 languages with the largest Wikipedias |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
| | | (see `details <https://github.com/google-research/bert/blob/master/multilingual.md>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-chinese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
@@ -33,64 +35,79 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
|
|||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased German text by Deepset.ai |
|
| | | | Trained on cased German text by Deepset.ai |
|
||||||
|
| | | |
|
||||||
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
|
| | | (see `details on deepset.ai website <https://deepset.ai/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on lower-cased English text using Whole-Word-Masking |
|
| | | | Trained on lower-cased English text using Whole-Word-Masking |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-cased-whole-word-masking`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | Trained on cased English text using Whole-Word-Masking |
|
| | | | Trained on cased English text using Whole-Word-Masking |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
| | | (see `details <https://github.com/google-research/bert/#bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters. |
|
||||||
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
|
| | | |
|
||||||
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
|
| | | (see details of fine-tuning in the `example section <https://github.com/huggingface/transformers/tree/master/examples>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | | 24-layer, 1024-hidden, 16-heads, 340M parameters |
|
||||||
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
| | | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD |
|
||||||
|
| | | |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-cased-finetuned-mrpc`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
| | | | The ``bert-base-cased`` model fine-tuned on MRPC |
|
||||||
|
| | | |
|
||||||
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
| | | (see `details of fine-tuning in the example section <https://huggingface.co/transformers/examples.html>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-dbmdz-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased German text by DBMDZ |
|
| | | | Trained on cased German text by DBMDZ |
|
||||||
|
| | | |
|
||||||
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``bert-base-german-dbmdz-uncased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on uncased German text by DBMDZ |
|
| | | | Trained on uncased German text by DBMDZ |
|
||||||
|
| | | |
|
||||||
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
| | | (see `details on dbmdz repository <https://github.com/dbmdz/german-bert>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``cl-tohoku/bert-base-japanese`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece. |
|
| | | | Trained on Japanese text. Text is tokenized with MeCab and WordPiece. |
|
||||||
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
|
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``cl-tohoku/bert-base-japanese-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized with MeCab and WordPiece. |
|
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized with MeCab and WordPiece. |
|
||||||
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
|
| | | | `MeCab <https://taku910.github.io/mecab/>`__ is required for tokenization. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``cl-tohoku/bert-base-japanese-char`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on Japanese text. Text is tokenized into characters. |
|
| | | | Trained on Japanese text. Text is tokenized into characters. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``cl-tohoku/bert-base-japanese-char-whole-word-masking`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
|
| | | | Trained on Japanese text using Whole-Word-Masking. Text is tokenized into characters. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
| | | (see `details on cl-tohoku repository <https://github.com/cl-tohoku/bert-japanese>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``TurkuNLP/bert-base-finnish-cased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased Finnish text. |
|
| | | | Trained on cased Finnish text. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
|
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``TurkuNLP/bert-base-finnish-uncased-v1`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on uncased Finnish text. |
|
| | | | Trained on uncased Finnish text. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
|
| | | (see `details on turkunlp.org <http://turkunlp.org/FinBERT/>`__). |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| | ``wietsedv/bert-base-dutch-cased`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
| | | | Trained on cased Dutch text. |
|
| | | | Trained on cased Dutch text. |
|
||||||
|
| | | |
|
||||||
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
|
| | | (see `details on wietsedv repository <https://github.com/wietsedv/bertje/>`__). |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
| GPT | ``openai-gpt`` | | 12-layer, 768-hidden, 12-heads, 110M parameters. |
|
||||||
@@ -149,54 +166,67 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
|
|||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
|
| RoBERTa | ``roberta-base`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
|
||||||
| | | | RoBERTa using the BERT-base architecture |
|
| | | | RoBERTa using the BERT-base architecture |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
| | ``roberta-large`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
||||||
| | | | RoBERTa using the BERT-large architecture |
|
| | | | RoBERTa using the BERT-large architecture |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
| | ``roberta-large-mnli`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
||||||
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
|
| | | | ``roberta-large`` fine-tuned on `MNLI <http://www.nyu.edu/projects/bowman/multinli/>`__. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/roberta>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
|
| | ``distilroberta-base`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
|
||||||
| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
|
| | | | The DistilRoBERTa model distilled from the RoBERTa model `roberta-base` checkpoint. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
|
| | ``roberta-base-openai-detector`` | | 12-layer, 768-hidden, 12-heads, 125M parameters |
|
||||||
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
|
| | | | ``roberta-base`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
|
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
| | ``roberta-large-openai-detector`` | | 24-layer, 1024-hidden, 16-heads, 355M parameters |
|
||||||
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
|
| | | | ``roberta-large`` fine-tuned by OpenAI on the outputs of the 1.5B-parameter GPT-2 model. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
|
| | | (see `details <https://github.com/openai/gpt-2-output-dataset/tree/master/detector>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
| DistilBERT | ``distilbert-base-uncased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
||||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
|
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
| | ``distilbert-base-uncased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
||||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
|
| | | | The DistilBERT model distilled from the BERT model `bert-base-uncased` checkpoint, with an additional linear layer. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
|
| | ``distilbert-base-cased`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
|
||||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint |
|
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
|
| | ``distilbert-base-cased-distilled-squad`` | | 6-layer, 768-hidden, 12-heads, 65M parameters |
|
||||||
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint, with an additional question answering layer. |
|
| | | | The DistilBERT model distilled from the BERT model `bert-base-cased` checkpoint, with an additional question answering layer. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
|
| | ``distilgpt2`` | | 6-layer, 768-hidden, 12-heads, 82M parameters |
|
||||||
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
|
| | | | The DistilGPT2 model distilled from the GPT2 model `gpt2` checkpoint. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
| | ``distilbert-base-german-cased`` | | 6-layer, 768-hidden, 12-heads, 66M parameters |
|
||||||
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model `bert-base-german-dbmdz-cased` checkpoint. |
|
| | | | The German DistilBERT model distilled from the German DBMDZ BERT model `bert-base-german-dbmdz-cased` checkpoint. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
|
| | ``distilbert-base-multilingual-cased`` | | 6-layer, 768-hidden, 12-heads, 134M parameters |
|
||||||
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
|
| | | | The multilingual DistilBERT model distilled from the Multilingual BERT model `bert-base-multilingual-cased` checkpoint. |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
| | | (see `details <https://github.com/huggingface/transformers/tree/master/examples/distillation>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
|
| CTRL | ``ctrl`` | | 48-layer, 1280-hidden, 16-heads, 1.6B parameters |
|
||||||
@@ -204,38 +234,47 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
|
|||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
|
| CamemBERT | ``camembert-base`` | | 12-layer, 768-hidden, 12-heads, 110M parameters |
|
||||||
| | | | CamemBERT using the BERT-base architecture |
|
| | | | CamemBERT using the BERT-base architecture |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/camembert>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
| ALBERT | ``albert-base-v1`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
||||||
| | | | ALBERT base model |
|
| | | | ALBERT base model |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
| | ``albert-large-v1`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
||||||
| | | | ALBERT large model |
|
| | | | ALBERT large model |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
| | ``albert-xlarge-v1`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
||||||
| | | | ALBERT xlarge model |
|
| | | | ALBERT xlarge model |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xxlarge-v1`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
| | ``albert-xxlarge-v1`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
||||||
| | | | ALBERT xxlarge model |
|
| | | | ALBERT xxlarge model |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
| | ``albert-base-v2`` | | 12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters |
|
||||||
| | | | ALBERT base model with no dropout, additional training data and longer training |
|
| | | | ALBERT base model with no dropout, additional training data and longer training |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
| | ``albert-large-v2`` | | 24 repeating layers, 128 embedding, 1024-hidden, 16-heads, 17M parameters |
|
||||||
| | | | ALBERT large model with no dropout, additional training data and longer training |
|
| | | | ALBERT large model with no dropout, additional training data and longer training |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
| | ``albert-xlarge-v2`` | | 24 repeating layers, 128 embedding, 2048-hidden, 16-heads, 58M parameters |
|
||||||
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
|
| | | | ALBERT xlarge model with no dropout, additional training data and longer training |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``albert-xxlarge-v2`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
| | ``albert-xxlarge-v2`` | | 12 repeating layers, 128 embedding, 4096-hidden, 64-heads, 223M parameters |
|
||||||
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
|
| | | | ALBERT xxlarge model with no dropout, additional training data and longer training |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
| | | (see `details <https://github.com/google-research/ALBERT>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
|
| T5 | ``t5-small`` | | ~60M parameters with 6-layers, 512-hidden-state, 2048 feed-forward hidden-state, 8-heads, |
|
||||||
@@ -261,21 +300,26 @@ For a list that includes community-uploaded models, refer to `https://huggingfac
|
|||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
|
| FlauBERT | ``flaubert/flaubert_small_cased`` | | 6-layer, 512-hidden, 8-heads, 54M parameters |
|
||||||
| | | | FlauBERT small architecture |
|
| | | | FlauBERT small architecture |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
|
| | ``flaubert/flaubert_base_uncased`` | | 12-layer, 768-hidden, 12-heads, 137M parameters |
|
||||||
| | | | FlauBERT base architecture with uncased vocabulary |
|
| | | | FlauBERT base architecture with uncased vocabulary |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
|
| | ``flaubert/flaubert_base_cased`` | | 12-layer, 768-hidden, 12-heads, 138M parameters |
|
||||||
| | | | FlauBERT base architecture with cased vocabulary |
|
| | | | FlauBERT base architecture with cased vocabulary |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
|
| | ``flaubert/flaubert_large_cased`` | | 24-layer, 1024-hidden, 16-heads, 373M parameters |
|
||||||
| | | | FlauBERT large architecture |
|
| | | | FlauBERT large architecture |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
| | | (see `details <https://github.com/getalp/Flaubert>`__) |
|
||||||
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
+-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
|
| Bart | ``facebook/bart-large`` | | 24-layer, 1024-hidden, 16-heads, 406M parameters |
|
||||||
|
| | | |
|
||||||
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__) |
|
| | | (see `details <https://github.com/pytorch/fairseq/tree/master/examples/bart>`__) |
|
||||||
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------+
|
||||||
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
|
| | ``facebook/bart-base`` | | 12-layer, 768-hidden, 16-heads, 139M parameters |
|
||||||
|
|||||||
@@ -693,6 +693,7 @@ following array should be the output:
|
|||||||
::
|
::
|
||||||
|
|
||||||
[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
|
[('[CLS]', 'O'), ('Hu', 'I-ORG'), ('##gging', 'I-ORG'), ('Face', 'I-ORG'), ('Inc', 'I-ORG'), ('.', 'O'), ('is', 'O'), ('a', 'O'), ('company', 'O'), ('based', 'O'), ('in', 'O'), ('New', 'I-LOC'), ('York', 'I-LOC'), ('City', 'I-LOC'), ('.', 'O'), ('Its', 'O'), ('headquarters', 'O'), ('are', 'O'), ('in', 'O'), ('D', 'I-LOC'), ('##UM', 'I-LOC'), ('##BO', 'I-LOC'), (',', 'O'), ('therefore', 'O'), ('very', 'O'), ('##c', 'O'), ('##lose', 'O'), ('to', 'O'), ('the', 'O'), ('Manhattan', 'I-LOC'), ('Bridge', 'I-LOC'), ('.', 'O'), ('[SEP]', 'O')]
|
||||||
|
|
||||||
Summarization
|
Summarization
|
||||||
----------------------------------------------------
|
----------------------------------------------------
|
||||||
|
|
||||||
@@ -770,6 +771,7 @@ Here Google`s T5 model is used that was only pre-trained on a multi-task mixed d
|
|||||||
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
|
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="tf", max_length=512)
|
||||||
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
|
outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
|
||||||
print(outputs)
|
print(outputs)
|
||||||
|
|
||||||
Translation
|
Translation
|
||||||
----------------------------------------------------
|
----------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
@@ -134,6 +134,7 @@ class AutoConfig:
|
|||||||
The configuration class to instantiate is selected
|
The configuration class to instantiate is selected
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: :class:`~transformers.T5Config` (T5 model)
|
- `t5`: :class:`~transformers.T5Config` (T5 model)
|
||||||
- `distilbert`: :class:`~transformers.DistilBertConfig` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertConfig` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertConfig` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertConfig` (ALBERT model)
|
||||||
|
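The selection rule described in this docstring can be sketched as follows. This is a minimal illustration, not the library's implementation: ``CONFIG_NAMES`` and ``select_config_name`` are hypothetical names, only the three keys visible in this hunk are listed, and class names are returned as strings so the snippet runs without ``transformers`` installed.

```python
# Hypothetical lookup table: only the three keys visible in the hunk above.
# Order matters for the fallback, because keys are tested as substrings.
CONFIG_NAMES = {
    "t5": "T5Config",
    "distilbert": "DistilBertConfig",
    "albert": "AlbertConfig",
}


def select_config_name(config_dict, pretrained_model_name_or_path):
    """Return the name of the config class for a checkpoint (illustrative only)."""
    # Preferred path: an explicit `model_type` entry in the loaded config.
    model_type = config_dict.get("model_type")
    if model_type in CONFIG_NAMES:
        return CONFIG_NAMES[model_type]
    # Fallback: look for a known key inside the checkpoint name, in table order.
    for pattern, name in CONFIG_NAMES.items():
        if pattern in pretrained_model_name_or_path:
            return name
    raise ValueError(
        "Unrecognized model: {}".format(pretrained_model_name_or_path)
    )
```

With this sketch, a config that carries ``model_type`` is resolved directly, while a bare name such as ``distilbert-base-uncased`` is resolved through the substring fallback.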
|||||||
@@ -53,7 +53,7 @@ class T5Config(PretrainedConfig):
|
|||||||
probabilities.
|
probabilities.
|
||||||
n_positions: The maximum sequence length that this model might
|
n_positions: The maximum sequence length that this model might
|
||||||
ever be used with. Typically set this to something large just in case
|
ever be used with. Typically set this to something large just in case
|
||||||
(e.g., 512 or 1024 or 2048). `n_positions` can also be accessed via the property `max_position_embeddings'.
|
(e.g., 512 or 1024 or 2048). `n_positions` can also be accessed via the property `max_position_embeddings`.
|
||||||
type_vocab_size: The vocabulary size of the `token_type_ids` passed into
|
type_vocab_size: The vocabulary size of the `token_type_ids` passed into
|
||||||
`T5Model`.
|
`T5Model`.
|
||||||
initializer_factor: A factor for initializing all weight matrices (should be kept to 1.0, used for initialization testing).
|
initializer_factor: A factor for initializing all weight matrices (should be kept to 1.0, used for initialization testing).
|
||||||
|
|||||||
@@ -84,6 +84,7 @@ class XLNetConfig(PretrainedConfig):
|
|||||||
Argument used when doing sequence summary. Used in the sequence classification and multiple choice heads of
|
Argument used when doing sequence summary. Used in the sequence classification and multiple choice heads of
|
||||||
:class:`~transformers.XLNetForSequenceClassification` and :class:`~transformers.XLNetForMultipleChoice`.
|
:class:`~transformers.XLNetForSequenceClassification` and :class:`~transformers.XLNetForMultipleChoice`.
|
||||||
Is one of the following options:
|
Is one of the following options:
|
||||||
|
|
||||||
- 'last' => take the last token hidden state (like XLNet)
|
- 'last' => take the last token hidden state (like XLNet)
|
||||||
- 'first' => take the first token hidden state (like Bert)
|
- 'first' => take the first token hidden state (like Bert)
|
||||||
- 'mean' => take the mean of all tokens hidden states
|
- 'mean' => take the mean of all tokens hidden states
|
||||||
|
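The three summary options visible in this hunk can be illustrated with a small pooling function. This is a sketch under stated assumptions: ``summarize_sequence`` is a hypothetical helper, hidden states are plain Python lists rather than tensors, and any further options the full docstring may list are not covered.

```python
def summarize_sequence(hidden_states, summary_type="last"):
    """Pool per-token vectors into one vector (illustrative only).

    `hidden_states` is a list of equally sized lists, one per token.
    """
    if summary_type == "last":
        # Take the hidden state of the last token.
        return list(hidden_states[-1])
    if summary_type == "first":
        # Take the hidden state of the first token.
        return list(hidden_states[0])
    if summary_type == "mean":
        # Average each dimension over all tokens.
        count = len(hidden_states)
        return [sum(column) / count for column in zip(*hidden_states)]
    raise ValueError("Unsupported summary_type: {}".format(summary_type))
```

For a three-token sequence, ``"first"`` and ``"last"`` return one of the token vectors unchanged, while ``"mean"`` returns the per-dimension average.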
|||||||
@@ -83,7 +83,8 @@ class DataProcessor:
|
|||||||
"""Base class for data converters for sequence classification data sets."""
|
"""Base class for data converters for sequence classification data sets."""
|
||||||
|
|
||||||
def get_example_from_tensor_dict(self, tensor_dict):
|
def get_example_from_tensor_dict(self, tensor_dict):
|
||||||
"""Gets an example from a dict with tensorflow tensors
|
"""Gets an example from a dict with tensorflow tensors.
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
tensor_dict: Keys and values should match the corresponding Glue
|
tensor_dict: Keys and values should match the corresponding Glue
|
||||||
tensorflow_dataset examples.
|
tensorflow_dataset examples.
|
||||||
@@ -91,15 +92,15 @@ class DataProcessor:
|
|||||||
raise NotImplementedError()
|
raise NotImplementedError()
|
||||||
|
|
||||||
def get_train_examples(self, data_dir):
|
def get_train_examples(self, data_dir):
|
||||||
"""Gets a collection of `InputExample`s for the train set."""
|
"""Gets a collection of :class:`InputExample` for the train set."""
|
||||||
raise NotImplementedError()
|
raise NotImplementedError()
|
||||||
|
|
||||||
def get_dev_examples(self, data_dir):
|
def get_dev_examples(self, data_dir):
|
||||||
"""Gets a collection of `InputExample`s for the dev set."""
|
"""Gets a collection of :class:`InputExample` for the dev set."""
|
||||||
raise NotImplementedError()
|
raise NotImplementedError()
|
||||||
|
|
||||||
def get_test_examples(self, data_dir):
|
def get_test_examples(self, data_dir):
|
||||||
"""Gets a collection of `InputExample`s for the test set."""
|
"""Gets a collection of :class:`InputExample` for the test set."""
|
||||||
raise NotImplementedError()
|
raise NotImplementedError()
|
||||||
|
|
||||||
def get_labels(self):
|
def get_labels(self):
|
||||||
|
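A subclass fills in the methods that the base class leaves as ``NotImplementedError``. The sketch below is hypothetical throughout: ``InputExample`` and ``DataProcessor`` are local stand-ins so the snippet runs without ``transformers``, and ``ToySentimentProcessor`` with its ``train.tsv`` layout (``label<TAB>text`` per line) is invented for illustration.

```python
import os
from collections import namedtuple

# Stand-in for the library's InputExample (assumed fields, for illustration).
InputExample = namedtuple("InputExample", ["guid", "text_a", "text_b", "label"])


class DataProcessor:
    """Stand-in mirroring two methods of the base class shown in the diff."""

    def get_train_examples(self, data_dir):
        raise NotImplementedError()

    def get_labels(self):
        raise NotImplementedError()


class ToySentimentProcessor(DataProcessor):
    """Hypothetical processor reading `label<TAB>text` lines from train.tsv."""

    def get_train_examples(self, data_dir):
        examples = []
        path = os.path.join(data_dir, "train.tsv")
        with open(path, encoding="utf-8") as handle:
            for index, line in enumerate(handle):
                # Split only on the first tab so the text may contain tabs.
                label, text = line.rstrip("\n").split("\t", 1)
                guid = "train-{}".format(index)
                examples.append(InputExample(guid, text, None, label))
        return examples

    def get_labels(self):
        return ["negative", "positive"]
```

Calling ``ToySentimentProcessor().get_train_examples(data_dir)`` on a directory containing such a file returns one ``InputExample`` per line.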
|||||||
@@ -393,6 +393,7 @@ class AutoModel:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: :class:`~transformers.T5Model` (T5 model)
|
- `t5`: :class:`~transformers.T5Model` (T5 model)
|
||||||
- `distilbert`: :class:`~transformers.DistilBertModel` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertModel` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertModel` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertModel` (ALBERT model)
|
||||||
@@ -546,6 +547,7 @@ class AutoModelForPreTraining:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: :class:`~transformers.T5ModelWithLMHead` (T5 model)
|
- `t5`: :class:`~transformers.T5ModelWithLMHead` (T5 model)
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
||||||
@@ -698,6 +700,7 @@ class AutoModelWithLMHead:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
|
- `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
||||||
@@ -845,6 +848,7 @@ class AutoModelForCausalLM:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `bert`: :class:`~transformers.BertLMHeadModel` (Bert model)
|
- `bert`: :class:`~transformers.BertLMHeadModel` (Bert model)
|
||||||
- `openai-gpt`: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
|
- `openai-gpt`: :class:`~transformers.OpenAIGPTLMHeadModel` (OpenAI GPT model)
|
||||||
- `gpt2`: :class:`~transformers.GPT2LMHeadModel` (OpenAI GPT-2 model)
|
- `gpt2`: :class:`~transformers.GPT2LMHeadModel` (OpenAI GPT-2 model)
|
||||||
@@ -982,6 +986,7 @@ class AutoModelForMaskedLM:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForMaskedLM` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertForMaskedLM` (ALBERT model)
|
||||||
- `camembert`: :class:`~transformers.CamembertForMaskedLM` (CamemBERT model)
|
- `camembert`: :class:`~transformers.CamembertForMaskedLM` (CamemBERT model)
|
||||||
@@ -1118,6 +1123,7 @@ class AutoModelForSeq2SeqLM:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
|
- `t5`: :class:`~transformers.T5ForConditionalGeneration` (T5 model)
|
||||||
- `bart`: :class:`~transformers.BartForConditionalGeneration` (Bart model)
|
- `bart`: :class:`~transformers.BartForConditionalGeneration` (Bart model)
|
||||||
- `marian`: :class:`~transformers.MarianMTModel` (Marian model)
|
- `marian`: :class:`~transformers.MarianMTModel` (Marian model)
|
||||||
@@ -1256,6 +1262,7 @@ class AutoModelForSequenceClassification:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForSequenceClassification` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForSequenceClassification` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertForSequenceClassification` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertForSequenceClassification` (ALBERT model)
|
||||||
- `camembert`: :class:`~transformers.CamembertForSequenceClassification` (CamemBERT model)
|
- `camembert`: :class:`~transformers.CamembertForSequenceClassification` (CamemBERT model)
|
||||||
@@ -1402,6 +1409,7 @@ class AutoModelForQuestionAnswering:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForQuestionAnswering` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForQuestionAnswering` (DistilBERT model)
|
||||||
- `albert`: :class:`~transformers.AlbertForQuestionAnswering` (ALBERT model)
|
- `albert`: :class:`~transformers.AlbertForQuestionAnswering` (ALBERT model)
|
||||||
- `bert`: :class:`~transformers.BertForQuestionAnswering` (Bert model)
|
- `bert`: :class:`~transformers.BertForQuestionAnswering` (Bert model)
|
||||||
@@ -1547,6 +1555,7 @@ class AutoModelForTokenClassification:
|
|||||||
The `from_pretrained()` method takes care of returning the correct model class instance
|
The `from_pretrained()` method takes care of returning the correct model class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `distilbert`: :class:`~transformers.DistilBertForTokenClassification` (DistilBERT model)
|
- `distilbert`: :class:`~transformers.DistilBertForTokenClassification` (DistilBERT model)
|
||||||
- `xlm`: :class:`~transformers.XLMForTokenClassification` (XLM model)
|
- `xlm`: :class:`~transformers.XLMForTokenClassification` (XLM model)
|
||||||
- `xlm-roberta`: :class:`~transformers.XLMRobertaForTokenClassification` (XLM-RoBERTa model)
|
- `xlm-roberta`: :class:`~transformers.XLMRobertaForTokenClassification` (XLM-RoBERTa model)
|
||||||
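The fallback described in the docstrings above (use `model_type` when the config provides it, otherwise match a pattern against `pretrained_model_name_or_path`) can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: `PATTERN_TO_CLASS` and `resolve_model_class` are hypothetical names, and class names are returned as strings so the snippet runs without `transformers` installed.

```python
from typing import Optional

# Hypothetical, abbreviated mapping for illustration only.
# More specific patterns come first because matching is by substring.
PATTERN_TO_CLASS = [
    ("distilbert", "DistilBertForTokenClassification"),
    ("xlm-roberta", "XLMRobertaForTokenClassification"),
    ("xlm", "XLMForTokenClassification"),
]


def resolve_model_class(name_or_path: str, model_type: Optional[str] = None) -> str:
    """Pick a class name, preferring an explicit model_type over name matching."""
    if model_type is not None:
        # Exact lookup when the config carries a model_type.
        by_type = dict(PATTERN_TO_CLASS)
        if model_type in by_type:
            return by_type[model_type]
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    # Fallback: first pattern contained in the name or path wins.
    for pattern, class_name in PATTERN_TO_CLASS:
        if pattern in name_or_path:
            return class_name
    raise ValueError(f"No known pattern found in {name_or_path!r}")


from_name = resolve_model_class("xlm-roberta-base")
from_type = resolve_model_class("my-finetuned-model", model_type="xlm")
```

The ordering in the sketch matters: `xlm` is a substring of `xlm-roberta`, so a substring-based fallback has to try the longer pattern first.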
|
|||||||
@@ -745,9 +745,10 @@ class ElectraForTokenClassification(ElectraPreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
@add_start_docstrings(
|
@add_start_docstrings(
|
||||||
"""ELECTRA Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear layers on top of
|
"""
|
||||||
the hidden-states output to compute `span start logits` and `span end logits`). """,
|
ELECTRA Model with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
|
||||||
ELECTRA_INPUTS_DOCSTRING,
|
layers on top of the hidden-states output to compute `span start logits` and `span end logits`).""",
|
||||||
|
ELECTRA_START_DOCSTRING,
|
||||||
)
|
)
|
||||||
class ElectraForQuestionAnswering(ElectraPreTrainedModel):
|
class ElectraForQuestionAnswering(ElectraPreTrainedModel):
|
||||||
config_class = ElectraConfig
|
config_class = ElectraConfig
|
||||||
|
|||||||
@@ -435,7 +435,7 @@ class LongformerSelfAttention(nn.Module):
|
|||||||
|
|
||||||
LONGFORMER_START_DOCSTRING = r"""
|
LONGFORMER_START_DOCSTRING = r"""
|
||||||
|
|
||||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ sub-class.
|
||||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
||||||
usage and behavior.
|
usage and behavior.
|
||||||
|
|
||||||
@@ -467,7 +467,7 @@ LONGFORMER_INPUTS_DOCSTRING = r"""
|
|||||||
Tokens with global attention attend to all other tokens, and all other tokens attend to them. This is important for
|
Tokens with global attention attend to all other tokens, and all other tokens attend to them. This is important for
|
||||||
task-specific finetuning because it makes the model more flexible at representing the task. For example,
|
task-specific finetuning because it makes the model more flexible at representing the task. For example,
|
||||||
for classification, the <s> token should be given global attention. For QA, all question tokens should also have
|
for classification, the <s> token should be given global attention. For QA, all question tokens should also have
|
||||||
global attention. Please refer to the Longformer paper https://arxiv.org/abs/2004.05150 for more details.
|
global attention. Please refer to the `Longformer paper <https://arxiv.org/abs/2004.05150>`__ for more details.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
``0`` for local attention (a sliding window attention),
|
``0`` for local attention (a sliding window attention),
|
||||||
``1`` for global attention (tokens that attend to all other tokens, and all other tokens attend to them).
|
``1`` for global attention (tokens that attend to all other tokens, and all other tokens attend to them).
|
||||||
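The mask convention in this docstring (``0`` for local attention, ``1`` for global attention) can be illustrated with a small helper. This is a minimal sketch using plain Python lists instead of tensors; `build_global_attention_mask` is a hypothetical name, and the token positions are assumptions chosen for the example.

```python
from typing import Iterable, List


def build_global_attention_mask(seq_len: int, global_positions: Iterable[int]) -> List[int]:
    """Return a 0/1 mask: 1 marks global attention, 0 marks local attention."""
    mask = [0] * seq_len
    for pos in global_positions:
        mask[pos] = 1
    return mask


# Classification: only the first token (assumed to be <s>) gets global attention.
cls_mask = build_global_attention_mask(8, [0])

# QA: assume the first four positions hold <s> and the question tokens.
qa_mask = build_global_attention_mask(8, range(4))
```

In practice the mask would be a tensor with the same shape as the input ids; the list form is used here only to keep the sketch dependency-free.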
@@ -500,7 +500,7 @@ class LongformerModel(RobertaModel):
|
|||||||
"""
|
"""
|
||||||
This class overrides :class:`~transformers.RobertaModel` to provide the ability to process
|
This class overrides :class:`~transformers.RobertaModel` to provide the ability to process
|
||||||
long sequences following the self-attention approach described in `Longformer: the Long-Document Transformer
|
long sequences following the self-attention approach described in `Longformer: the Long-Document Transformer
|
||||||
<https://arxiv.org/abs/2004.05150>`_ by Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer self-attention
|
<https://arxiv.org/abs/2004.05150>`__ by Iz Beltagy, Matthew E. Peters, and Arman Cohan. Longformer self-attention
|
||||||
combines a local (sliding window) and global attention to extend to long documents without the O(n^2) increase in
|
combines a local (sliding window) and global attention to extend to long documents without the O(n^2) increase in
|
||||||
memory and compute.
|
memory and compute.
|
||||||
|
|
||||||
|
|||||||
@@ -1451,14 +1451,10 @@ class ReformerPreTrainedModel(PreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
REFORMER_START_DOCSTRING = r"""
|
REFORMER_START_DOCSTRING = r"""
|
||||||
Reformer was proposed in
|
Reformer was proposed in `Reformer: The Efficient Transformer <https://arxiv.org/abs/2001.04451>`__
|
||||||
`Reformer: The Efficient Transformer`_
|
|
||||||
by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
by Nikita Kitaev, Łukasz Kaiser, Anselm Levskaya.
|
||||||
|
|
||||||
.. _`Reformer: The Efficient Transformer`:
|
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`__ sub-class.
|
||||||
https://arxiv.org/abs/2001.04451
|
|
||||||
|
|
||||||
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#torch.nn.Module>`_ sub-class.
|
|
||||||
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matter related to general
|
||||||
usage and behavior.
|
usage and behavior.
|
||||||
|
|
||||||
|
|||||||
@@ -775,19 +775,14 @@ class T5Stack(T5PreTrainedModel):
|
|||||||
return outputs # last-layer hidden state, (presents,) (all hidden states), (all attentions)
|
return outputs # last-layer hidden state, (presents,) (all hidden states), (all attentions)
|
||||||
|
|
||||||
|
|
||||||
T5_START_DOCSTRING = r""" The T5 model was proposed in
|
T5_START_DOCSTRING = r"""
|
||||||
`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`_
|
The T5 model was proposed in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
|
||||||
by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
|
<https://arxiv.org/abs/1910.10683>`__ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
|
||||||
|
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
|
||||||
It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
|
It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
|
||||||
|
|
||||||
This model is a PyTorch `torch.nn.Module`_ sub-class. Use it as a regular PyTorch Module and
|
This model is a PyTorch `torch.nn.Module <https://pytorch.org/docs/stable/nn.html#module>`__ sub-class. Use it as a
|
||||||
refer to the PyTorch documentation for all matter related to general usage and behavior.
|
regular PyTorch Module and refer to the PyTorch documentation for all matter related to general usage and behavior.
|
||||||
|
|
||||||
.. _`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`:
|
|
||||||
https://arxiv.org/abs/1910.10683
|
|
||||||
|
|
||||||
.. _`torch.nn.Module`:
|
|
||||||
https://pytorch.org/docs/stable/nn.html#module
|
|
||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
config (:class:`~transformers.T5Config`): Model configuration class with all the parameters of the model.
|
config (:class:`~transformers.T5Config`): Model configuration class with all the parameters of the model.
|
||||||
@@ -804,7 +799,7 @@ T5_INPUTS_DOCSTRING = r"""
|
|||||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
To know more on how to prepare :obj:`input_ids` for pre-training take a look at
|
To know more on how to prepare :obj:`input_ids` for pre-training take a look at
|
||||||
`T5 Training <./t5.html#training>`_ .
|
`T5 Training <./t5.html#training>`__.
|
||||||
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
|
attention_mask (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length)`, `optional`, defaults to :obj:`None`):
|
||||||
Mask to avoid performing attention on padding token indices.
|
Mask to avoid performing attention on padding token indices.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -817,7 +812,7 @@ T5_INPUTS_DOCSTRING = r"""
|
|||||||
Provide for sequence to sequence training. T5 uses the pad_token_id as the starting token for decoder_input_ids generation.
|
Provide for sequence to sequence training. T5 uses the pad_token_id as the starting token for decoder_input_ids generation.
|
||||||
If `decoder_past_key_value_states` is used, optionally only the last `decoder_input_ids` have to be input (see `decoder_past_key_value_states`).
|
If `decoder_past_key_value_states` is used, optionally only the last `decoder_input_ids` have to be input (see `decoder_past_key_value_states`).
|
||||||
To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
|
To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
|
||||||
`T5 Training <./t5.html#training>`_ .
|
`T5 Training <./t5.html#training>`__.
|
||||||
decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, tgt_seq_len)`, `optional`, defaults to :obj:`None`):
|
decoder_attention_mask (:obj:`torch.BoolTensor` of shape :obj:`(batch_size, tgt_seq_len)`, `optional`, defaults to :obj:`None`):
|
||||||
Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
|
Default behavior: generate a tensor that ignores pad tokens in decoder_input_ids. Causal mask will also be used by default.
|
||||||
decoder_past_key_value_states (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
|
decoder_past_key_value_states (:obj:`tuple(tuple(torch.FloatTensor))` of length :obj:`config.n_layers` with each tuple having 4 tensors of shape :obj:`(batch_size, num_heads, sequence_length - 1, embed_size_per_head)`):
|
||||||
@@ -902,8 +897,8 @@ class T5Model(T5PreTrainedModel):
|
|||||||
output_attentions=None,
|
output_attentions=None,
|
||||||
):
|
):
|
||||||
r"""
|
r"""
|
||||||
Return:
|
Returns:
|
||||||
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
|
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
|
||||||
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
|
last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
|
||||||
Sequence of hidden-states at the output of the last layer of the model.
|
Sequence of hidden-states at the output of the last layer of the model.
|
||||||
If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
|
If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
|
||||||
@@ -1038,7 +1033,7 @@ class T5ForConditionalGeneration(T5PreTrainedModel):
|
|||||||
Used to hide legacy arguments that have been deprecated.
|
Used to hide legacy arguments that have been deprecated.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
|
:obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
|
||||||
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided):
|
loss (:obj:`torch.FloatTensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`labels` is provided):
|
||||||
Classification loss (cross entropy).
|
Classification loss (cross entropy).
|
||||||
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
|
prediction_scores (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
|
||||||
|
|||||||
@@ -408,8 +408,7 @@ class TFElectraModel(TFElectraPreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
@add_start_docstrings(
|
@add_start_docstrings(
|
||||||
"""
|
"""Electra model with a binary classification head on top as used during pre-training for identifying generated
|
||||||
Electra model with a binary classification head on top as used during pre-training for identifying generated
|
|
||||||
tokens.
|
tokens.
|
||||||
|
|
||||||
Even though both the discriminator and generator may be loaded into this model, the discriminator is
|
Even though both the discriminator and generator may be loaded into this model, the discriminator is
|
||||||
@@ -501,8 +500,7 @@ class TFElectraMaskedLMHead(tf.keras.layers.Layer):
|
|||||||
|
|
||||||
|
|
||||||
@add_start_docstrings(
|
@add_start_docstrings(
|
||||||
"""
|
"""Electra model with a language modeling head on top.
|
||||||
Electra model with a language modeling head on top.
|
|
||||||
|
|
||||||
Even though both the discriminator and generator may be loaded into this model, the generator is
|
Even though both the discriminator and generator may be loaded into this model, the generator is
|
||||||
the only model of the two to have been trained for the masked language modeling task.""",
|
the only model of the two to have been trained for the masked language modeling task.""",
|
||||||
@@ -588,8 +586,7 @@ class TFElectraForMaskedLM(TFElectraPreTrainedModel):
|
|||||||
|
|
||||||
|
|
||||||
@add_start_docstrings(
|
@add_start_docstrings(
|
||||||
"""
|
"""Electra model with a token classification head on top.
|
||||||
Electra model with a token classification head on top.
|
|
||||||
|
|
||||||
Both the discriminator and generator may be loaded into this model.""",
|
Both the discriminator and generator may be loaded into this model.""",
|
||||||
ELECTRA_START_DOCSTRING,
|
ELECTRA_START_DOCSTRING,
|
||||||
|
|||||||
@@ -772,19 +772,15 @@ class TFT5PreTrainedModel(TFPreTrainedModel):
|
|||||||
return dummy_inputs
|
return dummy_inputs
|
||||||
|
|
||||||
|
|
||||||
T5_START_DOCSTRING = r""" The T5 model was proposed in
|
T5_START_DOCSTRING = r"""
|
||||||
`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`_
|
The T5 model was proposed in `Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
|
||||||
by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
|
<https://arxiv.org/abs/1910.10683>`__ by Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
|
||||||
|
Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu.
|
||||||
It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
|
It's an encoder decoder transformer pre-trained in a text-to-text denoising generative setting.
|
||||||
|
|
||||||
This model is a tf.keras.Model `tf.keras.Model`_ sub-class. Use it as a regular TF 2.0 Keras Model and
|
This model is a `tf.keras.Model <https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model>`__
|
||||||
refer to the TF 2.0 documentation for all matter related to general usage and behavior.
|
sub-class. Use it as a regular TF 2.0 Keras Model and refer to the TF 2.0 documentation for all matter related to
|
||||||
|
general usage and behavior.
|
||||||
.. _`Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer`:
|
|
||||||
https://arxiv.org/abs/1910.10683
|
|
||||||
|
|
||||||
.. _`tf.keras.Model`:
|
|
||||||
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/Model
|
|
||||||
|
|
||||||
Note on the model inputs:
|
Note on the model inputs:
|
||||||
TF 2.0 models accept two formats as inputs:
|
TF 2.0 models accept two formats as inputs:
|
||||||
@@ -796,7 +792,7 @@ T5_START_DOCSTRING = r""" The T5 model was proposed in
|
|||||||
|
|
||||||
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :
|
If you choose this second option, there are three possibilities you can use to gather all the input Tensors in the first positional argument :
|
||||||
|
|
||||||
- a single Tensor with inputs only and nothing else: `model(inputs_ids)
|
- a single Tensor with inputs only and nothing else: `model(inputs_ids)`
|
||||||
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
|
- a list of varying length with one or several input Tensors IN THE ORDER given in the docstring:
|
||||||
`model([inputs, attention_mask])` or `model([inputs, attention_mask, token_type_ids])`
|
`model([inputs, attention_mask])` or `model([inputs, attention_mask, token_type_ids])`
|
||||||
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
|
- a dictionary with one or several input Tensors associated to the input names given in the docstring:
|
||||||
@@ -818,7 +814,7 @@ T5_INPUTS_DOCSTRING = r"""
|
|||||||
the right or the left.
|
the right or the left.
|
||||||
Indices can be obtained using :class:`transformers.T5Tokenizer`.
|
Indices can be obtained using :class:`transformers.T5Tokenizer`.
|
||||||
To know more on how to prepare :obj:`inputs` for pre-training take a look at
|
To know more on how to prepare :obj:`inputs` for pre-training take a look at
|
||||||
`T5 Training <./t5.html#training>`_ .
|
`T5 Training <./t5.html#training>`__.
|
||||||
See :func:`transformers.PreTrainedTokenizer.encode` and
|
See :func:`transformers.PreTrainedTokenizer.encode` and
|
||||||
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
:func:`transformers.PreTrainedTokenizer.convert_tokens_to_ids` for details.
|
||||||
decoder_input_ids (:obj:`tf.Tensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
|
decoder_input_ids (:obj:`tf.Tensor` of shape :obj:`(batch_size, target_sequence_length)`, `optional`, defaults to :obj:`None`):
|
||||||
@@ -850,7 +846,7 @@ T5_INPUTS_DOCSTRING = r"""
|
|||||||
This is useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
|
This is useful if you want more control over how to convert `decoder_input_ids` indices into associated vectors
|
||||||
than the model's internal embedding lookup matrix.
|
than the model's internal embedding lookup matrix.
|
||||||
To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
|
To know more on how to prepare :obj:`decoder_input_ids` for pre-training take a look at
|
||||||
`T5 Training <./t5.html#training>`_ .
|
`T5 Training <./t5.html#training>`__.
|
||||||
head_mask: (:obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`, defaults to :obj:`None`):
|
head_mask: (:obj:`tf.Tensor` of shape :obj:`(num_heads,)` or :obj:`(num_layers, num_heads)`, `optional`, defaults to :obj:`None`):
|
||||||
Mask to nullify selected heads of the self-attention modules.
|
Mask to nullify selected heads of the self-attention modules.
|
||||||
Mask values selected in ``[0, 1]``:
|
Mask values selected in ``[0, 1]``:
|
||||||
@@ -897,8 +893,8 @@ class TFT5Model(TFT5PreTrainedModel):
|
|||||||
@add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
|
@add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
|
||||||
def call(self, inputs, **kwargs):
|
def call(self, inputs, **kwargs):
|
||||||
r"""
|
r"""
|
||||||
Return:
|
Returns:
|
||||||
:obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
|
:obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
|
||||||
last_hidden_state (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
|
last_hidden_state (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
|
||||||
Sequence of hidden-states at the output of the last layer of the model.
|
Sequence of hidden-states at the output of the last layer of the model.
|
||||||
If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
|
If `decoder_past_key_value_states` is used only the last hidden-state of the sequences of shape :obj:`(batch_size, 1, hidden_size)` is output.
|
||||||
@@ -1024,8 +1020,8 @@ class TFT5ForConditionalGeneration(TFT5PreTrainedModel):
|
|||||||
@add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
|
@add_start_docstrings_to_callable(T5_INPUTS_DOCSTRING)
|
||||||
def call(self, inputs, **kwargs):
|
def call(self, inputs, **kwargs):
|
||||||
r"""
|
r"""
|
||||||
Return:
|
Returns:
|
||||||
:obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs.
|
:obj:`tuple(tf.Tensor)` comprising various elements depending on the configuration (:class:`~transformers.T5Config`) and inputs:
|
||||||
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`lm_label` is provided):
|
loss (:obj:`tf.Tensor` of shape :obj:`(1,)`, `optional`, returned when :obj:`lm_label` is provided):
|
||||||
Classification loss (cross entropy).
|
Classification loss (cross entropy).
|
||||||
prediction_scores (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
|
prediction_scores (:obj:`tf.Tensor` of shape :obj:`(batch_size, sequence_length, config.vocab_size)`)
|
||||||
|
|||||||
@@ -294,7 +294,6 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
|
|||||||
|
|
||||||
Parameters:
|
Parameters:
|
||||||
pretrained_model_name_or_path: either:
|
pretrained_model_name_or_path: either:
|
||||||
|
|
||||||
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
- a string with the `shortcut name` of a pre-trained model to load from cache or download, e.g.: ``bert-base-uncased``.
|
||||||
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
- a string with the `identifier name` of a pre-trained model that was user-uploaded to our S3, e.g.: ``dbmdz/bert-base-german-cased``.
|
||||||
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
- a path to a `directory` containing model weights saved using :func:`~transformers.PreTrainedModel.save_pretrained`, e.g.: ``./my_model_directory/``.
|
||||||
@@ -306,8 +305,8 @@ class TFPreTrainedModel(tf.keras.Model, TFModelUtilsMixin):
|
|||||||
config: (`optional`) one of:
|
config: (`optional`) one of:
|
||||||
- an instance of a class derived from :class:`~transformers.PretrainedConfig`, or
|
- an instance of a class derived from :class:`~transformers.PretrainedConfig`, or
|
||||||
- a string valid as input to :func:`~transformers.PretrainedConfig.from_pretrained()`
|
- a string valid as input to :func:`~transformers.PretrainedConfig.from_pretrained()`
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
|
||||||
|
|
||||||
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
- the model is loaded by supplying a local directory as ``pretrained_model_name_or_path`` and a configuration JSON file named `config.json` is found in the directory.
|
||||||
|
|||||||
@@ -530,6 +530,7 @@ class PreTrainedModel(nn.Module, ModuleUtilsMixin):
|
|||||||
config: (`optional`) one of:
|
config: (`optional`) one of:
|
||||||
- an instance of a class derived from :class:`~transformers.PretrainedConfig`, or
|
- an instance of a class derived from :class:`~transformers.PretrainedConfig`, or
|
||||||
- a string valid as input to :func:`~transformers.PretrainedConfig.from_pretrained()`
|
- a string valid as input to :func:`~transformers.PretrainedConfig.from_pretrained()`
|
||||||
|
|
||||||
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
Configuration for the model to use instead of an automatically loaded configuration. Configuration can be automatically loaded when:
|
||||||
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
- the model is a model provided by the library (loaded with the ``shortcut-name`` string of a pretrained model), or
|
||||||
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
- the model was saved using :func:`~transformers.PreTrainedModel.save_pretrained` and is reloaded by supplying the save directory.
|
||||||
|
|||||||
@@ -323,6 +323,7 @@ class Pipeline(_ScikitCompat):
|
|||||||
|
|
||||||
Base class implementing pipelined operations.
|
Base class implementing pipelined operations.
|
||||||
Pipeline workflow is defined as a sequence of the following operations:
|
Pipeline workflow is defined as a sequence of the following operations:
|
||||||
|
|
||||||
Input -> Tokenization -> Model Inference -> Post-Processing (Task dependent) -> Output
|
Input -> Tokenization -> Model Inference -> Post-Processing (Task dependent) -> Output
|
||||||
|
|
||||||
Pipeline supports running on CPU or GPU through the device argument. Users can specify
|
Pipeline supports running on CPU or GPU through the device argument. Users can specify
|
||||||
|
|||||||
@@ -103,6 +103,7 @@ class AutoTokenizer:
|
|||||||
The `from_pretrained()` method takes care of returning the correct tokenizer class instance
|
The `from_pretrained()` method takes care of returning the correct tokenizer class instance
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: T5Tokenizer (T5 model)
|
- `t5`: T5Tokenizer (T5 model)
|
||||||
- `distilbert`: DistilBertTokenizer (DistilBert model)
|
- `distilbert`: DistilBertTokenizer (DistilBert model)
|
||||||
- `albert`: AlbertTokenizer (ALBERT model)
|
- `albert`: AlbertTokenizer (ALBERT model)
|
||||||
@@ -136,6 +137,7 @@ class AutoTokenizer:
|
|||||||
The tokenizer class to instantiate is selected
|
The tokenizer class to instantiate is selected
|
||||||
based on the `model_type` property of the config object, or when it's missing,
|
based on the `model_type` property of the config object, or when it's missing,
|
||||||
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
falling back to using pattern matching on the `pretrained_model_name_or_path` string:
|
||||||
|
|
||||||
- `t5`: T5Tokenizer (T5 model)
|
- `t5`: T5Tokenizer (T5 model)
|
||||||
- `distilbert`: DistilBertTokenizer (DistilBert model)
|
- `distilbert`: DistilBertTokenizer (DistilBert model)
|
||||||
- `albert`: AlbertTokenizer (ALBERT model)
|
- `albert`: AlbertTokenizer (ALBERT model)
|
||||||
|
|||||||