diff --git a/docs/source/pretrained_models.rst b/docs/source/pretrained_models.rst index 2d72977951..b23a96ff7c 100644 --- a/docs/source/pretrained_models.rst +++ b/docs/source/pretrained_models.rst @@ -3,57 +3,98 @@ Pretrained models Here is the full list of the currently provided pretrained models together with a short presentation of each model. -+===============+============================================================+===========================+ -| Architecture | Shortcut name | Details of the model | -+===============+============================================================+===========================+ -| | ``bert-base-uncased`` | 12-layer, 768-hidden, 12-heads, 110M parameters -| | | Trained on lower-cased English text | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-uncased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters -| | | Trained on lower-cased English text | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-base-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters -| | | Trained on cased English text | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-cased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | -| | | Trained on cased English text | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-base-multilingual-uncased`` | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters -| | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias -| | | (see `details `_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-base-multilingual-cased`` | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters | -| | | Trained on cased text in the top 104 languages with the largest Wikipedias -| | | (see `details `_) | -| +------------------------------------------------------------+---------------------------+ -| BERT | ``bert-base-chinese`` | 12-layer, 768-hidden, 12-heads, 110M parameters | -| | | Trained on cased Chinese Simplified and Traditional text | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-base-german-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters | -| | | Trained on cased German text by Deepset.ai | -| | | (see `details on deepset.ai website `_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-uncased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | -| | | Trained on lower-cased English text using Whole-Word-Masking | -| | | (see `details `_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-cased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | -| | | Trained on cased English text using Whole-Word-Masking | -| | | (see `details `_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | -| | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD | -| | | (see details of fine-tuning in the `example section`_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | -| | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD | -| | | (see `details of fine-tuning in the example section `_) | -| +------------------------------------------------------------+---------------------------+ -| | ``bert-base-cased-finetuned-mrpc`` | 12-layer, 768-hidden, 12-heads, 110M parameters | -| | | The ``bert-base-cased`` model fine-tuned on MRPC | -| | | (see `details of fine-tuning in the example section `_) | -+---------------+------------------------------------------------------------+---------------------------+ -| GPT | Cells may span columns. | -+---------------+----------------------------------------------------------------------------------------+ -.. `_ \ No newline at end of file ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| Architecture | Shortcut name | Details of the model | ++===================+============================================================+===========================================================================================================================+ +| BERT | ``bert-base-uncased`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on lower-cased English text | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-uncased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | Trained on lower-cased English text | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on cased English text | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-cased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | Trained on cased English text | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-multilingual-uncased`` | (Original, not recommended) 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on lower-cased text in the top 102 languages with the largest Wikipedias | +| | | (see `details `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-multilingual-cased`` | (New, **recommended**) 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on cased text in the top 104 languages with the largest Wikipedias | +| | | (see `details `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-chinese`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on cased Chinese Simplified and Traditional text | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-german-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | Trained on cased German text by Deepset.ai | +| | | (see `details on deepset.ai website `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-uncased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | Trained on lower-cased English text using Whole-Word-Masking | +| | | (see `details `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-cased-whole-word-masking`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | Trained on cased English text using Whole-Word-Masking | +| | | (see `details `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-uncased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | The ``bert-large-uncased-whole-word-masking`` model fine-tuned on SQuAD (see details of fine-tuning in the | +| | | `example section `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-large-cased-whole-word-masking-finetuned-squad`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | The ``bert-large-cased-whole-word-masking`` model fine-tuned on SQuAD | +| | | (see `details of fine-tuning in the example section `__) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``bert-base-cased-finetuned-mrpc`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | The ``bert-base-cased`` model fine-tuned on MRPC | +| | | (see `details of fine-tuning in the example section `__) | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| GPT | ``openai-gpt`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | OpenAI GPT English model | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| GPT-2 | ``gpt2`` | 12-layer, 768-hidden, 12-heads, 117M parameters | +| | | OpenAI GPT-2 English model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``gpt2-medium`` | 24-layer, 1024-hidden, 16-heads, 345M parameters | +| | | OpenAI's Medium-sized GPT-2 English model | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| Transformer-XL | ``transfo-xl-wt103`` | 18-layer, 1024-hidden, 16-heads, 257M parameters | +| | | English model trained on wikitext-103 | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| XLNet | ``xlnet-base-cased`` | 12-layer, 768-hidden, 12-heads, 110M parameters | +| | | XLNet English model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlnet-large-cased`` | 24-layer, 1024-hidden, 16-heads, 340M parameters | +| | | XLNet Large English model | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| XLM | ``xlm-mlm-en-2048`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-mlm-ende-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English-German Multi-language model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-mlm-enfr-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English-French Multi-language model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-mlm-enro-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English-Romanian Multi-language model | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-mlm-xnli15-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM Model pre-trained with MLM on the `15 XNLI languages `__. | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-mlm-tlm-xnli15-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM Model pre-trained with MLM + TLM on the `15 XNLI languages `__. | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-clm-enfr-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English model trained with CLM (Causal Language Modeling) | +| +------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ +| | ``xlm-clm-ende-1024`` | 12-layer, 1024-hidden, 8-heads | +| | | XLM English-German Multi-language model trained with CLM (Causal Language Modeling) | ++-------------------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------+ + +.. `__ \ No newline at end of file