Documentation (#2989)

* All Tokenizers

BertTokenizer + few fixes
RobertaTokenizer
OpenAIGPTTokenizer + Fixes
GPT2Tokenizer + fixes
TransfoXLTokenizer
Correct rst for TransformerXL
XLMTokenizer + fixes
XLNet Tokenizer + Style
DistilBERT + Fix XLNet RST
CTRLTokenizer
CamemBERT Tokenizer
FlaubertTokenizer
XLMRobertaTokenizer
cleanup

* cleanup
Lysandre Debut
2020-02-25 18:43:36 -05:00
committed by GitHub
parent c913eb9c38
commit bb7c468520
30 changed files with 866 additions and 242 deletions


@@ -5,7 +5,7 @@ Overview
~~~~~~~~~~~~~~~~~~~~~
OpenAI GPT-2 model was proposed in
-`Language Models are Unsupervised Multitask Learners`_
+`Language Models are Unsupervised Multitask Learners <https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf>`_
by Alec Radford*, Jeffrey Wu*, Rewon Child, David Luan, Dario Amodei** and Ilya Sutskever**.
It's a causal (unidirectional) transformer pre-trained using language modeling on a very large
corpus of ~40 GB of text data.
@@ -46,7 +46,7 @@ GPT2Tokenizer
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2Tokenizer
-    :members:
+    :members: save_vocabulary
GPT2Model