Cleanup fast tokenizers integration (#3706)
* First pass on utility classes and python tokenizers * finishing cleanup pass * style and quality * Fix tests * Updating following @mfuntowicz comment * style and quality * Fix Roberta * fix batch_size/seq_length inBatchEncoding * add alignement methods + tests * Fix OpenAI and Transfo-XL tokenizers * adding trim_offsets=True default for GPT2 et RoBERTa * style and quality * fix tests * add_prefix_space in roberta * bump up tokenizers to rc7 * style * unfortunately tensorfow does like these - removing shape/seq_len for now * Update src/transformers/tokenization_utils.py Co-Authored-By: Stefan Schweter <stefan@schweter.it> * Adding doc and docstrings * making flake8 happy Co-authored-by: Stefan Schweter <stefan@schweter.it>
This commit is contained in:
@@ -52,6 +52,13 @@ BertTokenizer
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
BertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
BertModel
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -44,6 +44,13 @@ DistilBertTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
DistilBertTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.DistilBertTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
DistilBertModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -61,6 +61,13 @@ ElectraTokenizer
|
||||
:members:
|
||||
|
||||
|
||||
ElectraTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.ElectraTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
ElectraModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -53,6 +53,13 @@ OpenAIGPTTokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
OpenAIGPTTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.OpenAIGPTTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
OpenAIGPTModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -51,6 +51,13 @@ GPT2Tokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
GPT2TokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.GPT2TokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
GPT2Model
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -46,6 +46,13 @@ RobertaTokenizer
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
|
||||
RobertaTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.RobertaTokenizerFast
|
||||
:members: build_inputs_with_special_tokens
|
||||
|
||||
|
||||
RobertaModel
|
||||
~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
@@ -47,6 +47,13 @@ TransfoXLTokenizer
|
||||
:members: save_vocabulary
|
||||
|
||||
|
||||
TransfoXLTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.TransfoXLTokenizerFast
|
||||
:members:
|
||||
|
||||
|
||||
TransfoXLModel
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
|
||||
Reference in New Issue
Block a user