Cleanup fast tokenizers integration (#3706)

* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length inBatchEncoding

* add alignement methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 et RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately tensorfow does like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>
This commit is contained in:
Thomas Wolf
2020-04-18 13:43:57 +02:00
committed by GitHub
parent 60a42ef1c0
commit 827d6d6ef0
28 changed files with 1031 additions and 503 deletions

View File

@@ -52,6 +52,13 @@ BertTokenizer
create_token_type_ids_from_sequences, save_vocabulary
BertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BertTokenizerFast
:members:
BertModel
~~~~~~~~~~~~~~~~~~~~

View File

@@ -44,6 +44,13 @@ DistilBertTokenizer
:members:
DistilBertTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.DistilBertTokenizerFast
:members:
DistilBertModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -61,6 +61,13 @@ ElectraTokenizer
:members:
ElectraTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.ElectraTokenizerFast
:members:
ElectraModel
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -53,6 +53,13 @@ OpenAIGPTTokenizer
:members: save_vocabulary
OpenAIGPTTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.OpenAIGPTTokenizerFast
:members:
OpenAIGPTModel
~~~~~~~~~~~~~~~~~~~~~~~~~

View File

@@ -51,6 +51,13 @@ GPT2Tokenizer
:members: save_vocabulary
GPT2TokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.GPT2TokenizerFast
:members:
GPT2Model
~~~~~~~~~~~~~~~~~~~~~

View File

@@ -46,6 +46,13 @@ RobertaTokenizer
create_token_type_ids_from_sequences, save_vocabulary
RobertaTokenizerFast
~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.RobertaTokenizerFast
:members: build_inputs_with_special_tokens
RobertaModel
~~~~~~~~~~~~~~~~~~~~

View File

@@ -47,6 +47,13 @@ TransfoXLTokenizer
:members: save_vocabulary
TransfoXLTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.TransfoXLTokenizerFast
:members:
TransfoXLModel
~~~~~~~~~~~~~~~~~~~~~~~~~~