Big Bird Fast Tokenizer implementation (#11075)

* Added Big Bird Fast Tokenizer initial file

* style fixes

* flake fixes

* Added big bird fast tokenizer to init files

* Added big bird fast to Auto tokenization

* fix styles

* minor quality fixes

* Added initial test code

* Fix SpmConverter when precompiled_charsmap doesn't exist

* fixed post processor

* minor style fix

* minor fix input names

* Actually fix identity normalization

* style

* Added token type ids to fast tokenizer

* style

* flake fix

* fix copies

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
This commit is contained in:
Tanmay Laud
2021-05-10 00:01:23 -07:00
committed by GitHub
parent 80da304a0f
commit f7f872955d
9 changed files with 325 additions and 10 deletions

View File

@@ -67,6 +67,11 @@ BigBirdTokenizer
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
create_token_type_ids_from_sequences, save_vocabulary
BigBirdTokenizerFast
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. autoclass:: transformers.BigBirdTokenizerFast
:members:
BigBird specific outputs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~