Big Bird Fast Tokenizer implementation (#11075)
* Added Big Bird Fast Tokenizer initial file * style fixes * flake fixes * Added big bird fast tokenizer to init files * Added big bird fast to Auto tokenization * fix styles * minor quality fixes * Added initial test code * Fix SpmConverter when precompiled_charsmap doesn't exist * fixed post processor * minor style fix * minor fix input names * Actually fix identity normalization * style * Added token type ids to fast tokenizer * style * flake fix * fix copies Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
This commit is contained in:
@@ -276,7 +276,7 @@ Flax), PyTorch, and/or TensorFlow.
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| Bert Generation | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| BigBird | ✅ | ❌ | ✅ | ❌ | ❌ |
|
||||
| BigBird | ✅ | ✅ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
| BigBirdPegasus | ❌ | ❌ | ✅ | ❌ | ❌ |
|
||||
+-----------------------------+----------------+----------------+-----------------+--------------------+--------------+
|
||||
|
||||
@@ -67,6 +67,11 @@ BigBirdTokenizer
|
||||
:members: build_inputs_with_special_tokens, get_special_tokens_mask,
|
||||
create_token_type_ids_from_sequences, save_vocabulary
|
||||
|
||||
BigBirdTokenizerFast
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. autoclass:: transformers.BigBirdTokenizerFast
|
||||
:members:
|
||||
|
||||
BigBird specific outputs
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Reference in New Issue
Block a user