HuggingFace_transformer/tests/test_tokenization_fast.py at bf64b8cf095a23303315e1347d8eac0bce9d73be

Thomas Wolf 827d6d6ef0 Cleanup fast tokenizers integration (#3706)

* First pass on utility classes and python tokenizers

* finishing cleanup pass

* style and quality

* Fix tests

* Updating following @mfuntowicz comment

* style and quality

* Fix Roberta

* fix batch_size/seq_length in BatchEncoding

* add alignment methods + tests

* Fix OpenAI and Transfo-XL tokenizers

* adding trim_offsets=True default for GPT2 and RoBERTa

* style and quality

* fix tests

* add_prefix_space in roberta

* bump up tokenizers to rc7

* style

* unfortunately TensorFlow doesn't like these - removing shape/seq_len for now

* Update src/transformers/tokenization_utils.py

Co-Authored-By: Stefan Schweter <stefan@schweter.it>

* Adding doc and docstrings

* making flake8 happy

Co-authored-by: Stefan Schweter <stefan@schweter.it>

2020-04-18 13:43:57 +02:00

29 KiB
