Anthony MOI
36434220fc
[HUGE] Refactoring tokenizers backend - padding - truncation - pre-tokenized pipeline - fast tokenizers - tests (#4510)
* Use tokenizers pre-tokenized pipeline
* failing pretokenized test
* Fix is_pretokenized in python
* add pretokenized tests
* style and quality
* better tests for batched pretokenized inputs
* tokenizers clean up - new padding_strategy - split the files
* [HUGE] refactoring tokenizers - padding - truncation - tests
* style and quality
* bump up required tokenizers version to 0.8.0-rc1
* switched padding/truncation API - simpler better backward compat
* updating tests for custom tokenizers
* style and quality - tests on pad
* fix QA pipeline
* fix backward compatibility for max_length only
* style and quality
* Various clean-ups - add verbose
* fix tests
* update docstrings
* Fix tests
* Docs reformatted
* __call__ method documented
Co-authored-by: Thomas Wolf <thomwolf@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2020-06-15 17:12:51 -04:00