Authorize last version of tokenizer (#9799)

* Authorize last version of tokenizer

* Update version table

* Fix conversion of spm tokenizers and fix some hub links

* Bump tokenizers version to 0.10.1rc1

* Add script to check tokenizers conversion with XNLI

* Add some more mask_token lstrip support

* Must modify mask_token in slow tokenizers too

* Keep using the old method for Pegasus

* add missing import

Co-authored-by: Anthony MOI <m.anthony.moi@gmail.com>
Sylvain Gugger
2021-02-04 14:18:33 -05:00
committed by GitHub
parent d5888ef0ab
commit 21b3922e35
18 changed files with 245 additions and 23 deletions

@@ -45,7 +45,7 @@ deps = {
"tensorflow-cpu": "tensorflow-cpu>=2.3",
"tensorflow": "tensorflow>=2.3",
"timeout-decorator": "timeout-decorator",
-"tokenizers": "tokenizers==0.9.4",
+"tokenizers": "tokenizers==0.10.1rc1",
"torch": "torch>=1.0",
"tqdm": "tqdm>=4.27",
"unidic": "unidic>=1.0.2",