Various tokenizers fixes (#5558)

* BertTokenizerFast - Do not specify strip_accents by default

* Bump tokenizers to new version

* Add test for AddedToken serialization
Anthony MOI
2020-07-06 18:27:53 -04:00
committed by GitHub
parent 21f28c34b7
commit 5787e4c159
4 changed files with 42 additions and 25 deletions

@@ -114,7 +114,7 @@ setup(
packages=find_packages("src"),
install_requires=[
"numpy",
-"tokenizers == 0.8.0-rc4",
+"tokenizers == 0.8.1.rc1",
# dataclasses for Python versions that don't have it
"dataclasses;python_version<'3.7'",
# utilities from PyPA to e.g. compare versions