[AutoTokenizer] Allow creation of tokenizers by tokenizer type (#13668)

* up

* up
Patrick von Platen
2021-09-22 00:29:38 +02:00
committed by GitHub
parent 2608944dc2
commit 8e908c8c74
5 changed files with 81 additions and 1 deletions

tests/fixtures/merges.txt (new file, 5 lines)

@@ -0,0 +1,5 @@
#version: 0.2
Ġ l
Ġl o
Ġlo w
e r

tests/fixtures/vocab.json (new file, 1 line)

@@ -0,0 +1 @@
{"l": 0, "o": 1, "w": 2, "e": 3, "r": 4, "s": 5, "t": 6, "i": 7, "d": 8, "n": 9, "Ġ": 10, "Ġl": 11, "Ġn": 12, "Ġlo": 13, "Ġlow": 14, "er": 15, "Ġlowest": 16, "Ġnewer": 17, "Ġwider": 18, "<unk>": 19, "<|endoftext|>": 20}
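
Note on the two fixtures above: merges.txt and vocab.json look like a minimal byte-level BPE pair, with the merge rules listed in priority order and a token-to-id map. The sketch below is not code from this commit; it is a stdlib-only illustration, under that assumption, of how the four merge rules would segment a word and how the resulting pieces map to ids. The helper names `load_ranks` and `bpe` are hypothetical, and the fixture contents are inlined so the snippet runs on its own.

```python
# Illustrative only: a tiny greedy BPE over the fixture contents shown above.
# load_ranks and bpe are hypothetical helpers, not part of transformers.

MERGES_TXT = """#version: 0.2
Ġ l
Ġl o
Ġlo w
e r
"""

VOCAB = {"l": 0, "o": 1, "w": 2, "e": 3, "r": 4, "s": 5, "t": 6, "i": 7,
         "d": 8, "n": 9, "Ġ": 10, "Ġl": 11, "Ġn": 12, "Ġlo": 13, "Ġlow": 14,
         "er": 15, "Ġlowest": 16, "Ġnewer": 17, "Ġwider": 18, "<unk>": 19,
         "<|endoftext|>": 20}


def load_ranks(text):
    """Map each merge pair to its priority (earlier line = lower rank)."""
    ranks = {}
    for line in text.splitlines():
        if not line or line.startswith("#version"):
            continue  # skip the header and blank lines
        left, right = line.split()
        ranks[(left, right)] = len(ranks)
    return ranks


def bpe(symbols, ranks):
    """Repeatedly merge the adjacent pair with the lowest rank."""
    symbols = list(symbols)
    while len(symbols) > 1:
        pairs = [(symbols[i], symbols[i + 1]) for i in range(len(symbols) - 1)]
        best = min(pairs, key=lambda p: ranks.get(p, float("inf")))
        if best not in ranks:
            break  # no applicable merge rule is left
        merged, i = [], 0
        while i < len(symbols):
            if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return symbols


ranks = load_ranks(MERGES_TXT)
pieces = bpe("Ġlower", ranks)            # "Ġ" marks a leading space
ids = [VOCAB.get(p, VOCAB["<unk>"]) for p in pieces]
print(pieces, ids)                       # ['Ġlow', 'er'] [14, 15]
```

Entries such as "Ġlowest" and "Ġnewer" appear in vocab.json but cannot be produced by these four merges; in this sketch they would only be reachable by direct lookup.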

tests/fixtures/vocab.txt (new file, 10 lines)

@@ -0,0 +1,10 @@
[PAD]
[SEP]
[MASK]
[CLS]
[unused3]
[unused4]
[unused5]
[unused6]
[unused7]
[unused8]