[GemmaConverter] use user_defined_symbols (#29473)

* use user_defined_symbols

* fixup

* nit

* add a very robust test

* make sure all models are tested with the `pretrained_tokenizer_to_test`

* should we make sure we test all of them?

* merge

* remove the id

* fix test

* update

* ousies

* oups

* fixup

* fix copies check

* remove `pretrained_tokenizer_to_test`
Arthur
2024-03-20 03:13:56 +13:00
committed by GitHub
parent 8e2fc52ea3
commit 2f9a3edbb9
3 changed files with 63 additions and 3 deletions


@@ -52,7 +52,7 @@ if is_torch_available():
 @require_sentencepiece
 @require_tokenizers
 class LlamaTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
-    from_pretrained_id = "hf-internal-testing/llama-tokenizer"
+    from_pretrained_id = ["hf-internal-testing/llama-tokenizer", "meta-llama/Llama-2-7b-hf"]
     tokenizer_class = LlamaTokenizer
     rust_tokenizer_class = LlamaTokenizerFast