[GemmaConverter] use user_defined_symbols (#29473)

* use user_defined_symbols
* fixup
* nit
* add a very robust test
* make sure all models are tested with the `pretrained_tokenizer_to_test`
* should we make sure we test all of them?
* merge
* remove the id
* fix test
* update
* ousies
* oups
* fixup
* fix copies check
* remove `pretrained_tokenizer_to_test`
2024-03-20 03:13:56 +13:00
parent 8e2fc52ea3
commit 2f9a3edbb9
3 changed files with 63 additions and 3 deletions
--- a/tests/models/llama/test_tokenization_llama.py
+++ b/tests/models/llama/test_tokenization_llama.py
@@ -52,7 +52,7 @@ if is_torch_available():
 @require_sentencepiece
 @require_tokenizers
 class LlamaTokenizationTest(TokenizerTesterMixin, unittest.TestCase):
-    from_pretrained_id = "hf-internal-testing/llama-tokenizer"
+    from_pretrained_id = ["hf-internal-testing/llama-tokenizer", "meta-llama/Llama-2-7b-hf"]
     tokenizer_class = LlamaTokenizer
     rust_tokenizer_class = LlamaTokenizerFast
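The change above turns `from_pretrained_id` from a single string into a list of checkpoints, so a shared test mixin has to accept either form. The following is a minimal sketch that assumes the mixin normalizes the attribute into a list and runs each check once per checkpoint. `normalize_checkpoint_ids`, `CheckpointListMixin`, and `ExampleTokenizationTest` are hypothetical names and are not the actual `TokenizerTesterMixin` code. The sketch also does not download anything.

```python
import unittest
from typing import List, Optional, Sequence, Union


def normalize_checkpoint_ids(ids: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Return checkpoint ids as a list, accepting None, a single string, or a sequence."""
    if ids is None:
        return []
    if isinstance(ids, str):
        # A bare string must not be iterated character by character.
        return [ids]
    return list(ids)


class CheckpointListMixin:
    # Hypothetical stand-in for a shared tokenizer test mixin.
    from_pretrained_id: Optional[Union[str, Sequence[str]]] = None

    def checkpoint_ids(self) -> List[str]:
        return normalize_checkpoint_ids(self.from_pretrained_id)

    def test_each_checkpoint(self):
        for checkpoint in self.checkpoint_ids():
            # subTest reports a failure per checkpoint instead of stopping at the first one.
            with self.subTest(checkpoint=checkpoint):
                # A real test would load the tokenizer for `checkpoint` here.
                self.assertIsInstance(checkpoint, str)


class ExampleTokenizationTest(CheckpointListMixin, unittest.TestCase):
    # Mirrors the diff: one class attribute listing several checkpoints.
    from_pretrained_id = ["hf-internal-testing/llama-tokenizer", "meta-llama/Llama-2-7b-hf"]
```

Normalizing in one place keeps existing subclasses that still set a plain string working unchanged, while new subclasses can list several checkpoints.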