Add tokenizers class mismatch detection between cls and checkpoint (#12619)

* Detect mismatch by analyzing config

* Fix comment

* Fix import

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>

* Revise based on reviews

* Remove kwargs

* Fix exception

* Fix handling exception again

* Disable mismatch test in PreTrainedTokenizerFast

Co-authored-by: SaulLu <55560583+SaulLu@users.noreply.github.com>
Tomohiro Endo
2021-07-17 22:52:21 +09:00
committed by GitHub
parent b4b562d834
commit 08d609bfb8
5 changed files with 110 additions and 1 deletions
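
As a rough illustration of the check the commit message describes (comparing the class used for loading against the class recorded in the checkpoint's config), a minimal sketch could look like the following. The helper names `find_class_mismatch` and `warn_on_mismatch` are hypothetical, and the `"tokenizer_class"` config key and the handling of the `Fast` suffix are assumptions made for this example, not a copy of the code changed in this commit.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def find_class_mismatch(config: dict, cls_name: str) -> Optional[str]:
    """Return a warning message if the config names a different tokenizer class, else None."""
    # Assumption: the checkpoint config records the class name under "tokenizer_class".
    config_class = config.get("tokenizer_class")
    if config_class is None:
        return None  # nothing recorded, so there is nothing to compare against
    # Assumption: slow and fast variants count as the same class,
    # e.g. "BertTokenizer" and "BertTokenizerFast".
    if cls_name.replace("Fast", "") == config_class.replace("Fast", ""):
        return None
    return (
        f"The checkpoint was saved with tokenizer class '{config_class}', "
        f"but it is being loaded with '{cls_name}'. This may cause unexpected tokenization."
    )


def warn_on_mismatch(config_json: str, cls: type) -> None:
    """Parse a JSON config string and log a warning if the class names disagree."""
    message = find_class_mismatch(json.loads(config_json), cls.__name__)
    if message is not None:
        logger.warning(message)
```

Returning the message instead of logging it directly keeps the comparison easy to test; only the thin wrapper touches the logger.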

@@ -44,6 +44,11 @@ class PreTrainedTokenizationFastTest(TokenizerTesterMixin, unittest.TestCase):
         tokenizer = PreTrainedTokenizerFast.from_pretrained(model_paths[0])
         tokenizer.save_pretrained(self.tmpdirname)
 
+    def test_tokenizer_mismatch_warning(self):
+        # We disable this test for PreTrainedTokenizerFast because it is the only tokenizer that is not linked to any
+        # model
+        pass
+
     def test_pretrained_model_lists(self):
         # We disable this test for PreTrainedTokenizerFast because it is the only tokenizer that is not linked to any
         # model