🚨 🚨 Fix custom code saving (#37716)

* Firstly: Better detection of when we're a custom class

* Trigger tests

* Let's break everything

* make fixup

* fix mistaken line doubling

* Let's try to get rid of it from config classes at least

* Let's try to get rid of it from config classes at least

* Fixup image processor

* no more circular import

* Let's go back to setting `_auto_class` again

* Let's go back to setting `_auto_class` again

* stash commit

* Revert the irrelevant changes until we figure out AutoConfig

* Change tests since we're breaking expectations

* make fixup

* do the same for all custom classes

* Cleanup for feature extractor tests

* Cleanup tokenization tests too

* typo

* Fix tokenizer tests

* make fixup

* fix image processor test

* make fixup

* Remove warning from register_for_auto_class

* Stop adding model info to auto map entirely

* Remove todo

* Remove the other todo

* Let's start slapping _auto_class on models why not

* Let's start slapping _auto_class on models why not

* Make sure the tests know what's up

* Make sure the tests know what's up

* Completely remove add_model_info_to_*

* Start adding _auto_class to models

* Start adding _auto_class to models

* Add a flaky decorator

* Add a flaky decorator and import

* stash commit

* More message cleanup

* make fixup

* fix indent

* Fix trust_remote_code prompts

* make fixup

* correct indentation

* Reincorporate changes into dynamic_module_utils

* Update call to trust_remote_code

* make fixup

* Fix video processors too

* Fix video processors too

* Remove is_flaky additions

* make fixup
This commit is contained in:
Matt
2025-05-26 17:37:30 +01:00
committed by GitHub
parent 701caef704
commit ba6d72226d
25 changed files with 120 additions and 231 deletions

View File

@@ -342,17 +342,20 @@ class AutoTokenizerTest(unittest.TestCase):
with tempfile.TemporaryDirectory() as tmp_dir:
tokenizer.save_pretrained(tmp_dir)
reloaded_tokenizer = AutoTokenizer.from_pretrained(tmp_dir, trust_remote_code=True, use_fast=False)
self.assertTrue(
os.path.exists(os.path.join(tmp_dir, "tokenization.py"))
) # Assert we saved tokenizer code
self.assertEqual(reloaded_tokenizer._auto_class, "AutoTokenizer")
with open(os.path.join(tmp_dir, "tokenizer_config.json"), "r") as f:
tokenizer_config = json.load(f)
# Assert we're pointing at local code and not another remote repo
self.assertEqual(tokenizer_config["auto_map"]["AutoTokenizer"], ["tokenization.NewTokenizer", None])
self.assertEqual(reloaded_tokenizer.__class__.__name__, "NewTokenizer")
self.assertTrue(reloaded_tokenizer.special_attribute_present)
else:
self.assertEqual(tokenizer.__class__.__name__, "NewTokenizer")
self.assertEqual(reloaded_tokenizer.__class__.__name__, "NewTokenizer")
# The tokenizer file is cached in the snapshot directory. So the module file is not changed after dumping
# to a temp dir. Because the revision of the module file is not changed.
# Test the dynamic module is loaded only once if the module file is not changed.
self.assertIs(tokenizer.__class__, reloaded_tokenizer.__class__)
# Test the dynamic module is reloaded if we force it.
reloaded_tokenizer = AutoTokenizer.from_pretrained(
"hf-internal-testing/test_dynamic_tokenizer", trust_remote_code=True, force_download=True