Change how "additional_special_tokens" argument in the ".from_pretrained" method of the tokenizer is taken into account (#13056)
* add test * add change in PretrainedTokenizerBase * change Luke * deactivate * add the possibility to add additional special tokens for M2M100 * format * add special test for canine * proposed changes for mbart * proposed changes for mbart50 * proposed changes for byt5 * proposed changes for canine * proposed changes for t5 * test fast and slow * remove comment * remove comment * add fast version for all tests * replace break by continue * add more comments * add check to avoid duplicates * remove comment * format * proposed change for wave2vec2 * reverse changes mbart * uncomment * format
This commit is contained in:
@@ -1862,6 +1862,12 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
|
||||
with open(special_tokens_map_file, encoding="utf-8") as special_tokens_map_handle:
|
||||
special_tokens_map = json.load(special_tokens_map_handle)
|
||||
for key, value in special_tokens_map.items():
|
||||
if key in kwargs and kwargs[key]:
|
||||
# This value has already been redefined by the kwargs
|
||||
# We keep this new value and ignore the one stored in the special_tokens_map_file
|
||||
|
||||
continue
|
||||
|
||||
if isinstance(value, dict):
|
||||
value = AddedToken(**value)
|
||||
elif isinstance(value, list):
|
||||
|
||||
Reference in New Issue
Block a user