Deprecate default chat templates (#30346)
* initial commit, remove warnings on default chat templates * stash commit * Raise a much sterner warning for default chat templates, and prepare for depreciation * Update the docs
This commit is contained in:
@@ -362,7 +362,11 @@ template for your tokenizer is by checking the `tokenizer.default_chat_template`
|
||||
This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when
|
||||
the class template is appropriate for your model, we strongly recommend overriding the default template by
|
||||
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
|
||||
for chat, and to future-proof in case the default templates are ever altered or deprecated.
|
||||
for chat.
|
||||
|
||||
Now that actual chat templates have been adopted more widely, default templates have been deprecated and will be
|
||||
removed in a future release. We strongly recommend setting the `chat_template` attribute for any tokenizers that
|
||||
still depend on them!
|
||||
|
||||
### What template should I use?
|
||||
|
||||
@@ -374,8 +378,8 @@ best performance for inference or fine-tuning when you precisely match the token
|
||||
|
||||
If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand,
|
||||
you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different
|
||||
input formats. Our default template for models that don't have a class-specific template follows the
|
||||
`ChatML` format, and this is a good, flexible choice for many use-cases. It looks like this:
|
||||
input formats. One popular choice is the `ChatML` format, and this is a good, flexible choice for many use-cases.
|
||||
It looks like this:
|
||||
|
||||
```
|
||||
{% for message in messages %}
|
||||
|
||||
@@ -412,10 +412,11 @@ class BlenderbotTokenizer(PreTrainedTokenizer):
|
||||
A very simple chat template that just adds whitespace between messages.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -294,10 +294,11 @@ class BlenderbotTokenizerFast(PreTrainedTokenizerFast):
|
||||
A very simple chat template that just adds whitespace between messages.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -225,10 +225,11 @@ class BlenderbotSmallTokenizer(PreTrainedTokenizer):
|
||||
A very simple chat template that just adds whitespace between messages.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -105,10 +105,11 @@ class BlenderbotSmallTokenizerFast(PreTrainedTokenizerFast):
|
||||
A very simple chat template that just adds whitespace between messages.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -156,9 +156,10 @@ class BloomTokenizerFast(PreTrainedTokenizerFast):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
@@ -457,10 +457,11 @@ class CodeLlamaTokenizer(PreTrainedTokenizer):
|
||||
in the original repository.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
template = (
|
||||
"{% if messages[0]['role'] == 'system' %}"
|
||||
|
||||
@@ -370,10 +370,11 @@ class CodeLlamaTokenizerFast(PreTrainedTokenizerFast):
|
||||
in the original repository.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
template = (
|
||||
"{% if messages[0]['role'] == 'system' %}"
|
||||
|
||||
@@ -248,10 +248,11 @@ class CohereTokenizerFast(PreTrainedTokenizerFast):
|
||||
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
default_template = (
|
||||
"{{ bos_token }}"
|
||||
|
||||
@@ -337,9 +337,10 @@ class GPT2Tokenizer(PreTrainedTokenizer):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
@@ -148,9 +148,10 @@ class GPT2TokenizerFast(PreTrainedTokenizerFast):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
@@ -235,9 +235,10 @@ class GPTNeoXTokenizerFast(PreTrainedTokenizerFast):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
@@ -166,10 +166,11 @@ class GPTNeoXJapaneseTokenizer(PreTrainedTokenizer):
|
||||
A simple chat template that just adds BOS/EOS tokens around messages while discarding role information.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -302,10 +302,11 @@ class GPTSw3Tokenizer(PreTrainedTokenizer):
|
||||
preceding messages. BOS tokens are added between all messages.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{{ eos_token }}{{ bos_token }}"
|
||||
|
||||
@@ -247,10 +247,11 @@ class GPTSanJapaneseTokenizer(PreTrainedTokenizer):
|
||||
information.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
@@ -430,10 +430,11 @@ class LlamaTokenizer(PreTrainedTokenizer):
|
||||
in the original repository.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
template = (
|
||||
"{% if messages[0]['role'] == 'system' %}"
|
||||
|
||||
@@ -227,10 +227,11 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
|
||||
in the original repository.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
template = (
|
||||
"{% if messages[0]['role'] == 'system' %}"
|
||||
|
||||
@@ -816,10 +816,11 @@ class WhisperTokenizer(PreTrainedTokenizer):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
|
||||
@@ -545,10 +545,11 @@ class WhisperTokenizerFast(PreTrainedTokenizerFast):
|
||||
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using the default template "
|
||||
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a default class-level template. "
|
||||
"This is very error-prone, because models are often trained with templates different from the class "
|
||||
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"
|
||||
|
||||
|
||||
@@ -1841,10 +1841,11 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
|
||||
https://github.com/openai/openai-python/blob/main/chatml.md
|
||||
"""
|
||||
logger.warning_once(
|
||||
"\nNo chat template is defined for this tokenizer - using a default chat template "
|
||||
"that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for "
|
||||
"your model, please set `tokenizer.chat_template` to an appropriate template. "
|
||||
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
|
||||
"No chat template is set for this tokenizer, falling back to a ChatML template. "
|
||||
"This is very error-prone, because most models are not trained with a ChatML template!"
|
||||
"Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
|
||||
"point any code depending on them will stop working. We recommend setting a valid chat template before "
|
||||
"then to ensure that this model continues working without issues."
|
||||
)
|
||||
return (
|
||||
"{% for message in messages %}"
|
||||
|
||||
Reference in New Issue
Block a user