Deprecate default chat templates (#30346)

* initial commit, remove warnings on default chat templates

* stash commit

* Raise a much sterner warning for default chat templates, and prepare for depreciation

* Update the docs
This commit is contained in:
Matt
2024-04-19 15:41:26 +01:00
committed by GitHub
parent e67ccf0610
commit 0927bfd002
20 changed files with 102 additions and 79 deletions

View File

@@ -362,7 +362,11 @@ template for your tokenizer is by checking the `tokenizer.default_chat_template`
This is something we do purely for backward compatibility reasons, to avoid breaking any existing workflows. Even when
the class template is appropriate for your model, we strongly recommend overriding the default template by
setting the `chat_template` attribute explicitly to make it clear to users that your model has been correctly configured
for chat, and to future-proof in case the default templates are ever altered or deprecated.
for chat.
Now that actual chat templates have been adopted more widely, default templates have been deprecated and will be
removed in a future release. We strongly recommend setting the `chat_template` attribute for any tokenizers that
still depend on them!
### What template should I use?
@@ -374,8 +378,8 @@ best performance for inference or fine-tuning when you precisely match the token
If you're training a model from scratch, or fine-tuning a base language model for chat, on the other hand,
you have a lot of freedom to choose an appropriate template! LLMs are smart enough to learn to handle lots of different
input formats. Our default template for models that don't have a class-specific template follows the
`ChatML` format, and this is a good, flexible choice for many use-cases. It looks like this:
input formats. One popular choice is the `ChatML` format, and this is a good, flexible choice for many use-cases.
It looks like this:
```
{% for message in messages %}

View File

@@ -412,10 +412,11 @@ class BlenderbotTokenizer(PreTrainedTokenizer):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -294,10 +294,11 @@ class BlenderbotTokenizerFast(PreTrainedTokenizerFast):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -225,10 +225,11 @@ class BlenderbotSmallTokenizer(PreTrainedTokenizer):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -105,10 +105,11 @@ class BlenderbotSmallTokenizerFast(PreTrainedTokenizerFast):
A very simple chat template that just adds whitespace between messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -156,9 +156,10 @@ class BloomTokenizerFast(PreTrainedTokenizerFast):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -457,10 +457,11 @@ class CodeLlamaTokenizer(PreTrainedTokenizer):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"

View File

@@ -370,10 +370,11 @@ class CodeLlamaTokenizerFast(PreTrainedTokenizerFast):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"

View File

@@ -248,10 +248,11 @@ class CohereTokenizerFast(PreTrainedTokenizerFast):
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
default_template = (
"{{ bos_token }}"

View File

@@ -337,9 +337,10 @@ class GPT2Tokenizer(PreTrainedTokenizer):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -148,9 +148,10 @@ class GPT2TokenizerFast(PreTrainedTokenizerFast):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -235,9 +235,10 @@ class GPTNeoXTokenizerFast(PreTrainedTokenizerFast):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -166,10 +166,11 @@ class GPTNeoXJapaneseTokenizer(PreTrainedTokenizer):
A simple chat template that just adds BOS/EOS tokens around messages while discarding role information.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -302,10 +302,11 @@ class GPTSw3Tokenizer(PreTrainedTokenizer):
preceding messages. BOS tokens are added between all messages.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{{ eos_token }}{{ bos_token }}"

View File

@@ -247,10 +247,11 @@ class GPTSanJapaneseTokenizer(PreTrainedTokenizer):
information.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"

View File

@@ -430,10 +430,11 @@ class LlamaTokenizer(PreTrainedTokenizer):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"

View File

@@ -227,10 +227,11 @@ class LlamaTokenizerFast(PreTrainedTokenizerFast):
in the original repository.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
template = (
"{% if messages[0]['role'] == 'system' %}"

View File

@@ -816,10 +816,11 @@ class WhisperTokenizer(PreTrainedTokenizer):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -545,10 +545,11 @@ class WhisperTokenizerFast(PreTrainedTokenizerFast):
A simple chat template that ignores role information and just concatenates messages with EOS tokens.
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using the default template "
f"for the {self.__class__.__name__} class. If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a default class-level template. "
"This is very error-prone, because models are often trained with templates different from the class "
"default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return "{% for message in messages %}" "{{ message.content }}{{ eos_token }}" "{% endfor %}"

View File

@@ -1841,10 +1841,11 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
https://github.com/openai/openai-python/blob/main/chatml.md
"""
logger.warning_once(
"\nNo chat template is defined for this tokenizer - using a default chat template "
"that implements the ChatML format (without BOS/EOS tokens!). If the default is not appropriate for "
"your model, please set `tokenizer.chat_template` to an appropriate template. "
"See https://huggingface.co/docs/transformers/main/chat_templating for more information.\n"
"No chat template is set for this tokenizer, falling back to a ChatML template. "
"This is very error-prone, because most models are not trained with a ChatML template!"
"Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which "
"point any code depending on them will stop working. We recommend setting a valid chat template before "
"then to ensure that this model continues working without issues."
)
return (
"{% for message in messages %}"