Chat Template support for function calling and RAG (#30621)

* First draft, still missing automatic function conversion * First draft of the automatic schema generator * Lots of small fixes * the walrus has betrayed me * please stop committing your debug breakpoints * Lots of cleanup and edge cases, looking better now * Comments and bugfixes for the type hint parser * More cleanup * Add tests, update schema generator * Update tests, proper handling of return values * Small docstring change * More doc updates * More doc updates * Add json_schema decorator * Clean up the TODOs and finish the docs * self.maxDiff = None to see the whole diff for the nested list test * add import for add_json_schema * Quick test fix * Fix something that was bugging me in the chat template docstring * Less "anyOf" when unnecessary * Support return types for the templates that need them * Proper return type tests * Switch to Google format docstrings * Update chat templating docs to match new format * Stop putting the return type in with the other parameters * Add Tuple support * No more decorator - we just do it implicitly! * Add enum support to get_json_schema * Update docstring * Add copyright header * Update src/transformers/tokenization_utils_base.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/chat_templating.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/chat_template_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/utils/chat_template_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Add copyright header * make fixup * Fix indentation * Reformat chat_template_utils * Correct return value * Make regexes module-level * Support more complex, multi-line arg docstrings * Update error message for ... * Update ruff * Add document type validation * Refactor docs * Refactor docs * Refactor docs * Clean up Tuple error * Add an extra test for very complex defs and docstrings and clean everything up for it * Document enum block * Quick test fixes * Stop supporting type hints in docstring to fix bugs and simplify the regex * Update docs for the regex change * Clean up enum regex * Wrap functions in {"type": "function", "function": ...} * Update src/transformers/utils/chat_template_utils.py Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> * Temporary tool calling commit * Add type hints to chat template utils, partially update docs (incomplete!) * Code cleanup based on @molbap's suggestion * Add comments to explain regexes * Fix up type parsing for unions and lists * Add custom exception types and adjust tests to look for them * Update docs with a demo! * Docs cleanup * Pass content as string * Update tool call formatting * Update docs with new function format * Update docs * Update docs with a second tool to show the model choosing correctly --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
2024-06-11 15:46:38 +01:00
parent ce3647ad2d
commit edc1dffd00
5 changed files with 1164 additions and 3 deletions
--- a/docs/source/en/chat_templating.md
+++ b/docs/source/en/chat_templating.md
@@ -233,6 +233,332 @@ The sun.</s>

 From here, just continue training like you would with a standard language modelling task, using the `formatted_chat` column.

+## Advanced: Extra inputs to chat templates
+
+The only argument that `apply_chat_template` requires is `messages`. However, you can pass any keyword
+argument to `apply_chat_template` and it will be accessible inside the template. This gives you a lot of freedom to use
+chat templates for many things. There are no restrictions on the names or the format of these arguments - you can pass
+strings, lists, dicts or whatever else you want. 
+
+That said, there are some common use-cases for these extra arguments,
+such as passing tools for function calling, or documents for retrieval-augmented generation. In these common cases,
+we have some opinionated recommendations about what the names and formats of these arguments should be, which are
+described in the sections below. We encourage model authors to make their chat templates compatible with this format,
+to make it easy to transfer tool-calling code between models.
+
+## Advanced: Tool use / function calling
+
+"Tool use" LLMs can choose to call functions as external tools before generating an answer. When passing tools
+to a tool-use model, you can simply pass a list of functions to the `tools` argument:
+
+```python
+import datetime
+
+def current_time():
+    """Get the current local time as a string."""
+    return str(datetime.now())
+
+def multiply(a: float, b: float):
+    """
+    A function that multiplies two numbers
+    
+    Args:
+        a: The first number to multiply
+        b: The second number to multiply
+    """
+    return a * b
+
+tools = [current_time, multiply]
+
+model_input = tokenizer.apply_chat_template(
+    messages,
+    tools=tools
+)
+```
+
+In order for this to work correctly, you should write your functions in the format above, so that they can be parsed
+correctly as tools. Specifically, you should follow these rules:
+
+- The function should have a descriptive name
+- Every argument must have a type hint
+- The function must have a docstring in the standard Google style (in other words, an initial function description  
+  followed by an `Args:` block that describes the arguments, unless the function does not have any arguments. 
+- Do not include types in the `Args:` block. In other words, write `a: The first number to multiply`, not
+  `a (int): The first number to multiply`. Type hints should go in the function header instead.
+- The function can have a return type and a `Returns:` block in the docstring. However, these are optional
+  because most tool-use models ignore them.
+
+### Passing tool results to the model
+
+The sample code above is enough to list the available tools for your model, but what happens if it wants to actually use
+one? If that happens, you should:
+
+1. Parse the model's output to get the tool name(s) and arguments.
+2. Add the model's tool call(s) to the conversation.
+3. Call the corresponding function(s) with those arguments.
+4. Add the result(s) to the conversation
+
+### A complete tool use example
+
+Let's walk through a tool use example, step by step. For this example, we will use an 8B `Hermes-2-Pro` model,
+as it is one of the highest-performing tool-use models in its size category at the time of writing. If you have the
+memory, you can consider using a larger model instead like [Command-R](https://huggingface.co/CohereForAI/c4ai-command-r-v01)
+or [Mixtral-8x22B](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1), both of which also support tool use
+and offer even stronger performance.
+
+First, let's load our model and tokenizer:
+
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+checkpoint = "NousResearch/Hermes-2-Pro-Llama-3-8B"
+
+tokenizer = AutoTokenizer.from_pretrained(checkpoint, revision="pr/13")
+model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.bfloat16, device_map="auto")
+```
+
+Next, let's define a list of tools:
+
+```python
+def get_current_temperature(location: str, unit: str) -> float:
+    """
+    Get the current temperature at a location.
+    
+    Args:
+        location: The location to get the temperature for, in the format "City, Country"
+        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
+    Returns:
+        The current temperature at the specified location in the specified units, as a float.
+    """
+    return 22.  # A real function should probably actually get the temperature!
+
+def get_current_wind_speed(location: str) -> float:
+    """
+    Get the current wind speed in km/h at a given location.
+    
+    Args:
+        location: The location to get the temperature for, in the format "City, Country"
+    Returns:
+        The current wind speed at the given location in km/h, as a float.
+    """
+    return 6.  # A real function should probably actually get the wind speed!
+
+tools = [get_current_temperature, get_current_wind_speed]
+```
+
+Now, let's set up a conversation for our bot:
+
+```python
+messages = [
+  {"role": "system", "content": "You are a bot that responds to weather queries. You should reply with the unit used in the queried location."},
+  {"role": "user", "content": "Hey, what's the temperature in Paris right now?"}
+]
+```
+
+Now, let's apply the chat template and generate a response:
+
+```python
+inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
+out = model.generate(**inputs, max_new_tokens=128)
+print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
+```
+
+And we get:
+
+```text
+<tool_call>
+{"arguments": {"location": "Paris, France", "unit": "celsius"}, "name": "get_current_temperature"}
+</tool_call><|im_end|>
+```
+
+The model has called the function with valid arguments, in the format requested by the function docstring. It has
+inferred that we're most likely referring to the Paris in France, and it remembered that, as the home of SI units,
+the temperature in France should certainly be displayed in Celsius.
+
+Let's append the model's tool call to the conversation. Note that we generate a random `tool_call_id` here. These IDs
+are not used by all models, but they allow models to issue multiple tool calls at once and keep track of which response
+corresponds to which call. You can generate them any way you like, but they should be unique within each chat.
+
+```python
+tool_call_id = "vAHdf3"  # Random ID, should be unique for each tool call
+tool_call = {"name": "get_current_temperature", "arguments": {"location": "Paris, France", "unit": "celsius"}}
+messages.append({"role": "assistant", "tool_calls": [{"id": tool_call_id, "type": "function", "function": tool_call}]})
+```
+
+
+Now that we've added the tool call to the conversation, we can call the function and append the result to the
+conversation. Since we're just using a dummy function for this example that always returns 22.0, we can just append 
+that result directly. Again, note the `tool_call_id` - this should match the ID used in the tool call above.
+
+```python
+messages.append({"role": "tool", "tool_call_id": tool_call_id, "name": "get_current_temperature", "content": "22.0"})
+```
+
+Finally, let's let the assistant read the function outputs and continue chatting with the user:
+
+```python
+inputs = tokenizer.apply_chat_template(messages, chat_template="tool_use", tools=tools, add_generation_prompt=True, return_dict=True, return_tensors="pt")
+inputs = {k: v.to(model.device) for k, v in inputs.items()}
+out = model.generate(**inputs, max_new_tokens=128)
+print(tokenizer.decode(out[0][len(inputs["input_ids"][0]):]))
+```
+
+And we get:
+
+```text
+The current temperature in Paris, France is 22.0 ° Celsius.<|im_end|>
+```
+
+Although this was a simple demo with dummy tools and a single call, the same technique works with 
+multiple real tools and longer conversations. This can be a powerful way to extend the capabilities of conversational
+agents with real-time information, computational tools like calculators, or access to large databases.
+
+<Tip>
+Not all of the tool-calling features shown above are used by all models. Some use tool call IDs, others simply use the function name and
+match tool calls to results using the ordering, and there are several models that use neither and only issue one tool 
+call at a time to avoid confusion. If you want your code to be compatible across as many models as possible, we 
+recommend structuring your tools calls like we've shown here, and returning tool results in the order that
+they were issued by the model. The chat templates on each model should handle the rest.
+</Tip>
+
+### Understanding tool schemas
+
+Each function you pass to the `tools` argument of `apply_chat_template` is converted into a 
+[JSON schema](https://json-schema.org/learn/getting-started-step-by-step). These schemas
+are then passed to the model chat template. In other words, tool-use models do not see your functions directly, and they
+never see the actual code inside them. What they care about is the function **definitions** and the **arguments** they
+need to pass to them - they care about what the tools do and how to use them, not how they work! It is up to you
+to read their outputs, detect if they have requested to use a tool, pass their arguments to the tool function, and
+return the response in the chat.
+
+Generating JSON schemas to pass to the template should be automatic and invisible as long as your functions
+follow the specification above, but if you encounter problems, or you simply want more control over the conversion, 
+you can handle the conversion manually. Here is an example of a manual schema conversion.
+
+```python
+from transformers.utils import get_json_schema
+
+def multiply(a: float, b: float):
+    """
+    A function that multiplies two numbers
+    
+    Args:
+        a: The first number to multiply
+        b: The second number to multiply
+    """
+    return a * b
+
+schema = get_json_schema(multiply)
+print(schema)
+```
+
+This will yield:
+
+```json
+{
+  "type": "function", 
+  "function": {
+    "name": "multiply", 
+    "description": "A function that multiplies two numbers", 
+    "parameters": {
+      "type": "object", 
+      "properties": {
+        "a": {
+          "type": "number", 
+          "description": "The first number to multiply"
+        }, 
+        "b": {
+          "type": "number",
+          "description": "The second number to multiply"
+        }
+      }, 
+      "required": ["a", "b"]
+    }
+  }
+}
+```
+
+If you wish, you can edit these schemas, or even write them from scratch yourself without using `get_json_schema` at 
+all. JSON schemas can be passed directly to the `tools` argument of 
+`apply_chat_template` - this gives you a lot of power to define precise schemas for more complex functions. Be careful,
+though - the more complex your schemas, the more likely the model is to get confused when dealing with them! We 
+recommend simple function signatures where possible, keeping arguments (and especially complex, nested arguments) 
+to a minimum.
+
+Here is an example of defining schemas by hand, and passing them directly to `apply_chat_template`:
+
+```python
+# A simple function that takes no arguments
+current_time = {
+  "type": "function", 
+  "function": {
+    "name": "current_time",
+    "description": "Get the current local time as a string.",
+    "parameters": {
+      'type': 'object',
+      'properties': {}
+    }
+  }
+}
+
+# A more complete function that takes two numerical arguments
+multiply = {
+  'type': 'function',
+  'function': {
+    'name': 'multiply',
+    'description': 'A function that multiplies two numbers', 
+    'parameters': {
+      'type': 'object', 
+      'properties': {
+        'a': {
+          'type': 'number',
+          'description': 'The first number to multiply'
+        }, 
+        'b': {
+          'type': 'number', 'description': 'The second number to multiply'
+        }
+      }, 
+      'required': ['a', 'b']
+    }
+  }
+}
+
+model_input = tokenizer.apply_chat_template(
+    messages,
+    tools = [current_time, multiply]
+)
+```
+
+## Advanced: Retrieval-augmented generation
+
+"Retrieval-augmented generation" or "RAG" LLMs can search a corpus of documents for information before responding
+to a query. This allows models to vastly expand their knowledge base beyond their limited context size. Our 
+recommendation for RAG models is that their template
+should accept a `documents` argument. This should be a list of documents, where each "document"
+is a single dict with `title` and `contents` keys, both of which are strings. Because this format is much simpler
+than the JSON schemas used for tools, no helper functions are necessary.
+
+Here's an example of a RAG template in action:
+
+```python
+document1 = {
+    "title": "The Moon: Our Age-Old Foe",
+    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall..."
+}
+
+document2 = {
+    "title": "The Sun: Our Age-Old Friend",
+    "contents": "Although often underappreciated, the sun provides several notable benefits..."
+}
+
+model_input = tokenizer.apply_chat_template(
+    messages,
+    documents=[document1, document2]
+)
+```
+
 ## Advanced: How do chat templates work?

 The chat template for a model is stored on the `tokenizer.chat_template` attribute. If no chat template is set, the
--- a/src/transformers/tokenization_utils_base.py
+++ b/src/transformers/tokenization_utils_base.py
@@ -28,6 +28,7 @@ from collections.abc import Mapping, Sized
 from contextlib import contextmanager
 from dataclasses import dataclass
 from functools import lru_cache
+from inspect import isfunction
 from typing import TYPE_CHECKING, Any, Dict, List, NamedTuple, Optional, Sequence, Tuple, Union

 import numpy as np
@@ -47,6 +48,7 @@ from .utils import (
    copy_func,
    download_url,
    extract_commit_hash,
+    get_json_schema,
    is_flax_available,
    is_jax_tensor,
    is_mlx_available,
@@ -1683,6 +1685,8 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
    def apply_chat_template(
        self,
        conversation: Union[List[Dict[str, str]], List[List[Dict[str, str]]]],
+        tools: Optional[List[Dict]] = None,
+        documents: Optional[List[Dict[str, str]]] = None,
        chat_template: Optional[str] = None,
        add_generation_prompt: bool = False,
        tokenize: bool = True,
@@ -1703,8 +1707,21 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
        Args:
            conversation (Union[List[Dict[str, str]], List[List[Dict[str, str]]]]): A list of dicts
                with "role" and "content" keys, representing the chat history so far.
-            chat_template (str, *optional*): A Jinja template to use for this conversion. If
-                this is not passed, the model's default chat template will be used instead.
+            tools (`List[Dict]`, *optional*):
+                A list of tools (callable functions) that will be accessible to the model. If the template does not
+                support function calling, this argument will have no effect. Each tool should be passed as a JSON Schema,
+                giving the name, description and argument types for the tool. See our
+                [chat templating guide](https://huggingface.co/docs/transformers/main/en/chat_templating#automated-function-conversion-for-tool-use)
+                for more information.
+            documents (`List[Dict[str, str]]`, *optional*):
+                A list of dicts representing documents that will be accessible to the model if it is performing RAG
+                (retrieval-augmented generation). If the template does not support RAG, this argument will have no
+                effect. We recommend that each document should be a dict containing "title" and "text" keys. Please
+                see the RAG section of the [chat templating guide](https://huggingface.co/docs/transformers/main/en/chat_templating#arguments-for-RAG)
+                for examples of passing documents with chat templates.
+            chat_template (`str`, *optional*):
+                A Jinja template to use for this conversion. It is usually not necessary to pass anything to this
+                argument, as the model's template will be used by default.
            add_generation_prompt (bool, *optional*): Whether to end the prompt with the token(s) that indicate
                the start of an assistant message. This is useful when you want to generate a response from the model.
                Note that this argument will be passed to the chat template, and so it must be supported in the
@@ -1802,6 +1819,27 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
            conversations = [conversation]
            is_batched = False

+        # We accept either JSON schemas or functions for tools. If we get functions, we convert them to schemas
+        if tools is not None:
+            tool_schemas = []
+            for tool in tools:
+                if isinstance(tool, dict):
+                    tool_schemas.append(tool)
+                elif isfunction(tool):
+                    tool_schemas.append(get_json_schema(tool))
+                else:
+                    raise ValueError(
+                        "Tools should either be a JSON schema, or a callable function with type hints "
+                        "and a docstring suitable for auto-conversion to a schema."
+                    )
+        else:
+            tool_schemas = None
+
+        if documents is not None:
+            for document in documents:
+                if not isinstance(document, dict):
+                    raise TypeError("Documents should be a list of dicts with 'title' and 'text' keys!")
+
        rendered = []
        template_kwargs = {**self.special_tokens_map, **kwargs}  # kwargs overwrite special tokens if both are present
        for chat in conversations:
@@ -1809,7 +1847,11 @@ class PreTrainedTokenizerBase(SpecialTokensMixin, PushToHubMixin):
                # Indicates it's a Conversation object
                chat = chat.messages
            rendered_chat = compiled_template.render(
-                messages=chat, add_generation_prompt=add_generation_prompt, **template_kwargs
+                messages=chat,
+                tools=tool_schemas,
+                documents=documents,
+                add_generation_prompt=add_generation_prompt,
+                **template_kwargs,
            )
            rendered.append(rendered_chat)

--- a/src/transformers/utils/init.py
+++ b/src/transformers/utils/init.py
@@ -21,6 +21,7 @@ from packaging import version

 from .. import __version__
 from .backbone_utils import BackboneConfigMixin, BackboneMixin
+from .chat_template_utils import DocstringParsingException, TypeHintParsingException, get_json_schema
 from .constants import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD, IMAGENET_STANDARD_MEAN, IMAGENET_STANDARD_STD
 from .doc import (
    add_code_sample_docstrings,
--- a/src/transformers/utils/chat_template_utils.py
+++ b/src/transformers/utils/chat_template_utils.py
@@ -0,0 +1,316 @@
+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import inspect
+import json
+import re
+from typing import Any, Callable, Dict, Optional, Tuple, Union, get_args, get_origin, get_type_hints
+
+
+BASIC_TYPES = (int, float, str, bool, Any, type(None), ...)
+# Extracts the initial segment of the docstring, containing the function description
+description_re = re.compile(r"^(.*?)[\n\s]*(Args:|Returns:|Raises:|\Z)", re.DOTALL)
+# Extracts the Args: block from the docstring
+args_re = re.compile(r"\n\s*Args:\n\s*(.*?)[\n\s]*(Returns:|Raises:|\Z)", re.DOTALL)
+# Splits the Args: block into individual arguments
+args_split_re = re.compile(
+    r"""
+(?:^|\n)  # Match the start of the args block, or a newline
+\s*(\w+):\s*  # Capture the argument name and strip spacing
+(.*?)\s*  # Capture the argument description, which can span multiple lines, and strip trailing spacing
+(?=\n\s*\w+:|\Z)  # Stop when you hit the next argument or the end of the block
+""",
+    re.DOTALL | re.VERBOSE,
+)
+# Extracts the Returns: block from the docstring, if present. Note that most chat templates ignore the return type/doc!
+returns_re = re.compile(r"\n\s*Returns:\n\s*(.*?)[\n\s]*(Raises:|\Z)", re.DOTALL)
+
+
+class TypeHintParsingException(Exception):
+    """Exception raised for errors in parsing type hints to generate JSON schemas"""
+
+    pass
+
+
+class DocstringParsingException(Exception):
+    """Exception raised for errors in parsing docstrings to generate JSON schemas"""
+
+    pass
+
+
+def _get_json_schema_type(param_type: str) -> Dict[str, str]:
+    type_mapping = {
+        int: {"type": "integer"},
+        float: {"type": "number"},
+        str: {"type": "string"},
+        bool: {"type": "boolean"},
+        Any: {},
+    }
+    return type_mapping.get(param_type, {"type": "object"})
+
+
+def _parse_type_hint(hint: str) -> Dict:
+    origin = get_origin(hint)
+    args = get_args(hint)
+
+    if origin is None:
+        try:
+            return _get_json_schema_type(hint)
+        except KeyError:
+            raise TypeHintParsingException(
+                "Couldn't parse this type hint, likely due to a custom class or object: ", hint
+            )
+
+    elif origin is Union:
+        # Recurse into each of the subtypes in the Union, except None, which is handled separately at the end
+        subtypes = [_parse_type_hint(t) for t in args if t != type(None)]
+        if len(subtypes) == 1:
+            # A single non-null type can be expressed directly
+            return_dict = subtypes[0]
+        elif all(isinstance(subtype["type"], str) for subtype in subtypes):
+            # A union of basic types can be expressed as a list in the schema
+            return_dict = {"type": [subtype["type"] for subtype in subtypes]}
+        else:
+            # A union of more complex types requires "anyOf"
+            return_dict = {"anyOf": subtypes}
+        if type(None) in args:
+            return_dict["nullable"] = True
+        return return_dict
+
+    elif origin is list:
+        if not args:
+            return {"type": "array"}
+        else:
+            # Lists can only have a single type argument, so recurse into it
+            return {"type": "array", "items": _parse_type_hint(args[0])}
+
+    elif origin is tuple:
+        if not args:
+            return {"type": "array"}
+        if len(args) == 1:
+            raise TypeHintParsingException(
+                f"The type hint {str(hint).replace('typing.', '')} is a Tuple with a single element, which "
+                "we do not automatically convert to JSON schema as it is rarely necessary. If this input can contain "
+                "more than one element, we recommend "
+                "using a List[] type instead, or if it really is a single element, remove the Tuple[] wrapper and just "
+                "pass the element directly."
+            )
+        if ... in args:
+            raise TypeHintParsingException(
+                "Conversion of '...' is not supported in Tuple type hints. "
+                "Use List[] types for variable-length"
+                " inputs instead."
+            )
+        return {"type": "array", "prefixItems": [_parse_type_hint(t) for t in args]}
+
+    elif origin is dict:
+        # The JSON equivalent to a dict is 'object', which mandates that all keys are strings
+        # However, we can specify the type of the dict values with "additionalProperties"
+        out = {"type": "object"}
+        if len(args) == 2:
+            out["additionalProperties"] = _parse_type_hint(args[1])
+        return out
+
+    raise TypeHintParsingException("Couldn't parse this type hint, likely due to a custom class or object: ", hint)
+
+
+def _convert_type_hints_to_json_schema(func: Callable) -> Dict:
+    type_hints = get_type_hints(func)
+    signature = inspect.signature(func)
+    required = []
+    for param_name, param in signature.parameters.items():
+        if param.annotation == inspect.Parameter.empty:
+            raise TypeHintParsingException(f"Argument {param.name} is missing a type hint in function {func.__name__}")
+        if param.default == inspect.Parameter.empty:
+            required.append(param_name)
+
+    properties = {}
+    for param_name, param_type in type_hints.items():
+        properties[param_name] = _parse_type_hint(param_type)
+
+    schema = {"type": "object", "properties": properties}
+    if required:
+        schema["required"] = required
+
+    return schema
+
+
+def parse_google_format_docstring(docstring: str) -> Tuple[Optional[str], Optional[Dict], Optional[str]]:
+    """
+    Parses a Google-style docstring to extract the function description,
+    argument descriptions, and return description.
+
+    Args:
+        docstring (str): The docstring to parse.
+
+    Returns:
+        The function description, arguments, and return description.
+    """
+
+    # Extract the sections
+    description_match = description_re.search(docstring)
+    args_match = args_re.search(docstring)
+    returns_match = returns_re.search(docstring)
+
+    # Clean and store the sections
+    description = description_match.group(1).strip() if description_match else None
+    docstring_args = args_match.group(1).strip() if args_match else None
+    returns = returns_match.group(1).strip() if returns_match else None
+
+    # Parsing the arguments into a dictionary
+    if docstring_args is not None:
+        docstring_args = "\n".join([line for line in docstring_args.split("\n") if line.strip()])  # Remove blank lines
+        matches = args_split_re.findall(docstring_args)
+        args_dict = {match[0]: re.sub(r"\s*\n+\s*", " ", match[1].strip()) for match in matches}
+    else:
+        args_dict = {}
+
+    return description, args_dict, returns
+
+
+def get_json_schema(func: Callable) -> Dict:
+    """
+    This function generates a JSON schema for a given function, based on its docstring and type hints. This is
+    mostly used for passing lists of tools to a chat template. The JSON schema contains the name and description of
+    the function, as well as the names, types and descriptions for each of its arguments. `get_json_schema()` requires
+    that the function has a docstring, and that each argument has a description in the docstring, in the standard
+    Google docstring format shown below. It also requires that all the function arguments have a valid Python type hint.
+
+    Although it is not required, a `Returns` block can also be added, which will be included in the schema. This is
+    optional because most chat templates ignore the return value of the function.
+
+    Args:
+        func: The function to generate a JSON schema for.
+
+    Returns:
+        A dictionary containing the JSON schema for the function.
+
+    Examples:
+    ```python
+    >>> def multiply(x: float, y: float):
+    >>>    '''
+    >>>    A function that multiplies two numbers
+    >>>
+    >>>    Args:
+    >>>        x: The first number to multiply
+    >>>        y: The second number to multiply
+    >>>    '''
+    >>>    return x * y
+    >>>
+    >>> print(get_json_schema(multiply))
+    {
+        "name": "multiply",
+        "description": "A function that multiplies two numbers",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "x": {"type": "number", "description": "The first number to multiply"},
+                "y": {"type": "number", "description": "The second number to multiply"}
+            },
+            "required": ["x", "y"]
+        }
+    }
+    ```
+
+    The general use for these schemas is that they are used to generate tool descriptions for chat templates that
+    support them, like so:
+
+    ```python
+    >>> from transformers import AutoTokenizer
+    >>> from transformers.utils import get_json_schema
+    >>>
+    >>> def multiply(x: float, y: float):
+    >>>    '''
+    >>>    A function that multiplies two numbers
+    >>>
+    >>>    Args:
+    >>>        x: The first number to multiply
+    >>>        y: The second number to multiply
+    >>>    return x * y
+    >>>    '''
+    >>>
+    >>> multiply_schema = get_json_schema(multiply)
+    >>> tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")
+    >>> messages = [{"role": "user", "content": "What is 179 x 4571?"}]
+    >>> formatted_chat = tokenizer.apply_chat_template(
+    >>>     messages,
+    >>>     tools=[multiply_schema],
+    >>>     chat_template="tool_use",
+    >>>     return_dict=True,
+    >>>     return_tensors="pt",
+    >>>     add_generation_prompt=True
+    >>> )
+    >>> # The formatted chat can now be passed to model.generate()
+    ```
+
+    Each argument description can also have an optional `(choices: ...)` block at the end, such as
+    `(choices: ["tea", "coffee"])`, which will be parsed into an `enum` field in the schema. Note that this will
+    only be parsed correctly if it is at the end of the line:
+
+    ```python
+    >>> def drink_beverage(beverage: str):
+    >>>    '''
+    >>>    A function that drinks a beverage
+    >>>
+    >>>    Args:
+    >>>        beverage: The beverage to drink (choices: ["tea", "coffee"])
+    >>>    '''
+    >>>    pass
+    >>>
+    >>> print(get_json_schema(drink_beverage))
+    ```
+    {
+        'name': 'drink_beverage',
+        'description': 'A function that drinks a beverage',
+        'parameters': {
+            'type': 'object',
+            'properties': {
+                'beverage': {
+                    'type': 'string',
+                    'enum': ['tea', 'coffee'],
+                    'description': 'The beverage to drink'
+                    }
+                },
+            'required': ['beverage']
+        }
+    }
+    """
+    doc = inspect.getdoc(func)
+    if not doc:
+        raise DocstringParsingException(
+            f"Cannot generate JSON schema for {func.__name__} because it has no docstring!"
+        )
+    doc = doc.strip()
+    main_doc, param_descriptions, return_doc = parse_google_format_docstring(doc)
+
+    json_schema = _convert_type_hints_to_json_schema(func)
+    if (return_dict := json_schema["properties"].pop("return", None)) is not None:
+        if return_doc is not None:  # We allow a missing return docstring since most templates ignore it
+            return_dict["description"] = return_doc
+    for arg, schema in json_schema["properties"].items():
+        if arg not in param_descriptions:
+            raise DocstringParsingException(
+                f"Cannot generate JSON schema for {func.__name__} because the docstring has no description for the argument '{arg}'"
+            )
+        desc = param_descriptions[arg]
+        enum_choices = re.search(r"\(choices:\s*(.*?)\)\s*$", desc, flags=re.IGNORECASE)
+        if enum_choices:
+            schema["enum"] = [c.strip() for c in json.loads(enum_choices.group(1))]
+            desc = enum_choices.string[: enum_choices.start()].strip()
+        schema["description"] = desc
+
+    output = {"name": func.__name__, "description": main_doc, "parameters": json_schema}
+    if return_dict is not None:
+        output["return"] = return_dict
+    return {"type": "function", "function": output}
--- a/tests/utils/test_chat_template_utils.py
+++ b/tests/utils/test_chat_template_utils.py
@@ -0,0 +1,476 @@
+# Copyright 2024 The HuggingFace Team. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import unittest
+from typing import List, Optional, Tuple, Union
+
+from transformers.utils import DocstringParsingException, TypeHintParsingException, get_json_schema
+
+
+class JsonSchemaGeneratorTest(unittest.TestCase):
+    def test_simple_function(self):
+        def fn(x: int):
+            """
+            Test function
+
+            Args:
+                 x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {"x": {"type": "integer", "description": "The input"}},
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_no_arguments(self):
+        def fn():
+            """
+            Test function
+            """
+            return True
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {"type": "object", "properties": {}},
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_union(self):
+        def fn(x: Union[int, float]):
+            """
+            Test function
+
+            Args:
+                x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {"x": {"type": ["integer", "number"], "description": "The input"}},
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_optional(self):
+        def fn(x: Optional[int]):
+            """
+            Test function
+
+            Args:
+                x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {"x": {"type": "integer", "description": "The input", "nullable": True}},
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_default_arg(self):
+        def fn(x: int = 42):
+            """
+            Test function
+
+            Args:
+                 x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {"type": "object", "properties": {"x": {"type": "integer", "description": "The input"}}},
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_nested_list(self):
+        def fn(x: List[List[Union[str, int]]]):
+            """
+            Test function
+
+            Args:
+                x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {
+                        "type": "array",
+                        "items": {"type": "array", "items": {"type": ["string", "integer"]}},
+                        "description": "The input",
+                    }
+                },
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_multiple_arguments(self):
+        def fn(x: int, y: str):
+            """
+            Test function
+
+            Args:
+                x: The input
+                y: Also the input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {"type": "integer", "description": "The input"},
+                    "y": {"type": "string", "description": "Also the input"},
+                },
+                "required": ["x", "y"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_multiple_complex_arguments(self):
+        def fn(x: List[Union[int, float]], y: Optional[Union[int, str]] = None):
+            """
+            Test function
+
+            Args:
+                x: The input
+                y: Also the input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {"type": "array", "items": {"type": ["integer", "number"]}, "description": "The input"},
+                    "y": {
+                        "type": ["integer", "string"],
+                        "nullable": True,
+                        "description": "Also the input",
+                    },
+                },
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_missing_docstring(self):
+        def fn(x: int):
+            return x
+
+        with self.assertRaises(DocstringParsingException):
+            get_json_schema(fn)
+
+    def test_missing_param_docstring(self):
+        def fn(x: int):
+            """
+            Test function
+            """
+            return x
+
+        with self.assertRaises(DocstringParsingException):
+            get_json_schema(fn)
+
+    def test_missing_type_hint(self):
+        def fn(x):
+            """
+            Test function
+
+            Args:
+                 x: The input
+            """
+            return x
+
+        with self.assertRaises(TypeHintParsingException):
+            get_json_schema(fn)
+
+    def test_return_value(self):
+        def fn(x: int) -> int:
+            """
+            Test function
+
+            Args:
+                x: The input
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {"x": {"type": "integer", "description": "The input"}},
+                "required": ["x"],
+            },
+            "return": {"type": "integer"},
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_return_value_docstring(self):
+        def fn(x: int) -> int:
+            """
+            Test function
+
+            Args:
+                x: The input
+
+
+            Returns:
+                The output
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {"x": {"type": "integer", "description": "The input"}},
+                "required": ["x"],
+            },
+            "return": {"type": "integer", "description": "The output"},
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_tuple(self):
+        def fn(x: Tuple[int, str]):
+            """
+            Test function
+
+            Args:
+                x: The input
+
+
+            Returns:
+                The output
+            """
+            return x
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {
+                        "type": "array",
+                        "prefixItems": [{"type": "integer"}, {"type": "string"}],
+                        "description": "The input",
+                    }
+                },
+                "required": ["x"],
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_single_element_tuple_fails(self):
+        def fn(x: Tuple[int]):
+            """
+            Test function
+
+            Args:
+                x: The input
+
+
+            Returns:
+                The output
+            """
+            return x
+
+        # Single-element tuples should just be the type itself, or List[type] for variable-length inputs
+        with self.assertRaises(TypeHintParsingException):
+            get_json_schema(fn)
+
+    def test_ellipsis_type_fails(self):
+        def fn(x: Tuple[int, ...]):
+            """
+            Test function
+
+            Args:
+                x: The input
+
+
+            Returns:
+                The output
+            """
+            return x
+
+        # Variable length inputs should be specified with List[type], not Tuple[type, ...]
+        with self.assertRaises(TypeHintParsingException):
+            get_json_schema(fn)
+
+    def test_enum_extraction(self):
+        def fn(temperature_format: str):
+            """
+            Test function
+
+            Args:
+                temperature_format: The temperature format to use (Choices: ["celsius", "fahrenheit"])
+
+
+            Returns:
+                The temperature
+            """
+            return -40.0
+
+        # Let's see if that gets correctly parsed as an enum
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "temperature_format": {
+                        "type": "string",
+                        "enum": ["celsius", "fahrenheit"],
+                        "description": "The temperature format to use",
+                    }
+                },
+                "required": ["temperature_format"],
+            },
+        }
+
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_multiline_docstring_with_types(self):
+        def fn(x: int, y: int):
+            """
+            Test function
+
+            Args:
+                x: The first input
+
+                y: The second input. This is a longer description
+                   that spans multiple lines with indentation and stuff.
+
+            Returns:
+                God knows what
+            """
+            pass
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {"type": "integer", "description": "The first input"},
+                    "y": {
+                        "type": "integer",
+                        "description": "The second input. This is a longer description that spans multiple lines with indentation and stuff.",
+                    },
+                },
+                "required": ["x", "y"],
+            },
+        }
+
+        self.assertEqual(schema["function"], expected_schema)
+
+    def test_everything_all_at_once(self):
+        def fn(
+            x: str, y: Optional[List[Union[str, int]]], z: Tuple[Union[str, int], str] = (42, "hello")
+        ) -> Tuple[int, str]:
+            """
+            Test function with multiple args, and docstring args that we have to strip out.
+
+            Args:
+                x: The first input. It's got a big multiline
+                   description and also contains
+                   (choices: ["a", "b", "c"])
+
+                y: The second input. It's a big list with a single-line description.
+
+                z: The third input. It's some kind of tuple with a default arg.
+
+            Returns:
+                The output. The return description is also a big multiline
+                description that spans multiple lines.
+            """
+            pass
+
+        schema = get_json_schema(fn)
+        expected_schema = {
+            "name": "fn",
+            "description": "Test function with multiple args, and docstring args that we have to strip out.",
+            "parameters": {
+                "type": "object",
+                "properties": {
+                    "x": {
+                        "type": "string",
+                        "enum": ["a", "b", "c"],
+                        "description": "The first input. It's got a big multiline description and also contains",
+                    },
+                    "y": {
+                        "type": "array",
+                        "items": {"type": ["string", "integer"]},
+                        "nullable": True,
+                        "description": "The second input. It's a big list with a single-line description.",
+                    },
+                    "z": {
+                        "type": "array",
+                        "prefixItems": [{"type": ["string", "integer"]}, {"type": "string"}],
+                        "description": "The third input. It's some kind of tuple with a default arg.",
+                    },
+                },
+                "required": ["x", "y"],
+            },
+            "return": {
+                "type": "array",
+                "prefixItems": [{"type": "integer"}, {"type": "string"}],
+                "description": "The output. The return description is also a big multiline\n    description that spans multiple lines.",
+            },
+        }
+        self.assertEqual(schema["function"], expected_schema)