Test composition (#23214)
* Remove nestedness in tool config * Really do it * Use remote tools descriptions * Work * Clean up eval * Changes * Tools * Tools * tool * Fix everything * Use last result/assign for evaluation * Prompt * Remove hardcoded selection * Evaluation for chat agents * correct some spelling * Small fixes * Change summarization model (#23172) * Fix link displayed * Update description of the tool * Fixes in chat prompt * Custom tools, custom prompt * Tool clean up * save_pretrained and push_to_hub for tool * Fix init * Tests * Fix tests * Tool save/from_hub/push_to_hub and tool->load_tool * Clean push_to_hub and add app file * Custom inference API for endpoints too * Clean up * old remote tool and new remote tool * Make a requirements * return_code adds tool creation * Avoid redundancy between global variables * Remote tools can be loaded * Tests * Text summarization tests * Quality * Properly mark tests * Test the python interpreter * And the CI shall be green. * fix loading of additional tools * Work on RemoteTool and fix tests * General clean up * Guard imports * Fix tools * docs: Fix broken link in 'How to add a model...' (#23216) fix link * Get default endpoint from the Hub * Add guide * Simplify tool config * Docs * Some fixes * Docs * Docs * Docs * Fix code returned by agent * Try this * Match args with signature in remote tool * Should fix python interpreter for Python 3.8 * Fix push_to_hub for tools * Other fixes to push_to_hub * Add API doc page * Docs * Docs * Custom tools * Pin tensorflow-probability (#23220) * Pin tensorflow-probability * [all-test] * [all-test] Fix syntax for bash * PoC for some chaining API * Text to speech * J'ai pris des libertés * Rename * Basic python interpreter * Add agents * Quality * Add translation tool * temp * GenQA + LID + S2T * Quality + word missing in translation * Add open assistance, support f-strings in evaluate * captioning + s2t fixes * Style * Refactor descriptions and remove chain * Support errors and rename OpenAssistantAgent * Add setup * Deal with typos + example of inference API * Some rename + README * Fixes * Update prompt * Unwanted change * Make sure everyone has a default * One prompt to rule them all. * SD * Description * Clean up remote tools * More remote tools * Add option to return code and update doc * Image segmentation * ControlNet * Gradio demo * Diffusers protection * Lib protection * ControlNet description * Cleanup * Style * Remove accelerate and try to be reproducible * No randomness * Male Basic optional in token * Clean description * Better prompts * Fix args eval in interpreter * Add tool wrapper * Tool on the Hub * Style post-rebase * Big refactor of descriptions, batch generation and evaluation for agents * Make problems easier - interface to debug * More problems, add python primitives * Back to one prompt * Remove dict for translation * Be consistent * Add prompts * New version of the agent * Evaluate new agents * New endpoints agents * Make all tools a dict variable * Typo * Add problems * Add to big prompt * Harmonize * Add tools * New evaluation * Add more tools * Build prompt with tools descriptions * Tools on the Hub * Let's chat! * Cleanup * Temporary bs4 safeguard * Cache agents and clean up * Blank init * Fix evaluation for agents * New format for tools on the Hub * Add method to reset state * Remove nestedness in tool config * Really do it * Use remote tools descriptions * Work * Clean up eval * Changes * Tools * Tools * tool * Fix everything * Use last result/assign for evaluation * Prompt * Remove hardcoded selection * Evaluation for chat agents * correct some spelling * Small fixes * Change summarization model (#23172) * Fix link displayed * Update description of the tool * Fixes in chat prompt * Custom tools, custom prompt * Tool clean up * save_pretrained and push_to_hub for tool * Fix init * Tests * Fix tests * Tool save/from_hub/push_to_hub and tool->load_tool * Clean push_to_hub and add app file * Custom inference API for endpoints too * Clean up * old remote tool and new remote tool * Make a requirements * return_code adds tool creation * Avoid redundancy between global variables * Remote tools can be loaded * Tests * Text summarization tests * Quality * Properly mark tests * Test the python interpreter * And the CI shall be green. * Work on RemoteTool and fix tests * fix loading of additional tools * General clean up * Guard imports * Fix tools * Get default endpoint from the Hub * Simplify tool config * Add guide * Docs * Some fixes * Docs * Docs * Fix code returned by agent * Try this * Docs * Match args with signature in remote tool * Should fix python interpreter for Python 3.8 * Fix push_to_hub for tools * Other fixes to push_to_hub * Add API doc page * Fixes * Doc fixes * Docs * Fix audio * Custom tools * Audio fix * Improve custom tools docstring * Docstrings * Trigger CI * Mode docstrings * More docstrings * Improve custom tools * Fix for remote tools * Style * Fix repo consistency * Quality * Tip * Cleanup on doc * Cleanup toc * Add disclaimer for starcoder vs openai * Remove disclaimer * Small fixed in the prompts * 4.29 * Update src/transformers/tools/agents.py Co-authored-by: Lysandre Debut <lysandre.debut@reseau.eseo.fr> * Complete documentation * Small fixes * Agent evaluation * Note about gradio-tools & LC * Clean up agents and prompt * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Apply suggestions from code review Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> * Note about gradio-tools & LC * Add copyrights and address review comments * Quality * Add all language codes * Add remote tool tests * Move custom prompts to other docs * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * TTS tests * Quality --------- Co-authored-by: Lysandre <hi@lyand.re> Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com> Co-authored-by: Connor Henderson <connor.henderson@talkiatry.com> Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr> Co-authored-by: Lysandre <lysandre@huggingface.co> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
This commit is contained in:
committed by
Sylvain Gugger
parent
d5e1c98120
commit
2a2be57697
@@ -21,6 +21,8 @@
|
||||
title: Set up distributed training with 🤗 Accelerate
|
||||
- local: model_sharing
|
||||
title: Share your model
|
||||
- local: transformers_agents
|
||||
title: Agents
|
||||
title: Tutorials
|
||||
- sections:
|
||||
- sections:
|
||||
@@ -99,6 +101,8 @@
|
||||
title: Notebooks with examples
|
||||
- local: community
|
||||
title: Community resources
|
||||
- local: custom_tools
|
||||
title: Custom Tools
|
||||
- local: troubleshooting
|
||||
title: Troubleshoot
|
||||
title: Developer guides
|
||||
@@ -179,6 +183,8 @@
|
||||
title: Conceptual guides
|
||||
- sections:
|
||||
- sections:
|
||||
- local: main_classes/agent
|
||||
title: Agents and Tools
|
||||
- local: model_doc/auto
|
||||
title: Auto Classes
|
||||
- local: main_classes/callback
|
||||
|
||||
503
docs/source/en/custom_tools.mdx
Normal file
503
docs/source/en/custom_tools.mdx
Normal file
@@ -0,0 +1,503 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Custom Tools and Prompts
|
||||
|
||||
<Tip>
|
||||
|
||||
If you are not aware of what tools and agents are in the context of transformers, we recommend you read the
|
||||
[Transformers Agents](transformers_agents) page first.
|
||||
|
||||
</Tip>
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
|
||||
can vary as the APIs or underlying models are prone to change.
|
||||
|
||||
</Tip>
|
||||
|
||||
Creating and using custom tools and prompts is paramount to empowering the agent and having it perform new tasks.
|
||||
In this guide we'll take a look at:
|
||||
|
||||
- How to customize the prompt
|
||||
- How to use custom tools
|
||||
- How to create custom tools
|
||||
|
||||
## Customizing the prompt
|
||||
|
||||
As explained in [Transformers Agents](transformers_agents) agents can run in [`~Agent.run`] and [`~Agent.chat`] mode.
|
||||
Both the run and chat mode underlie the same logic. The language model powering the agent is conditioned on a long prompt
|
||||
and simply asked to complete the prompt by generating next tokens until the stop token is reached.
|
||||
The only difference between the `run` and `chat` mode is that during the `chat` mode the prompt is extended with
|
||||
previous user inputs and model generations, which seemingly gives the agent a memory and allows it to refer to
|
||||
past interactions.
|
||||
|
||||
Let's take a closer look into how the prompt is structured to understand how it can be best customized.
|
||||
The prompt is structured broadly into four parts.
|
||||
|
||||
- 1. Introduction: how the agent should behave, explanation of the concept of tools.
|
||||
- 2. Description of all the tools. This is defined by a `<<all_tools>>` token that is dynamically replaced at runtime with the tools defined/chosen by the user.
|
||||
- 3. A set of examples of tasks and their solution
|
||||
- 4. Current example, and request for solution.
|
||||
|
||||
To better understand each part, let's look at a shortened version of how such a prompt can look like in practice.
|
||||
|
||||
```
|
||||
I will ask you to perform a task, your job is to come up with a series of simple commands in Python that will perform the task.
|
||||
[...]
|
||||
You can print intermediate results if it makes sense to do so.
|
||||
|
||||
Tools:
|
||||
- document_qa: This is a tool that answers a question about an document (pdf). It takes an input named `document` which should be the document containing the information, as well as a `question` that is the question about the document. It returns a text that contains the answer to the question.
|
||||
- image_captioner: This is a tool that generates a description of an image. It takes an input named `image` which should be the image to caption, and returns a text that contains the description in English.
|
||||
[...]
|
||||
|
||||
Task: "Answer the question in the variable `question` about the image stored in the variable `image`. The question is in French."
|
||||
|
||||
I will use the following tools: `translator` to translate the question into English and then `image_qa` to answer the question on the input image.
|
||||
|
||||
Answer:
|
||||
```py
|
||||
translated_question = translator(question=question, src_lang="French", tgt_lang="English")
|
||||
print(f"The translated question is {translated_question}.")
|
||||
answer = image_qa(image=image, question=translated_question)
|
||||
print(f"The answer is {answer}")
|
||||
```
|
||||
|
||||
Task: "Identify the oldest person in the `document` and create an image showcasing the result as a banner."
|
||||
|
||||
I will use the following tools: `document_qa` to find the oldest person in the document, then `image_generator` to generate an image according to the answer.
|
||||
|
||||
Answer:
|
||||
```py
|
||||
answer = document_qa(document, question="What is the oldest person?")
|
||||
print(f"The answer is {answer}.")
|
||||
image = image_generator("A banner showing " + answer)
|
||||
```
|
||||
|
||||
[...]
|
||||
|
||||
Task: "Draw me a picture of rivers and lakes"
|
||||
|
||||
I will use the following
|
||||
```
|
||||
|
||||
The first part explains precisely how the model shall behave and what it should do. This part
|
||||
most likely does not need to be customized.
|
||||
|
||||
TODO(PVP) - explain better how the .description and .name influence the prompt
|
||||
|
||||
### Customizing the tool descriptions
|
||||
|
||||
The performance of the agent is directly linked to the prompt itself. We structure the prompt so that it works well
|
||||
with what we intend for the agent to do; but for maximum customization we also offer the ability to specify a different prompt when instantiating the agent.
|
||||
|
||||
### Customizing the single-execution prompt
|
||||
|
||||
In order to specify a custom single-execution prompt, one would so the following:
|
||||
|
||||
```py
|
||||
template = """ [...] """
|
||||
|
||||
agent = HfAgent(your_endpoint, run_prompt_template=template)
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
Please make sure to have the `<<all_tools>>` string defined somewhere in the `template` so that the agent can be aware
|
||||
of the tools it has available to it.
|
||||
|
||||
</Tip>
|
||||
|
||||
#### Chat-execution prompt
|
||||
|
||||
In order to specify a custom single-execution prompt, one would so the following:
|
||||
|
||||
```
|
||||
template = """ [...] """
|
||||
|
||||
agent = HfAgent(
|
||||
url_endpoint=your_endpoint,
|
||||
token=your_hf_token,
|
||||
chat_prompt_template=template
|
||||
)
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
Please make sure to have the `<<all_tools>>` string defined somewhere in the `template` so that the agent can be
|
||||
aware of the tools it has available to it.
|
||||
|
||||
</Tip>
|
||||
|
||||
## Using custom tools
|
||||
|
||||
In this section, we'll be leveraging two existing custom tools that are specific to image generation:
|
||||
|
||||
- We replace [huggingface-tools/image-transformation](https://huggingface.co/spaces/huggingface-tools/image-transformation),
|
||||
with [diffusers/controlnet-canny-tool](https://huggingface.co/spaces/diffusers/controlnet-canny-tool)
|
||||
to allow for more image modifications.
|
||||
- We add a new tool for image upscaling to the default toolbox:
|
||||
[diffusers/latent-upscaler-tool](https://huggingface.co/spaces/diffusers/latent-upscaler-tool) replace the existing image-transformation tool.
|
||||
|
||||
We'll start by loading the custom tools with the convenient [`load_tool`] function:
|
||||
|
||||
```py
|
||||
from transformers import load_tool
|
||||
|
||||
controlnet_transformer = load_tool("diffusers/controlnet-canny-tool")
|
||||
upscaler = load_tool("diffusers/latent-upscaler-tool")
|
||||
```
|
||||
|
||||
Upon adding custom tools to an agent, the tools' descriptions and names are automatically
|
||||
included in the agents' prompts. Thus, it is imperative that custom tools have
|
||||
a well-written description and name in order for the agent to understand how to use them.
|
||||
Let's take a look at the description and name of `controlnet_transformer`:
|
||||
|
||||
```py
|
||||
print(f"Description: '{controlnet_transformer.description}'")
|
||||
print(f"Name: '{controlnet_transformer.name}'")
|
||||
```
|
||||
|
||||
gives
|
||||
```
|
||||
Description: 'This is a tool that transforms an image with ControlNet according to a prompt.
|
||||
It takes two inputs: `image`, which should be the image to transform, and `prompt`, which should be the prompt to use to change it. It returns the modified image.'
|
||||
Name: 'image_transformer'
|
||||
```
|
||||
|
||||
The name and description is accurate and fits the style of the [curated set of tools](./transformers_agents#a-curated-set-of-tools).
|
||||
Next, let's instantiate an agent with `controlnet_transformer` and `upscaler`:
|
||||
|
||||
```py
|
||||
tools = [controlnet_transformer, upscaler]
|
||||
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=tools)
|
||||
```
|
||||
|
||||
This command should give you the following info:
|
||||
|
||||
```
|
||||
image_transformer has been replaced by <transformers_modules.diffusers.controlnet-canny-tool.bd76182c7777eba9612fc03c0
|
||||
8718a60c0aa6312.image_transformation.ControlNetTransformationTool object at 0x7f1d3bfa3a00> as provided in `additional_tools`
|
||||
```
|
||||
|
||||
The set of curated tools already has a `image_transformer` tool which is hereby replaced with our custom tool.
|
||||
|
||||
<Tip>
|
||||
|
||||
Overwriting existing tools can be beneficial if we want to use a custom tool exactly for the same task as an existing tool
|
||||
because the agent is well-versed in using the specific task. Beware that the custom tool should follow the exact same API
|
||||
as the overwritten tool in this case.
|
||||
|
||||
</Tip>
|
||||
|
||||
The upscaler tool was given the name `image_upscaler` which is not yet present in the default toolbox and is therefore is simply added to the list of tools.
|
||||
You can always have a look at the toolbox that is currently available to the agent via the `agent.toolbox` attribute:
|
||||
|
||||
```py
|
||||
print("\n".join([f"- {a}" for a in agent.toolbox.keys()]))
|
||||
```
|
||||
|
||||
```
|
||||
- document_qa
|
||||
- image_captioner
|
||||
- image_qa
|
||||
- image_segmenter
|
||||
- transcriber
|
||||
- summarizer
|
||||
- text_classifier
|
||||
- text_qa
|
||||
- text_reader
|
||||
- translator
|
||||
- image_transformer
|
||||
- text_downloader
|
||||
- image_generator
|
||||
- video_generator
|
||||
- image_upscaler
|
||||
```
|
||||
|
||||
Note how `image_upscaler` is now part of the agents' toolbox.
|
||||
|
||||
Let's now try out the new tools! We will re-use the image we generated in (Transformers Agents Quickstart)[./transformers_agents#single-execution-run].
|
||||
|
||||
```py
|
||||
from diffusers.utils import load_image
|
||||
|
||||
image = load_image(
|
||||
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png"
|
||||
)
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
|
||||
|
||||
Let's transform the image into a beautiful winter landscape:
|
||||
|
||||
```py
|
||||
image = agent.run("Transform the image: 'A frozen lake and snowy forest'", image=image)
|
||||
```
|
||||
|
||||
```
|
||||
==Explanation from the agent==
|
||||
I will use the following tool: `image_transformer` to transform the image.
|
||||
|
||||
|
||||
==Code generated by the agent==
|
||||
image = image_transformer(image, prompt="A frozen lake and snowy forest")
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_winter.png" width=200>
|
||||
|
||||
The new image processing tool is based on ControlNet which is can make very strong modifications to the image.
|
||||
By default the image processing tool returns an image of size 512x512 pixels. Let's see if we can upscale it.
|
||||
|
||||
```py
|
||||
image = agent.run("Upscale the image", image)
|
||||
```
|
||||
|
||||
```
|
||||
==Explanation from the agent==
|
||||
I will use the following tool: `image_upscaler` to upscale the image.
|
||||
|
||||
|
||||
==Code generated by the agent==
|
||||
upscaled_image = image_upscaler(image)
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_winter_upscale.png" width=400>
|
||||
|
||||
The agent automatically mapped our prompt "Upscale the image" to the just added upscaler tool purely based on the description and name of the upscaler tool
|
||||
and was able to correctly run it.
|
||||
|
||||
Next, let's have a look into how you can create a new custom tool.
|
||||
|
||||
### Adding new tools
|
||||
|
||||
In this section we show how to create a new tool that can be added to the agent.
|
||||
|
||||
#### Creating a new tool
|
||||
|
||||
We'll first start by creating a tool. We'll add the not-so-useful yet fun task of fetching the model on the Hugging Face
|
||||
Hub with the most downloads for a given task.
|
||||
|
||||
We can do that with the following code:
|
||||
|
||||
```python
|
||||
from huggingface_hub import list_models
|
||||
|
||||
task = "text-classification"
|
||||
|
||||
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
|
||||
print(model.id)
|
||||
```
|
||||
|
||||
For the task `text-classification`, this returns `'facebook/bart-large-mnli'`, for `translation` it returns `'t5-base`.
|
||||
|
||||
How do we convert this to a tool that the agent can leverage? All tools depend on the superclass `Tool` that holds the
|
||||
main attributes necessary. We'll create a class that inherits from it:
|
||||
|
||||
```python
|
||||
from transformers import Tool
|
||||
|
||||
|
||||
class HFModelDownloadsTool(Tool):
|
||||
pass
|
||||
```
|
||||
|
||||
This class has a few needs:
|
||||
- An attribute `name`, which corresponds to the name of the tool itself. To be in tune with other tools which have a
|
||||
performative name, we'll name it `model_download_counter`.
|
||||
- An attribute `description`, which will be used to populate the prompt of the agent.
|
||||
- `inputs` and `outputs` attributes. Defining this will help the python interpreter make educated choices about types,
|
||||
and will allow for a gradio-demo to be spawned when we push our tool to the Hub. They're both a list of expected
|
||||
values, which can be `text`, `image`, or `audio`.
|
||||
- A `__call__` method which contains the inference code. This is the code we've played with above!
|
||||
|
||||
Here's what our class looks like now:
|
||||
|
||||
```python
|
||||
from transformers import Tool
|
||||
from huggingface_hub import list_models
|
||||
|
||||
|
||||
class HFModelDownloadsTool(Tool):
|
||||
name = "model_download_counter"
|
||||
description = (
|
||||
"This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. "
|
||||
"It takes the name of the category (such as text-classification, depth-estimation, etc), and "
|
||||
"returns the name of the checkpoint."
|
||||
)
|
||||
|
||||
inputs = ["text"]
|
||||
outputs = ["text"]
|
||||
|
||||
def __call__(self, task: str):
|
||||
model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
|
||||
return model.id
|
||||
```
|
||||
|
||||
We now have our tool handy. Save it in a file and import it from your main script. Let's name this file
|
||||
`model_downloads.py`, so the resulting import code looks like this:
|
||||
|
||||
```python
|
||||
from model_downloads import HFModelDownloadsTool
|
||||
|
||||
tool = HFModelDownloadsTool()
|
||||
```
|
||||
|
||||
In order to let others benefit from it and for simpler initialization, we recommend pushing it to the Hub under your
|
||||
namespace. To do so, just call `push_to_hub` on the `tool` variable:
|
||||
|
||||
```python
|
||||
tool.push_to_hub("lysandre/hf-model-downloads")
|
||||
```
|
||||
|
||||
You now have your code on the Hub! Let's take a look at the final step, which is to have the agent use it.
|
||||
|
||||
#### Having the agent use the tool
|
||||
|
||||
We now have our tool that lives on the Hub which can be instantiated as such:
|
||||
|
||||
```python
|
||||
from transformers import load_tool
|
||||
|
||||
tool = load_tool("lysandre/hf-model-downloads")
|
||||
```
|
||||
|
||||
In order to use it in the agent, simply pass it in the `additional_tools` parameter of the agent initialization method:
|
||||
|
||||
```python
|
||||
from transformers import HfAgent
|
||||
|
||||
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])
|
||||
|
||||
agent.run(
|
||||
"Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
|
||||
)
|
||||
```
|
||||
which outputs the following:
|
||||
```
|
||||
==Code generated by the agent==
|
||||
model = model_download_counter(task="text-to-video")
|
||||
print(f"The model with the most downloads is {model}.")
|
||||
audio_model = text_reader(model)
|
||||
|
||||
|
||||
==Result==
|
||||
The model with the most downloads is damo-vilab/text-to-video-ms-1.7b.
|
||||
```
|
||||
|
||||
and generates the following audio.
|
||||
|
||||
| **Audio** |
|
||||
|------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| <audio controls><source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/damo.wav" type="audio/wav"/> |
|
||||
|
||||
|
||||
<Tip>
|
||||
|
||||
Depending on the LLM, some are quite brittle and require very exact prompts in order to work well. Having a well-defined
|
||||
description of the tool is paramount to having it be leveraged by the agent.
|
||||
|
||||
</Tip>
|
||||
|
||||
### Replacing existing tools
|
||||
|
||||
Replacing existing tools can be done simply by assigning a new item to the agent's toolbox. Here's how one would do so:
|
||||
|
||||
```python
|
||||
from transformers import HfAgent, load_tool
|
||||
|
||||
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
|
||||
agent.toolbox["image-transformation"] = load_tool("diffusers/controlnet-canny-tool")
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
Beware when replacing tools with others! This will also adjust the agent's prompt. This can be good if you have a better
|
||||
prompt suited for the task, but it can also result in your tool being selected way more than others or for other
|
||||
tools to be selected instead of the one you have defined.
|
||||
|
||||
</Tip>
|
||||
|
||||
## Leveraging gradio-tools
|
||||
|
||||
[gradio-tools](https://github.com/freddyaboulton/gradio-tools) is a powerful library that allows using Hugging
|
||||
Face Spaces as tools. It supports many existing Spaces as well as custom Spaces to be designed with it.
|
||||
|
||||
We offer support for `gradio_tools` by using the `Tool.from_gradio` method. For example, we want to take
|
||||
advantage of the `StableDiffusionPromptGeneratorTool` tool offered in the `gradio-tools` toolkit so as to
|
||||
improve our prompts and generate better images.
|
||||
|
||||
We first import the tool from `gradio_tools` and instantiate it:
|
||||
|
||||
```python
|
||||
from gradio_tools import StableDiffusionPromptGeneratorTool
|
||||
|
||||
gradio_tool = StableDiffusionPromptGeneratorTool()
|
||||
```
|
||||
|
||||
We pass that instance to the `Tool.from_gradio` method:
|
||||
|
||||
```python
|
||||
from transformers import Tool
|
||||
|
||||
tool = Tool.from_gradio(gradio_tools)
|
||||
```
|
||||
|
||||
Now we can manage it exactly as we would a usual custom tool. We leverage it to improve our prompt
|
||||
` a rabbit wearing a space suit`:
|
||||
|
||||
```python
|
||||
from transformers import HfAgent
|
||||
|
||||
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder", additional_tools=[tool])
|
||||
|
||||
agent.run("Generate an image of the `prompt` after improving it.", prompt="A rabbit wearing a space suit")
|
||||
```
|
||||
|
||||
The model adequately leverages the tool:
|
||||
```
|
||||
==Explanation from the agent==
|
||||
I will use the following tools: `StableDiffusionPromptGenerator` to improve the prompt, then `image_generator` to generate an image according to the improved prompt.
|
||||
|
||||
|
||||
==Code generated by the agent==
|
||||
improved_prompt = StableDiffusionPromptGenerator(prompt)
|
||||
print(f"The improved prompt is {improved_prompt}.")
|
||||
image = image_generator(improved_prompt)
|
||||
```
|
||||
|
||||
Before finally generating the image:
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png">
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
gradio-tools requires *textual* inputs and outputs, even when working with different modalities. This implementation
|
||||
works with image and audio objects. The two are currently incompatible, but will rapidly become compatible as we
|
||||
work to improve the support.
|
||||
|
||||
</Tip>
|
||||
|
||||
## Future compatibility with Langchain
|
||||
|
||||
We love Langchain and think it has a very compelling suite of tools. In order to handle these tools,
|
||||
Langchain requires *textual* inputs and outputs, even when working with different modalities.
|
||||
This is often the serialized version (i.e., saved to disk) of the objects.
|
||||
|
||||
This difference means that multi-modality isn't handled between transformers-agents and langchain.
|
||||
We aim for this limitation to be resolved in future versions, and welcome any help from avid langchain
|
||||
users to help us achieve this compatibility.
|
||||
|
||||
We would love to have better support. If you would like to help, please
|
||||
[open an issue](https://github.com/huggingface/transformers/issues/new) and share what you have in mind.
|
||||
64
docs/source/en/main_classes/agent.mdx
Normal file
64
docs/source/en/main_classes/agent.mdx
Normal file
@@ -0,0 +1,64 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Agents & Tools
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
|
||||
can vary as the APIs or underlying models are prone to change.
|
||||
|
||||
</Tip>
|
||||
|
||||
To learn more about agents and tools make sure to read the [introductory guide](../agents_and_tools). This page
|
||||
contains the API docs for the underlying classes.
|
||||
|
||||
## Agents
|
||||
|
||||
We provide two types of agents: [`HfAgent`] uses inference endpoints for opensource models and [`OpenAiAgent`] uses OpenAI closed models.
|
||||
|
||||
### HfAgent
|
||||
|
||||
[[autodoc]] HfAgent
|
||||
|
||||
### OpenAiAgent
|
||||
|
||||
[[autodoc]] OpenAiAgent
|
||||
|
||||
### Agent
|
||||
|
||||
[[autodoc]] Agent
|
||||
- chat
|
||||
- run
|
||||
- prepare_for_new_chat
|
||||
|
||||
## Tools
|
||||
|
||||
### load_tool
|
||||
|
||||
[[autodoc]] load_tool
|
||||
|
||||
### Tool
|
||||
|
||||
[[autodoc]] Tool
|
||||
|
||||
### PipelineTool
|
||||
|
||||
[[autodoc]] PipelineTool
|
||||
|
||||
### RemoteTool
|
||||
|
||||
[[autodoc]] RemoteTool
|
||||
|
||||
### launch_gradio_demo
|
||||
|
||||
[[autodoc]] launch_gradio_demo
|
||||
329
docs/source/en/transformers_agents.mdx
Normal file
329
docs/source/en/transformers_agents.mdx
Normal file
@@ -0,0 +1,329 @@
|
||||
<!--Copyright 2023 The HuggingFace Team. All rights reserved.
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
|
||||
the License. You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
|
||||
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
|
||||
specific language governing permissions and limitations under the License.
|
||||
-->
|
||||
|
||||
# Transformers Agent
|
||||
|
||||
<Tip warning={true}>
|
||||
|
||||
Transformers Agent is an experimental API which is subject to change at any time. Results returned by the agents
|
||||
can vary as the APIs or underlying models are prone to change.
|
||||
|
||||
</Tip>
|
||||
|
||||
Transformers version v4.29.0, building on the concept of *tools* and *agents*.
|
||||
|
||||
In short, it provides a natural language API on top of transformers: we define a set of curated tools, and design an
|
||||
agent to interpret natural language and to use these tools. It is extensible by design; we curated some relevant tools,
|
||||
but we'll show you how the system can be extended easily to use any tool developed by the community.
|
||||
|
||||
Let's start with a few examples of what can be achieved with this new API. It is particularly powerful when it comes
|
||||
to multimodal tasks, so let's take it for a spin to generate images and read text out loud.
|
||||
|
||||
```py
|
||||
agent.run("Caption the following image", image=image)
|
||||
```
|
||||
|
||||
| **Input** | **Output** |
|
||||
|-----------------------------------------------------------------------------------------------------------------------------|-----------------------------------|
|
||||
| <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/beaver.png" width=200> | A beaver is swimming in the water |
|
||||
|
||||
---
|
||||
|
||||
```py
|
||||
agent.run("Read the following text out loud", text=text)
|
||||
```
|
||||
| **Input** | **Output** |
|
||||
|-------------------------------------------------------------------------------------------------------------------------|----------------------------------------------|
|
||||
| A beaver is swimming in the water | <audio controls><source src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tts_example.wav" type="audio/wav"> your browser does not support the audio element. </audio>
|
||||
|
||||
---
|
||||
|
||||
```py
|
||||
agent.run(
|
||||
"In the following `document`, where will the TRRF Scientific Advisory Council Meeting take place?",
|
||||
document=document,
|
||||
)
|
||||
```
|
||||
| **Input** | **Output** |
|
||||
|-----------------------------------------------------------------------------------------------------------------------------|----------------|
|
||||
| <img src="https://datasets-server.huggingface.co/assets/hf-internal-testing/example-documents/--/hf-internal-testing--example-documents/test/0/image/image.jpg" width=200> | ballroom foyer |
|
||||
|
||||
## Quickstart
|
||||
|
||||
Before being able to use `agent.run`, you will need to instantiate an agent, which is a large language model (LLM).
|
||||
We recommend using the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) checkpoint as it works very well
|
||||
for the task at hand and is open-source, but please find other examples below.
|
||||
|
||||
Start by logging-in to have access to the Inference API:
|
||||
|
||||
```py
|
||||
from huggingface_hub import login
|
||||
|
||||
login("<YOUR_TOKEN>")
|
||||
```
|
||||
|
||||
Then, instantiate the agent
|
||||
|
||||
```py
|
||||
from transformers import HfAgent
|
||||
|
||||
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
|
||||
```
|
||||
|
||||
This is using the inference API that Hugging Face provides for free at the moment, if you have your own inference
|
||||
endpoint for this model (or another one) you can replace the url above by your url endpoint.
|
||||
|
||||
<Tip>
|
||||
|
||||
We're showcasing StarCoder as the default in the documentation as the model is free to use and performs admirably well
|
||||
on simple tasks. However, the checkpoint doesn't hold up when handling more complex prompts. If you're facing such an
|
||||
issue, we recommend trying out the OpenAI model which, while sadly not open-source, performs better at this given time.
|
||||
|
||||
</Tip>
|
||||
|
||||
You're now good to go! Let's dive into the two APIs that you now have at your disposal.
|
||||
|
||||
### Single execution (run)
|
||||
|
||||
The single execution method is when using the [`~Agent.run`] method of the agent:
|
||||
|
||||
```py
|
||||
agent.run("Draw me a picture of rivers and lakes")
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
|
||||
|
||||
It automatically select the tool (or tools) appropriate for the task you want to perform and run them appropriately. It
|
||||
can perform one or several tasks in the same instruction (though the more complex your instruction, the more likely
|
||||
the agent is to fail).
|
||||
|
||||
```py
|
||||
agent.chat("Draw me a picture of the sea then transform the picture to add an island.")
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sea_and_island.png" width=200>
|
||||
|
||||
<br/>
|
||||
|
||||
|
||||
Every [`~Agent.run`] operation is independent, so you can run it several times in a row with different tasks.
|
||||
|
||||
Note that your `agent` is just a large-language model, so small variations in your prompt might yield completely
|
||||
different results. It's important to explain as clearly as possible the task you want to perform.
|
||||
|
||||
If you'd like to keep a state across executions or to pass non-text objects to the agent, you can do so by specifying
|
||||
variables that you would like the agent to use. For example you could generate the first image of rivers and lakes,
|
||||
and ask the model to update that picture to add an island by doing the following:
|
||||
|
||||
```python
|
||||
picture = agent.run("Draw me a picture of rivers and lakes")
|
||||
updated_picture = agent.chat("Take that `picture` and add an island to it", picture=picture)
|
||||
```
|
||||
|
||||
<Tip>
|
||||
|
||||
This can be helpful when the model is unable to understand your request and mixes tools. An example would be:
|
||||
|
||||
```python
|
||||
agent.run("Draw me the picture of a capybara swimming in the sea")
|
||||
```
|
||||
|
||||
Here, the model could interpret it two ways:
|
||||
- Have the `text-to-image` generate a capybara swimming in the sea
|
||||
- Or, have the `text-to-image` generate capybara, then use the `image-transformation` tool to have it swim in the sea
|
||||
|
||||
In case you would like to force the first scenario, you could do so by passing it the prompt as an argument:
|
||||
|
||||
```python
|
||||
agent.run("Draw me a picture of the `prompt`", prompt="a capybara swimming in the sea")
|
||||
```
|
||||
|
||||
</Tip>
|
||||
|
||||
|
||||
### Chat-based execution (chat)
|
||||
|
||||
The agent also has a chat-based approach, using the [`~Agent.chat`] method:
|
||||
|
||||
```py
|
||||
agent.chat("Draw me a picture of rivers and lakes")
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes.png" width=200>
|
||||
|
||||
```py
|
||||
agent.chat("Transform the picture so that there is a rock in there")
|
||||
```
|
||||
|
||||
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rivers_and_lakes_and_beaver.png" width=200>
|
||||
|
||||
<br/>
|
||||
|
||||
This is an interesting approach when you want to keep the state across instructions. It's better for experimentation,
|
||||
but will tend to be much better at single instructions rather than complex instructions (which the [`~Agent.run`]
|
||||
method is better at handling).
|
||||
|
||||
This method can also take arguments if you would like to pass non-text types or specific prompts.
|
||||
|
||||
### ⚠️ Remote execution
|
||||
|
||||
For demonstration purposes and so that this can be used with all setups, we have created remote executors for several
|
||||
of the default tools the agent has access to. These are created using
|
||||
[inference endpoints](https://huggingface.co/inference-endpoints). To see how to setup remote executors tools yourself,
|
||||
we recommend reading the custom tool guide [TODO LINK].
|
||||
|
||||
In order to run with remote tools, specifying `remote=True` to either [`~Agent.run`] or [`~Agent.chat`] is sufficient.
|
||||
|
||||
For example, the following command could be run on any device efficiently, without needing significant RAM or GPU:
|
||||
|
||||
```python
|
||||
agent.run("Draw me a picture of rivers and lakes", remote=True)
|
||||
```
|
||||
|
||||
The same can be said for [`~Agent.chat`]:
|
||||
|
||||
```py
|
||||
agent.chat("Draw me a picture of rivers and lakes", remote=True)
|
||||
```
|
||||
|
||||
### What's happening here? What are tools, and what are agents?
|
||||
|
||||
#### Agents
|
||||
|
||||
The "agent" here is a large language model, and we're prompting it so that it has access to a specific set of tools.
|
||||
|
||||
LLMs are pretty good at generating small samples of code, so this API takes advantage of that by prompting the
|
||||
LLM to give a small sample of code performing a task with a set of tools. This prompt is then completed by the
|
||||
task you give your agent and the description of the tools you give it. This way it gets access to the doc of the
|
||||
tools you are using, especially their expected inputs and outputs and can generate the relevant code.
|
||||
|
||||
#### Tools
|
||||
|
||||
Tools are very simple: they're a single function, with a name, and a description. We then use these tools description
|
||||
to prompt the agent. Through the prompt, we show the agent how it would leverage tools in order to perform what was
|
||||
requests in the query.
|
||||
|
||||
This is using brand-new tools and not pipelines, because the agent writes better code with very atomic tools.
|
||||
Pipelines are more refactored and often combine several tasks in one. Tools are really meant to be focused on
|
||||
one very simple task only.
|
||||
|
||||
#### Code-execution?!
|
||||
|
||||
This code is then executed with our small Python interpreter on the set of inputs passed along with your tools.
|
||||
We hear you screaming "Arbitrary code execution!" in the back, but let us explain why that is not the case.
|
||||
|
||||
The only functions that can be called are the tools you provided and the print function, so you're already
|
||||
limited in what can be executed. You should be safe if it's limited to Hugging Face tools.
|
||||
|
||||
Then, we don't allow any attribute lookup or imports (which shouldn't be needed anyway for passing along
|
||||
inputs/outputs to a small set of functions) so all the most obvious attacks (and you'd need to prompt the LLM
|
||||
to output them anyway) shouldn't be an issue. If you want to be on the super safe side, you can execute the
|
||||
run() method with the additional argument return_code=True, in which case the agent will just return the code
|
||||
to execute and you can decide whether to do it or not.
|
||||
|
||||
The execution will stop at any line trying to perform an illegal operation or if there is a regular Python error
|
||||
with the code generated by the agent.
|
||||
|
||||
### A curated set of tools
|
||||
|
||||
We identify a set of tools that can empower such agents. Here is an updated list of the tools we have integrated
|
||||
in `transformers`:
|
||||
|
||||
- **Document question answering**: given a document (such as a PDF) in image format, answer a question on this document ([Donut](../model_doc/donut))
|
||||
- **Text question answering**: given a long text and a question, answer the question in the text ([Flan-T5](../model_doc/flan-t5))
|
||||
- **Unconditional image captioning**: Caption the image! ([BLIP](../model_doc/blip))
|
||||
- **Image question answering**: given an image, answer a question on this image ([VILT](../model_doc/vilt))
|
||||
- **Image segmentation**: given an image and a prompt, output the segmentation mask of that prompt ([CLIPSeg](../model_doc/clipseg))
|
||||
- **Speech to text**: given an audio recording of a person talking, transcribe the speech into text ([Whisper](../model_doc/whisper))
|
||||
- **Text to speech**: convert text to speech ([SpeechT5](../model_doc/speecht5))
|
||||
- **Zero-shot text classification**: given a text and a list of labels, identify to which label the text corresponds the most ([BART](../model_doc/bart))
|
||||
- **Text summarization**: summarize a long text in one or a few sentences ([BART](../model_doc/bart))
|
||||
- **Translation**: translate the text into a given language ([NLLB](../model_doc/nllb))
|
||||
|
||||
These tools have an integration in transformers, and can be used manually as well, for example:
|
||||
|
||||
```py
|
||||
from transformers import load_tool
|
||||
|
||||
tool = load_tool("text-to-speech")
|
||||
audio = tool("This is a text to speech tool")
|
||||
```
|
||||
|
||||
### Custom tools
|
||||
|
||||
While we identify a curated set of tools, we strongly believe that the main value provided by this implementation is
|
||||
the ability to quickly create and share custom tools.
|
||||
|
||||
By pushing the code of a tool to a Hugging Face Space or a model repository, you're then able to leverage the tool
|
||||
directly with the agent. We've added a few
|
||||
**transformers-agnostic** tools to the `huggingface-tools` organization:
|
||||
|
||||
- **Text downloader**: to download a text from a web URL
|
||||
- **Text to image**: generate an image according to a prompt, leveraging stable diffusion
|
||||
- **Image transformation**: modify an image given an initial image and a prompt, leveraging instruct pix2pix stable diffusion
|
||||
|
||||
The text-to-image tool we have been using since the beginning is actually a remote tool that lives in
|
||||
[*huggingface-tools/text-to-image*](https://huggingface.co/spaces/huggingface-tools/text-to-image)! We will
|
||||
continue releasing such tools on this and other organization, to further supercharge this implementation.
|
||||
|
||||
The agents have by default access to tools that reside on `huggingface-tools`.
|
||||
We explain how to you can write and share your own tools as well as leverage any custom tool that resides on the Hub in [following guide](custom_tools).
|
||||
[following guide](custom_tools).
|
||||
|
||||
### Leveraging different agents
|
||||
|
||||
We showcase here how to use the [bigcode/starcoder](https://huggingface.co/bigcode/starcoder) model as an LLM, but
|
||||
it isn't the only model available. We also support the OpenAssistant model and OpenAI's davinci models (3.5 and 4).
|
||||
|
||||
We're planning on supporting local language models in an ulterior version.
|
||||
|
||||
The tools defined in this implementation are agnostic to the agent used; we are showcasing the agents that work with
|
||||
our prompts below, but the tools can also be used with Langchain, Minichain, or any other Agent-based library.
|
||||
|
||||
#### Example code for the OpenAssistant model
|
||||
|
||||
```py
|
||||
from transformers import HfAgent
|
||||
|
||||
agent = HfAgent(url_endpoint="https://OpenAssistant/oasst-sft-1-pythia-12b", token="<HF_TOKEN>")
|
||||
```
|
||||
|
||||
#### Example code for OpenAI models
|
||||
|
||||
```py
|
||||
from transformers import OpenAiAgent
|
||||
|
||||
agent = OpenAiAgent(model="text-davinci-003", api_key="<API_KEY>")
|
||||
```
|
||||
|
||||
### Code generation
|
||||
|
||||
So far we have shown how to use the agents to perform actions for you. However, the agent is really only generating code
|
||||
that we then execute using a very restricted Python interpreter. In case you would like to use the code generated in
|
||||
a different setting, the agent can be prompted to return the code, along with tool definition and accurate imports.
|
||||
|
||||
For example, the following instruction
|
||||
```python
|
||||
agent.run("Draw me a picture of rivers and lakes", return_code=True)
|
||||
```
|
||||
|
||||
returns the following code
|
||||
|
||||
```python
|
||||
from transformers import load_tool
|
||||
|
||||
image_generator = load_tool("huggingface-tools/text-to-image")
|
||||
|
||||
image = image_generator(prompt="rivers and lakes")
|
||||
```
|
||||
|
||||
that you can then modify and execute yourself.
|
||||
Reference in New Issue
Block a user