Create local Transformers Engine (#33218)
* Create local Transformers Engine
This commit is contained in:
@@ -126,12 +126,13 @@ Additionally, `llm_engine` can also take a `grammar` argument. In the case where
|
||||
|
||||
You will also need a `tools` argument which accepts a list of `Tools` - it can be an empty list. You can also add the default toolbox on top of your `tools` list by defining the optional argument `add_base_tools=True`.
|
||||
|
||||
Now you can create an agent, like [`CodeAgent`], and run it. For convenience, we also provide the [`HfEngine`] class that uses `huggingface_hub.InferenceClient` under the hood.
|
||||
Now you can create an agent, like [`CodeAgent`], and run it. You can also create a [`TransformersEngine`] with a pre-initialized pipeline to run inference on your local machine using `transformers`.
|
||||
For convenience, since agentic behaviours generally require stronger models such as `Llama-3.1-70B-Instruct` that are harder to run locally for now, we also provide the [`HfApiEngine`] class that initializes a `huggingface_hub.InferenceClient` under the hood.
|
||||
|
||||
```python
|
||||
from transformers import CodeAgent, HfEngine
|
||||
from transformers import CodeAgent, HfApiEngine
|
||||
|
||||
llm_engine = HfEngine(model="meta-llama/Meta-Llama-3-70B-Instruct")
|
||||
llm_engine = HfApiEngine(model="meta-llama/Meta-Llama-3-70B-Instruct")
|
||||
agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)
|
||||
|
||||
agent.run(
|
||||
@@ -141,7 +142,7 @@ agent.run(
|
||||
```
|
||||
|
||||
This will be handy in case of emergency baguette need!
|
||||
You can even leave the argument `llm_engine` undefined, and an [`HfEngine`] will be created by default.
|
||||
You can even leave the argument `llm_engine` undefined, and an [`HfApiEngine`] will be created by default.
|
||||
|
||||
```python
|
||||
from transformers import CodeAgent
|
||||
@@ -521,14 +522,14 @@ import gradio as gr
|
||||
from transformers import (
|
||||
load_tool,
|
||||
ReactCodeAgent,
|
||||
HfEngine,
|
||||
HfApiEngine,
|
||||
stream_to_gradio,
|
||||
)
|
||||
|
||||
# Import tool from Hub
|
||||
image_generation_tool = load_tool("m-ric/text-to-image")
|
||||
|
||||
llm_engine = HfEngine("meta-llama/Meta-Llama-3-70B-Instruct")
|
||||
llm_engine = HfApiEngine("meta-llama/Meta-Llama-3-70B-Instruct")
|
||||
|
||||
# Initialize the agent with the image generation tool
|
||||
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)
|
||||
|
||||
@@ -87,12 +87,33 @@ These engines have the following specification:
|
||||
1. Follow the [messages format](../chat_templating.md) for its input (`List[Dict[str, str]]`) and return a string.
|
||||
2. Stop generating outputs *before* the sequences passed in the argument `stop_sequences`
|
||||
|
||||
### HfEngine
|
||||
### TransformersEngine
|
||||
|
||||
For convenience, we have added a `HfEngine` that implements the points above and uses an inference endpoint for the execution of the LLM.
|
||||
For convenience, we have added a `TransformersEngine` that implements the points above, taking a pre-initialized `Pipeline` as input.
|
||||
|
||||
```python
|
||||
>>> from transformers import HfEngine
|
||||
>>> from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, TransformersEngine
|
||||
|
||||
>>> model_name = "HuggingFaceTB/SmolLM-135M-Instruct"
|
||||
>>> tokenizer = AutoTokenizer.from_pretrained(model_name)
|
||||
>>> model = AutoModelForCausalLM.from_pretrained(model_name)
|
||||
|
||||
>>> pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
|
||||
|
||||
>>> engine = TransformersEngine(pipe)
|
||||
>>> engine([{"role": "user", "content": "Ok!"}], stop_sequences=["great"])
|
||||
|
||||
"What a "
|
||||
```
|
||||
|
||||
[[autodoc]] TransformersEngine
|
||||
|
||||
### HfApiEngine
|
||||
|
||||
The `HfApiEngine` is an engine that wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM.
|
||||
|
||||
```python
|
||||
>>> from transformers import HfApiEngine
|
||||
|
||||
>>> messages = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
@@ -100,12 +121,12 @@ For convenience, we have added a `HfEngine` that implements the points above and
|
||||
... {"role": "user", "content": "No need to help, take it easy."},
|
||||
... ]
|
||||
|
||||
>>> HfEngine()(messages, stop_sequences=["conversation"])
|
||||
>>> HfApiEngine()(messages, stop_sequences=["conversation"])
|
||||
|
||||
"That's very kind of you to say! It's always nice to have a relaxed "
|
||||
```
|
||||
|
||||
[[autodoc]] HfEngine
|
||||
[[autodoc]] HfApiEngine
|
||||
|
||||
|
||||
## Agent Types
|
||||
|
||||
@@ -83,12 +83,12 @@ API나 기반 모델이 자주 업데이트되므로, 에이전트가 제공하
|
||||
1. 입력(`List[Dict[str, str]]`)에 대한 [메시지 형식](../chat_templating.md)을 따르고 문자열을 반환해야 합니다.
|
||||
2. 인수 `stop_sequences`에 시퀀스가 전달되기 *전에* 출력을 생성하는 것을 중지해야 합니다.
|
||||
|
||||
### HfEngine [[hfengine]]
|
||||
### HfApiEngine [[HfApiEngine]]
|
||||
|
||||
편의를 위해, 위의 사항을 구현하고 대규모 언어 모델 실행을 위해 추론 엔드포인트를 사용하는 `HfEngine`을 추가했습니다.
|
||||
편의를 위해, 위의 사항을 구현하고 대규모 언어 모델 실행을 위해 추론 엔드포인트를 사용하는 `HfApiEngine`을 추가했습니다.
|
||||
|
||||
```python
|
||||
>>> from transformers import HfEngine
|
||||
>>> from transformers import HfApiEngine
|
||||
|
||||
>>> messages = [
|
||||
... {"role": "user", "content": "Hello, how are you?"},
|
||||
@@ -96,12 +96,12 @@ API나 기반 모델이 자주 업데이트되므로, 에이전트가 제공하
|
||||
... {"role": "user", "content": "No need to help, take it easy."},
|
||||
... ]
|
||||
|
||||
>>> HfEngine()(messages, stop_sequences=["conversation"])
|
||||
>>> HfApiEngine()(messages, stop_sequences=["conversation"])
|
||||
|
||||
"That's very kind of you to say! It's always nice to have a relaxed "
|
||||
```
|
||||
|
||||
[[autodoc]] HfEngine
|
||||
[[autodoc]] HfApiEngine
|
||||
|
||||
|
||||
## 에이전트 유형 [[agent-types]]
|
||||
|
||||
Reference in New Issue
Block a user