diff --git a/docs/source/zh/_config.py b/docs/source/zh/_config.py new file mode 100644 index 000000000..81f6de049 --- /dev/null +++ b/docs/source/zh/_config.py @@ -0,0 +1,14 @@ +# docstyle-ignore +INSTALL_CONTENT = """ +# Installation +! pip install smolagents +# To install from source instead of the last release, comment the command above and uncomment the following one. +# ! pip install git+https://github.com/huggingface/smolagents.git +""" + +notebook_first_cells = [{"type": "code", "content": INSTALL_CONTENT}] +black_avoid_patterns = { + "{processor_class}": "FakeProcessorClass", + "{model_class}": "FakeModelClass", + "{object_class}": "FakeObjectClass", +} diff --git a/docs/source/zh/_toctree.yml b/docs/source/zh/_toctree.yml new file mode 100644 index 000000000..4da8f4859 --- /dev/null +++ b/docs/source/zh/_toctree.yml @@ -0,0 +1,34 @@ +- title: Get started + sections: + - local: index + title: 🤗 Agents + - local: guided_tour + title: Guided tour +- title: Tutorials + sections: + - local: tutorials/building_good_agents + title: ✨ Building good agents + - local: tutorials/tools + title: 🛠️ Tools - in-depth guide + - local: tutorials/secure_code_execution + title: 🛡️ Secure your code execution with E2B +- title: Conceptual guides + sections: + - local: conceptual_guides/intro_agents + title: 🤖 An introduction to agentic systems + - local: conceptual_guides/react + title: 🤔 How do multi-step agents work? 
+- title: Examples + sections: + - local: examples/text_to_sql + title: Self-correcting Text-to-SQL + - local: examples/rag + title: Master your knowledge base with agentic RAG + - local: examples/multiagents + title: Orchestrate a multi-agent system +- title: Reference + sections: + - local: reference/agents + title: Agent-related objects + - local: reference/tools + title: Tool-related objects diff --git a/docs/source/zh/conceptual_guides/intro_agents.md b/docs/source/zh/conceptual_guides/intro_agents.md new file mode 100644 index 000000000..416aabcb5 --- /dev/null +++ b/docs/source/zh/conceptual_guides/intro_agents.md @@ -0,0 +1,122 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> + +# Introduction to Agents + +> [!TIP] +> Translator's note: the industry term for "agent" in Chinese is “智能体”. The Chinese translation keeps the word agent untranslated for a more efficient reading experience. (In a mostly Chinese article, it's easier to notice the English. Attention Is All You Need!) + +## 🤔 What are agents? 
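+ +Any efficient system using AI will need to provide LLMs some kind of access to the real world: for instance the possibility to call a search tool to get external information, or to act on certain programs in order to solve a task. In other words, LLMs should have **_agency_**. Agentic programs are the gateway to the outside world for LLMs. + +> [!TIP] +> AI agents are **programs where LLM outputs control the workflow**. + +Any system leveraging LLMs will integrate the LLM outputs into code. The degree to which the LLM output influences the code workflow is the level of agency of LLMs in the system. + +Note that with this definition, "agent" is not a discrete, 0 or 1 definition: instead, "agency" evolves on a continuous spectrum, as you give more or less power to the LLM on your workflow. + +See in the table below how agency can vary across systems: + +| Agency level | Description | Name | Example pattern | +| ------------ | ---------------------------------------------- | ---------- | -------------------------------------------------- | +| ☆☆☆ | LLM output has no impact on program flow | Simple processor | `process_llm_output(llm_response)` | +| ★☆☆ | LLM output determines an if/else branch | Router | `if llm_decision(): path_a() else: path_b()` | +| ★★☆ | LLM output determines function execution | Tool caller | `run_function(llm_chosen_tool, llm_chosen_args)` | +| ★★★ | LLM output controls iteration and program continuation | Multi-step Agent | `while llm_should_continue(): execute_next_step()` | +| ★★★ | One agentic workflow can start another agentic workflow | Multi-Agent | `if llm_trigger(): execute_agent()` | + +The multi-step agent has this code structure: + +```python +memory = [user_defined_task] +while llm_should_continue(memory): # this loop is the multi-step part + action = llm_get_next_action(memory) # this is the tool-calling part + observations = execute_action(action) + memory += [action, observations] +``` + +This agentic system runs in a loop, executing a new action at each step (the action can involve calling some pre-determined *tools* that are just functions), until its observations make it apparent that a satisfactory state has been reached to solve the given task. Here is an example of how a multi-step agent can solve a simple math question: + +<div class="flex justify-center"> + <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif"/> +</div> + +## ✅ When to use agents / ⛔ when to avoid them + +Agents are useful when you need an LLM to determine the workflow of an app. But they are often overkill. The question is: do I really need flexibility in the workflow to efficiently solve the task at hand? +If the pre-determined workflow falls short too often, that means you need more flexibility. +Let's take an example: say you're making an app that handles customer requests on a surfing trip website. + +You could know in advance that the requests will belong to one of 2 buckets (based on user choice), and you have a predefined workflow for each of these 2 cases. + +1. Want some knowledge on the trips? ⇒ give them access to a search bar to search your knowledge base +2. Want to talk to sales? ⇒ let them fill in a contact form. + +If that deterministic workflow fits all queries, by all means just code everything! This will give you a 100% reliable system with no risk of error introduced by letting unpredictable LLMs meddle in your workflow. For the sake of simplicity and robustness, it's advised to regularize towards not using any agentic behaviour. + +But what if the workflow can't be determined that well in advance? 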
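+ +For instance, a user wants to ask: `"I can come on Monday, but I forgot my passport so risk being delayed to Wednesday, is it possible to take me and my stuff to surf on Tuesday morning, with a cancellation insurance?"` This question hinges on many factors, and probably none of the predetermined criteria above will suffice for this request. + +If the pre-determined workflow falls short too often, that means you need more flexibility. + +That is where an agentic setup helps. + +In the above example, you could just make a multi-step agent that has access to a weather API for weather forecasts, the Google Maps API to compute travel distance, an employee availability dashboard and a RAG system on your knowledge base. + +Until recently, computer programs were restricted to pre-determined workflows, trying to handle complexity by piling up if/else switches. They focused on extremely narrow tasks, like "compute the sum of these numbers" or "find the shortest path in this graph". But actually, most real-life tasks, like our trip example above, do not fit in pre-determined workflows. Agentic systems open up the vast world of real-world tasks to programs! + +## Why `smolagents`? + +For some low-level agentic use cases, like chains or routers, you can write all the code yourself. You'll be much better off that way, since it will let you control and understand your system better. + +But once you start going for more complicated behaviours like letting an LLM call a function (that's "tool calling") or letting an LLM run a while loop ("multi-step agent"), some abstractions become necessary: + +- For tool calling, you need to parse the agent's output, so this output needs a predefined format like "Thought: I should call tool 'get_weather'. Action: get_weather(Paris).", that you parse with a predefined function, and the system prompt given to the LLM should notify it about this format. +- For a multi-step agent where the LLM output determines the loop, you need to give a different prompt to the LLM based on what happened in the last loop iteration: so you need some kind of memory. + +See? With these two examples, we already found the need for a few items to help us: + +- Of course, an LLM that acts as the engine powering the system +- A list of tools that the agent can access +- A parser that extracts tool calls from the LLM output +- A system prompt synced with the parser +- A memory + +But wait, since we give room to LLMs in decisions, surely they will make mistakes: so we need error logging and retry mechanisms. + +All these elements need tight coupling to make a well-functioning system. That's why we decided we needed to make basic building blocks to make all this stuff work together. + +## Code agents + +In a multi-step agent, at each step, the LLM can write an action, in the form of some calls to external tools. A common format (used by Anthropic, OpenAI, and many others) for writing these actions is generally different shades of "writing actions as a JSON of tool names and arguments to use, which you then parse to know which tool to execute and with which arguments". + +[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its tool calls in code works much better. + +The reason for this is simply that _we crafted our code languages specifically to be the best possible way to express actions performed by a computer_. If JSON snippets were a better expression, JSON would be the top programming language and programming would be very painful. + +The figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030), illustrates some advantages of writing actions in code: + +<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png"> + +Writing actions in code rather than JSON-like snippets provides better: + +- **Composability:** could you nest JSON actions within each other, or define a set of JSON actions to re-use later, the same way you could just define a python function? +- **Object management:** how do you store the output of an action like `generate_image` in JSON? 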
+- **Generality:** code is built to express simply anything you can have a computer do. +- **Representation in LLM training data:** plenty of quality code actions are already included in LLMs' training data, which means they are already trained for this! diff --git a/docs/source/zh/conceptual_guides/react.md b/docs/source/zh/conceptual_guides/react.md new file mode 100644 index 000000000..24428e03f --- /dev/null +++ b/docs/source/zh/conceptual_guides/react.md @@ -0,0 +1,47 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# How do multi-step agents work? 
+ +The ReAct framework ([Yao et al., 2022](https://huggingface.co/papers/2210.03629)) is currently the main approach to building agents. + +The name is based on the concatenation of two words, "Reason" and "Act". Indeed, agents following this architecture will solve their task in as many steps as needed, each step consisting of a reasoning step, then an action step where the agent formulates tool calls that bring it closer to solving the task at hand. + +The ReAct process involves keeping a memory of past steps. + +> [!TIP] +> Read the [Open-source LLMs as LangChain Agents](https://huggingface.co/blog/open-source-llms-as-agents) blog post to learn more about multi-step agents. + +Here is a video overview of how that works: + +<div class="flex justify-center"> + <img + class="block dark:hidden" + src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif" + /> + <img + class="hidden dark:block" + src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/Agent_ManimCE.gif" + /> +</div> + + + +We implement two versions of ToolCallingAgent: +- [`ToolCallingAgent`] generates tool calls as JSON in its output. +- [`CodeAgent`] is a new type of ToolCallingAgent that generates its tool calls as blobs of code, which works really well for LLMs that have strong coding performance. + +> [!TIP] +> We also provide an option to run agents in single-step mode: just pass `single_step=True` when launching the agent, like `agent.run(your_task, single_step=True)` \ No newline at end of file diff --git a/docs/source/zh/examples/multiagents.md b/docs/source/zh/examples/multiagents.md new file mode 100644 index 000000000..4ea4e51b2 --- /dev/null +++ b/docs/source/zh/examples/multiagents.md @@ -0,0 +1,199 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. 
+ +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Orchestrate a multi-agent system 🤖🤝🤖 + +[[open-in-colab]] + +In this notebook we will make a **multi-agent web browser: an agentic system with several agents collaborating to solve problems using the web!** + +It will be a simple hierarchy, using a `ManagedAgent` object to wrap the managed web search agent: + +``` + +----------------+ + | Manager agent | + +----------------+ + | + _______________|______________ + | | + Code interpreter +--------------------------------+ + tool | Managed agent | + | +------------------+ | + | | Web Search agent | | + | +------------------+ | + | | | | + | Web Search tool | | + | Visit webpage tool | + +--------------------------------+ +``` +Let's set up this system. + +Run the line below to install the required dependencies: + +``` +!pip install markdownify duckduckgo-search smolagents --upgrade -q +``` + +Let's login in order to call the HF Inference API: + +```py +from huggingface_hub import notebook_login + +notebook_login() +``` + +⚡️ Our agent will be powered by [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct) using `HfApiModel` class that uses HF's Inference API: the Inference API allows to quickly and easily run any OS model. + +_Note:_ The Inference API hosts models based on various criteria, and deployed models may be updated or replaced without prior notice. Learn more about it [here](https://huggingface.co/docs/api-inference/supported-models). + +```py +model_id = "Qwen/Qwen2.5-Coder-32B-Instruct" +``` + +## 🔍 Create a web search tool + +For web browsing, we can already use our pre-existing [`DuckDuckGoSearchTool`](https://github.com/huggingface/smolagents/blob/main/src/smolagents/default_tools/search.py) tool to provide a Google search equivalent. 
+ +But then we will also need to be able to peek into the page found by the `DuckDuckGoSearchTool`. +To do so, we could import the library's built-in `VisitWebpageTool`, but we will build it again to see how it's done. + +So let's create our `VisitWebpageTool` tool from scratch using `markdownify`. + +```py +import re +import requests +from markdownify import markdownify +from requests.exceptions import RequestException +from smolagents import tool + + +@tool +def visit_webpage(url: str) -> str: + """Visits a webpage at the given URL and returns its content as a markdown string. + + Args: + url: The URL of the webpage to visit. + + Returns: + The content of the webpage converted to Markdown, or an error message if the request fails. + """ + try: + # Send a GET request to the URL + response = requests.get(url) + response.raise_for_status() # Raise an exception for bad status codes + + # Convert the HTML content to Markdown + markdown_content = markdownify(response.text).strip() + + # Remove multiple line breaks + markdown_content = re.sub(r"\n{3,}", "\n\n", markdown_content) + + return markdown_content + + except RequestException as e: + return f"Error fetching the webpage: {str(e)}" + except Exception as e: + return f"An unexpected error occurred: {str(e)}" +``` + +Ok, now let's initialize and test our tool! + +```py +print(visit_webpage("https://en.wikipedia.org/wiki/Hugging_Face")[:500]) +``` + +## Build our multi-agent system 🤖🤝🤖 + +Now that we have both tools, `search` and `visit_webpage`, we can use them to create the web agent. + +Which configuration to choose for this agent? +- Web browsing is a single-timeline task that does not require parallel tool calls, so JSON tool calling works well for that. We thus choose a `ToolCallingAgent`. +- Also, since web search sometimes requires exploring many pages before finding the correct answer, we prefer to increase `max_steps` to 10. 
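The two choices listed above (JSON tool calling and a cap on the number of steps) can be sketched with the standard library alone. This is only an illustration under stated assumptions: the `{"name": ..., "arguments": ...}` action shape, the `TOOLS` registry, `dispatch_json_action` and `run_capped` are hypothetical names made up for this sketch, not the format or API that smolagents uses internally.

```python
import json


def visit_webpage_stub(url: str) -> str:
    """Stand-in for a real tool; it does no network access."""
    return f"(stub) contents of {url}"


# Hypothetical registry mapping tool names to plain Python functions.
TOOLS = {"visit_webpage": visit_webpage_stub}


def dispatch_json_action(raw_action: str) -> str:
    """Parse one JSON action and call the matching tool with its arguments."""
    action = json.loads(raw_action)
    tool = TOOLS[action["name"]]
    return tool(**action["arguments"])


def run_capped(scripted_actions, max_steps=10):
    """Execute actions one at a time and stop once max_steps have been run."""
    observations = []
    for step, raw_action in enumerate(scripted_actions):
        if step >= max_steps:
            break
        observations.append(dispatch_json_action(raw_action))
    return observations


# Twelve scripted actions stand in for LLM outputs; only the first ten run.
scripted = [
    json.dumps({"name": "visit_webpage", "arguments": {"url": f"https://example.com/{i}"}})
    for i in range(12)
]
observations = run_capped(scripted, max_steps=10)
print(len(observations))
```

One action is handled per step, which matches the single-timeline nature of web browsing, and the cap bounds how many pages the agent may explore before it has to stop.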
+ +```py +from smolagents import ( + CodeAgent, + ToolCallingAgent, + HfApiModel, + ManagedAgent, + DuckDuckGoSearchTool, + LiteLLMModel, +) + +model = HfApiModel(model_id) + +web_agent = ToolCallingAgent( + tools=[DuckDuckGoSearchTool(), visit_webpage], + model=model, + max_steps=10, +) +``` + +We then wrap this agent into a `ManagedAgent` that will make it callable by its manager agent. + +```py +managed_web_agent = ManagedAgent( + agent=web_agent, + name="search", + description="Runs web searches for you. Give it your query as an argument.", +) +``` + +Finally we create a manager agent, and upon initialization we pass our managed agent to it in its `managed_agents` argument. + +Since this agent is the one tasked with the planning and thinking, advanced reasoning will be beneficial, so a `CodeAgent` will be the best choice. + +Also, we want to ask a question that involves the current year and does additional data calculations: so let us add `additional_authorized_imports=["time", "numpy", "pandas"]`, just in case the agent needs these packages. + +```py +manager_agent = CodeAgent( + tools=[], + model=model, + managed_agents=[managed_web_agent], + additional_authorized_imports=["time", "numpy", "pandas"], +) +``` + +That's all! Now let's run our system! We select a question that requires both some calculation and research: + +```py +answer = manager_agent.run("If LLM training continues to scale up at the current rhythm until 2030, what would be the electric power in GW required to power the biggest training runs by 2030? What would that correspond to, compared to some countries? Please provide a source for any numbers used.") +``` + +We get this report as the answer: +``` +Based on current growth projections and energy consumption estimates, if LLM trainings continue to scale up at the +current rhythm until 2030: + +1. 
The electric power required to power the biggest training runs by 2030 would be approximately 303.74 GW, which +translates to about 2,660,762 GWh/year. + +2. Comparing this to countries' electricity consumption: + - It would be equivalent to about 34% of China's total electricity consumption. + - It would exceed the total electricity consumption of India (184%), Russia (267%), and Japan (291%). + - It would be nearly 9 times the electricity consumption of countries like Italy or Mexico. + +3. Source of numbers: + - The initial estimate of 5 GW for future LLM training comes from AWS CEO Matt Garman. + - The growth projection used a CAGR of 79.80% from market research by Springs. + - Country electricity consumption data is from the U.S. Energy Information Administration, primarily for the year +2021. +``` + +Seems like we'll need some sizeable powerplants if the [scaling hypothesis](https://gwern.net/scaling-hypothesis) continues to hold true. + +Our agents managed to efficiently collaborate towards solving the task! ✅ + +💡 You can easily extend this orchestration to more agents: one does the code execution, one the web search, one handles file loadings... \ No newline at end of file diff --git a/docs/source/zh/examples/rag.md b/docs/source/zh/examples/rag.md new file mode 100644 index 000000000..acbdf14f6 --- /dev/null +++ b/docs/source/zh/examples/rag.md @@ -0,0 +1,156 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. 
+ +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Agentic RAG + +[[open-in-colab]] + +Retrieval-Augmented-Generation (RAG) is “using an LLM to answer a user query, but basing the answer on information retrieved from a knowledge base”. It has many advantages over using a vanilla or fine-tuned LLM: to name a few, it grounds the answer in true facts and reduces confabulations, it provides the LLM with domain-specific knowledge, and it allows fine-grained control of access to information from the knowledge base. + +But vanilla RAG has limitations, most importantly these two: +- It performs only one retrieval step: if the results are bad, the generation in turn will be bad. +- Semantic similarity is computed with the user query as a reference, which might be suboptimal: for instance, the user query will often be a question and the document containing the true answer will be in affirmative voice, so its similarity score will be downgraded compared to other source documents in the interrogative form, leading to a risk of missing the relevant information. + +We can alleviate these problems by making a RAG agent: very simply, an agent armed with a retriever tool! + +This agent will: ✅ Formulate the query itself and ✅ Critique to re-retrieve if needed. + +So it should naturally recover some advanced RAG techniques! +- Instead of directly using the user query as the reference in semantic search, the agent itself formulates a reference sentence that can be closer to the targeted documents, as in [HyDE](https://huggingface.co/papers/2212.10496). +- The agent can use the generated snippets and re-retrieve if needed, as in [Self-Query](https://docs.llamaindex.ai/en/stable/examples/evaluation/RetryQuery/). + +Let's build this system. 
🛠️ + +Run the line below to install required dependencies: +```bash +!pip install smolagents pandas langchain langchain-community sentence-transformers rank_bm25 --upgrade -q +``` +To call the HF Inference API, you will need a valid token as your environment variable `HF_TOKEN`. +We use python-dotenv to load it. +```py +from dotenv import load_dotenv +load_dotenv() +``` + +We first load a knowledge base on which we want to perform RAG: this dataset is a compilation of the documentation pages for many Hugging Face libraries, stored as markdown. We will keep only the documentation for the `transformers` library. + +Then prepare the knowledge base by processing the dataset and storing it into a vector database to be used by the retriever. + +We use [LangChain](https://python.langchain.com/docs/introduction/) for its excellent vector database utilities. + +```py +import datasets +from langchain.docstore.document import Document +from langchain.text_splitter import RecursiveCharacterTextSplitter +from langchain_community.retrievers import BM25Retriever + +knowledge_base = datasets.load_dataset("m-ric/huggingface_doc", split="train") +knowledge_base = knowledge_base.filter(lambda row: row["source"].startswith("huggingface/transformers")) + +source_docs = [ + Document(page_content=doc["text"], metadata={"source": doc["source"].split("/")[1]}) + for doc in knowledge_base +] + +text_splitter = RecursiveCharacterTextSplitter( + chunk_size=500, + chunk_overlap=50, + add_start_index=True, + strip_whitespace=True, + separators=["\n\n", "\n", ".", " ", ""], +) +docs_processed = text_splitter.split_documents(source_docs) +``` + +Now the documents are ready. + +So let’s build our agentic RAG system! + +👉 We only need a RetrieverTool that our agent can leverage to retrieve information from the knowledge base. 
+ +Since we need to add a vectordb as an attribute of the tool, we cannot use the simple tool constructor with a `@tool` decorator: so we will follow the advanced setup highlighted in the [tools tutorial](../tutorials/tools). + +```py +from smolagents import Tool + +class RetrieverTool(Tool): + name = "retriever" + description = "Uses semantic search to retrieve the parts of transformers documentation that could be most relevant to answer your query." + inputs = { + "query": { + "type": "string", + "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.", + } + } + output_type = "string" + + def __init__(self, docs, **kwargs): + super().__init__(**kwargs) + self.retriever = BM25Retriever.from_documents( + docs, k=10 + ) + + def forward(self, query: str) -> str: + assert isinstance(query, str), "Your search query must be a string" + + docs = self.retriever.invoke( + query, + ) + return "\nRetrieved documents:\n" + "".join( + [ + f"\n\n===== Document {str(i)} =====\n" + doc.page_content + for i, doc in enumerate(docs) + ] + ) + +retriever_tool = RetrieverTool(docs_processed) +``` +We have used BM25, a classic retrieval method, because it's lightning fast to set up. +To improve retrieval accuracy, you could replace BM25 with semantic search using vector representations for documents: you can head to the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) to select a good embedding model. + +Now it’s straightforward to create an agent that leverages this `retriever_tool`! + +The agent will need these arguments upon initialization: +- `tools`: a list of tools that the agent will be able to call. +- `model`: the LLM that powers the agent. +Our `model` must be a callable that takes as input a list of messages and returns text. It also needs to accept a `stop_sequences` argument that indicates when to stop its generation. 
For convenience, we directly use the `HfApiModel` class provided in the package to get an LLM engine that calls Hugging Face's Inference API. + +And we use [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) as the LLM engine because: +- It has a long 128k context, which is helpful for processing long source documents +- It is served for free at all times on HF's Inference API! + +_Note:_ The Inference API hosts models based on various criteria, and deployed models may be updated or replaced without prior notice. Learn more about it [here](https://huggingface.co/docs/api-inference/supported-models). + +```py +from smolagents import HfApiModel, CodeAgent + +agent = CodeAgent( + tools=[retriever_tool], model=HfApiModel("meta-llama/Llama-3.3-70B-Instruct"), max_steps=4, verbose=True +) +``` + +Upon initializing the CodeAgent, it has been automatically given a default system prompt that tells the LLM engine to process step-by-step and generate tool calls as code snippets, but you could replace this prompt template with your own as needed. + +Then when its `.run()` method is launched, the agent takes care of calling the LLM engine, and executing the tool calls, all in a loop that ends only when tool `final_answer` is called with the final answer as its argument. + +```py +agent_output = agent.run("For a transformers model training, which is slower, the forward or the backward pass?") + +print("Final output:") +print(agent_output) +``` + + + diff --git a/docs/source/zh/examples/text_to_sql.md b/docs/source/zh/examples/text_to_sql.md new file mode 100644 index 000000000..12d0c5e47 --- /dev/null +++ b/docs/source/zh/examples/text_to_sql.md @@ -0,0 +1,202 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Text-to-SQL + +[[open-in-colab]] + +In this tutorial, we’ll see how to implement an agent that leverages SQL using `smolagents`. + +> Let's start with the golden question: why not keep it simple and use a standard text-to-SQL pipeline? + +A standard text-to-sql pipeline is brittle, since the generated SQL query can be incorrect. Even worse, the query could be incorrect, but not raise an error, instead giving some incorrect/useless outputs without raising an alarm. + +👉 Instead, an agent system is able to critically inspect outputs and decide if the query needs to be changed or not, thus giving it a huge performance boost. + +Let’s build this agent! 
💪 + +First, we set up the SQL environment: +```py +from sqlalchemy import ( + create_engine, + MetaData, + Table, + Column, + String, + Integer, + Float, + insert, + inspect, + text, +) + +engine = create_engine("sqlite:///:memory:") +metadata_obj = MetaData() + +# create receipts SQL table +table_name = "receipts" +receipts = Table( + table_name, + metadata_obj, + Column("receipt_id", Integer, primary_key=True), + Column("customer_name", String(16), primary_key=True), + Column("price", Float), + Column("tip", Float), +) +metadata_obj.create_all(engine) + +rows = [ + {"receipt_id": 1, "customer_name": "Alan Payne", "price": 12.06, "tip": 1.20}, + {"receipt_id": 2, "customer_name": "Alex Mason", "price": 23.86, "tip": 0.24}, + {"receipt_id": 3, "customer_name": "Woodrow Wilson", "price": 53.43, "tip": 5.43}, + {"receipt_id": 4, "customer_name": "Margaret James", "price": 21.11, "tip": 1.00}, +] +for row in rows: + stmt = insert(receipts).values(**row) + with engine.begin() as connection: + cursor = connection.execute(stmt) +``` + +### Build our agent + +Now let’s make our SQL table retrievable by a tool. + +The tool’s description attribute will be embedded in the LLM’s prompt by the agent system: it gives the LLM information about how to use the tool. This is where we want to describe the SQL table. + +```py +inspector = inspect(engine) +columns_info = [(col["name"], col["type"]) for col in inspector.get_columns("receipts")] + +table_description = "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info]) +print(table_description) +``` + +```text +Columns: + - receipt_id: INTEGER + - customer_name: VARCHAR(16) + - price: FLOAT + - tip: FLOAT +``` + +Now let’s build our tool. It needs the following (read [the tool doc](../tutorials/tools) for more detail): +- A docstring with an `Args:` part listing arguments. +- Type hints on both inputs and output. 
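To make the two requirements above concrete, here is a minimal sketch that checks them on a plain function using only the standard library. The names `sql_engine_stub` and `check_tool_requirements` are hypothetical helpers written for this illustration; they are not part of smolagents, and the library's own validation may differ.

```python
import inspect
from typing import get_type_hints


def sql_engine_stub(query: str) -> str:
    """Pretends to run a SQL query and returns a string.

    Args:
        query: The query to perform. This should be correct SQL.
    """
    return f"(stub) would run: {query}"


def check_tool_requirements(func) -> dict:
    """Report whether a function has an Args: docstring section and full type hints."""
    hints = get_type_hints(func)
    param_names = list(inspect.signature(func).parameters)
    doc = inspect.getdoc(func) or ""
    return {
        "has_args_section": "Args:" in doc,
        "inputs_typed": all(name in hints for name in param_names),
        "output_typed": "return" in hints,
    }


report = check_tool_requirements(sql_engine_stub)
print(report)
```

A function that fails either check gives the LLM less information about how to call the tool, which is presumably why both pieces are required.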
+ +```py +from smolagents import tool + +@tool +def sql_engine(query: str) -> str: + """ + Allows you to perform SQL queries on the table. Returns a string representation of the result. + The table is named 'receipts'. Its description is as follows: + Columns: + - receipt_id: INTEGER + - customer_name: VARCHAR(16) + - price: FLOAT + - tip: FLOAT + + Args: + query: The query to perform. This should be correct SQL. + """ + output = "" + with engine.connect() as con: + rows = con.execute(text(query)) + for row in rows: + output += "\n" + str(row) + return output +``` + +Now let us create an agent that leverages this tool. + +We use the `CodeAgent`, which is smolagents’ main agent class: an agent that writes actions in code and can iterate on previous output according to the ReAct framework. + +The model is the LLM that powers the agent system. HfApiModel allows you to call LLMs using HF’s Inference API, either via Serverless or Dedicated endpoint, but you could also use any proprietary API. + +```py +from smolagents import CodeAgent, HfApiModel + +agent = CodeAgent( + tools=[sql_engine], + model=HfApiModel("meta-llama/Meta-Llama-3.1-8B-Instruct"), +) +agent.run("Can you give me the name of the client who got the most expensive receipt?") +``` + +### Level 2: Table joins + +Now let’s make it more challenging! We want our agent to handle joins across multiple tables. + +So let’s make a second table recording the names of waiters for each receipt_id! 
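As a rough sketch of the kind of join the agent will have to write at this level, the snippet below uses the standard library `sqlite3` module instead of the SQLAlchemy engine from this tutorial. The receipts rows are the ones inserted earlier and the waiter assignments are the ones this section adds next; the table layout is simplified and the query is only one plausible formulation, not necessarily what the agent produces.

```python
import sqlite3

# In-memory database with simplified versions of the two tables.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE receipts (receipt_id INTEGER PRIMARY KEY, customer_name TEXT, price REAL, tip REAL);
    CREATE TABLE waiters (receipt_id INTEGER PRIMARY KEY, waiter_name TEXT);
    """
)
con.executemany(
    "INSERT INTO receipts VALUES (?, ?, ?, ?)",
    [
        (1, "Alan Payne", 12.06, 1.20),
        (2, "Alex Mason", 23.86, 0.24),
        (3, "Woodrow Wilson", 53.43, 5.43),
        (4, "Margaret James", 21.11, 1.00),
    ],
)
con.executemany(
    "INSERT INTO waiters VALUES (?, ?)",
    [
        (1, "Corey Johnson"),
        (2, "Michael Watts"),
        (3, "Michael Watts"),
        (4, "Margaret James"),
    ],
)

# Join each receipt to its waiter, then total the tips per waiter.
query = """
SELECT w.waiter_name, ROUND(SUM(r.tip), 2) AS total_tips
FROM receipts AS r
JOIN waiters AS w ON w.receipt_id = r.receipt_id
GROUP BY w.waiter_name
ORDER BY total_tips DESC
"""
totals = con.execute(query).fetchall()
for waiter_name, total_tips in totals:
    print(waiter_name, total_tips)
```

The first row of `totals` is the waiter with the highest tip total, which is the quantity the question "Which waiter got more total money from tips?" asks about.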
+ +```py +table_name = "waiters" +waiters = Table( + table_name, + metadata_obj, + Column("receipt_id", Integer, primary_key=True), + Column("waiter_name", String(16), primary_key=True), +) +metadata_obj.create_all(engine) + +rows = [ + {"receipt_id": 1, "waiter_name": "Corey Johnson"}, + {"receipt_id": 2, "waiter_name": "Michael Watts"}, + {"receipt_id": 3, "waiter_name": "Michael Watts"}, + {"receipt_id": 4, "waiter_name": "Margaret James"}, +] +for row in rows: + stmt = insert(waiters).values(**row) + with engine.begin() as connection: + cursor = connection.execute(stmt) +``` +Since we added a table, we update the `sql_engine` tool's description with this table’s description to let the LLM properly leverage information from this table. + +```py +updated_description = """Allows you to perform SQL queries on the table. Beware that this tool's output is a string representation of the execution output. +It can use the following tables:""" + +inspector = inspect(engine) +for table in ["receipts", "waiters"]: + columns_info = [(col["name"], col["type"]) for col in inspector.get_columns(table)] + + table_description = f"Table '{table}':\n" + + table_description += "Columns:\n" + "\n".join([f" - {name}: {col_type}" for name, col_type in columns_info]) + updated_description += "\n\n" + table_description + +print(updated_description) +``` +Since this request is a bit harder than the previous one, we’ll switch the LLM engine to use the more powerful [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)! + +```py +sql_engine.description = updated_description + +agent = CodeAgent( + tools=[sql_engine], + model=HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct"), +) + +agent.run("Which waiter got more total money from tips?") +``` +It works right away! The setup was surprisingly simple, wasn’t it? + +This example is done! We've touched upon these concepts: +- Building new tools. +- Updating a tool's description. 
+- Switching to a stronger LLM helps agent reasoning. + +✅ Now you can go build this text-to-SQL system you’ve always dreamt of! ✨ \ No newline at end of file diff --git a/docs/source/zh/guided_tour.md b/docs/source/zh/guided_tour.md new file mode 100644 index 000000000..07988fee0 --- /dev/null +++ b/docs/source/zh/guided_tour.md @@ -0,0 +1,366 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Agents - Guided tour + +[[open-in-colab]] + +In this guided tour, you will learn how to build an agent, how to run it, and how to customize it to make it work better for your use case. + +> [!TIP] +> Translator's note: the industry term for "agent" in Chinese is “智能体”. The Chinese translation keeps the word agent untranslated for a more efficient reading experience. (In a mostly Chinese article, it's easier to notice the English. Attention Is All You Need!) + +> [!TIP] +> The Chinese community has published a video that introduces smolagents and walks through it in practice (source: [Issue#80](https://github.com/huggingface/smolagents/issues/80)); you can watch it [here](https://www.youtube.com/watch?v=wwN3oAugc4c)! + +### Building your agent + +To initialize a minimal agent, you need at least these two arguments: + +- `model`, a text-generation model to power your agent - because the agent is different from a simple LLM, it is a system that uses an LLM as its engine. You can use any of these options: + - [`TransformersModel`] uses a pre-initialized `transformers` pipeline to run inference on your local machine + - [`HfApiModel`] leverages a `huggingface_hub.InferenceClient` under the hood + - [`LiteLLMModel`] lets you call 100+ different models through [LiteLLM](https://docs.litellm.ai/)! 
+ +- `tools`,agent 可以用来解决任务的 `Tools` 列表。它可以是一个空列表。您还可以通过定义可选参数 `add_base_tools=True` 在您的 `tools` 列表之上添加默认工具箱。 + +一旦有了这两个参数 `tools` 和 `model`,您就可以创建一个 agent 并运行它。您可以使用任何您喜欢的 LLM,无论是通过 [Hugging Face API](https://huggingface.co/docs/api-inference/en/index)、[transformers](https://github.com/huggingface/transformers/)、[ollama](https://ollama.com/),还是 [LiteLLM](https://www.litellm.ai/)。 + +<hfoptions id="选择一个LLM"> +<hfoption id="Hugging Face API"> + +Hugging Face API 可以免费使用而无需 token,但会有速率限制。 + +要访问受限模型或使用 PRO 账户提高速率限制,您需要设置环境变量 `HF_TOKEN` 或在初始化 `HfApiModel` 时传递 `token` 变量。 + +```python +from smolagents import CodeAgent, HfApiModel + +model_id = "meta-llama/Llama-3.3-70B-Instruct" + +model = HfApiModel(model_id=model_id, token="<YOUR_HUGGINGFACEHUB_API_TOKEN>") +agent = CodeAgent(tools=[], model=model, add_base_tools=True) + +agent.run( + "Could you give me the 118th number in the Fibonacci sequence?", +) +``` +</hfoption> +<hfoption id="本地Transformers模型"> + +```python +from smolagents import CodeAgent, TransformersModel + +model_id = "meta-llama/Llama-3.2-3B-Instruct" + +model = TransformersModel(model_id=model_id) +agent = CodeAgent(tools=[], model=model, add_base_tools=True) + +agent.run( + "Could you give me the 118th number in the Fibonacci sequence?", +) +``` +</hfoption> +<hfoption id="OpenAI或Anthropic API"> + +要使用 `LiteLLMModel`,您需要设置环境变量 `ANTHROPIC_API_KEY` 或 `OPENAI_API_KEY`,或者在初始化时传递 `api_key` 变量。 + +```python +from smolagents import CodeAgent, LiteLLMModel + +model = LiteLLMModel(model_id="anthropic/claude-3-5-sonnet-latest", api_key="YOUR_ANTHROPIC_API_KEY") # 也可以使用 'gpt-4o' +agent = CodeAgent(tools=[], model=model, add_base_tools=True) + +agent.run( + "Could you give me the 118th number in the Fibonacci sequence?", +) +``` +</hfoption> +<hfoption id="Ollama"> + +```python +from smolagents import CodeAgent, LiteLLMModel + +model = LiteLLMModel( + model_id="ollama_chat/llama3.2", # 这个模型对于 agent 行为来说有点弱 + api_base="http://localhost:11434", # 如果需要可以替换为远程 open-ai 
兼容服务器 + api_key="YOUR_API_KEY" # 如果需要可以替换为 API key +) + +agent = CodeAgent(tools=[], model=model, add_base_tools=True) + +agent.run( + "Could you give me the 118th number in the Fibonacci sequence?", +) +``` +</hfoption> +</hfoptions> + +#### CodeAgent 和 ToolCallingAgent + +[`CodeAgent`] 是我们的默认 agent。它将在每一步编写并执行 Python 代码片段。 + +默认情况下,执行是在您的本地环境中完成的。 +这应该是安全的,因为唯一可以调用的函数是您提供的工具(特别是如果只有 Hugging Face 的工具)和一组预定义的安全函数,如 `print` 或 `math` 模块中的函数,所以您已经限制了可以执行的内容。 + +Python 解释器默认也不允许在安全列表之外导入,所以所有最明显的攻击都不应该成为问题。 +您可以通过在初始化 [`CodeAgent`] 时将授权模块作为字符串列表传递给参数 `additional_authorized_imports` 来授权额外的导入: + +```py +from smolagents import CodeAgent + +agent = CodeAgent(tools=[], model=model, additional_authorized_imports=['requests', 'bs4']) +agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?") +``` + +> [!WARNING] +> LLM 可以生成任意代码然后执行:不要添加任何不安全的导入! + +如果生成的代码尝试执行非法操作或出现常规 Python 错误,执行将停止。 + +您也可以使用 [E2B 代码执行器](https://e2b.dev/docs#what-is-e2-b) 而不是本地 Python 解释器,首先 [设置 `E2B_API_KEY` 环境变量](https://e2b.dev/dashboard?tab=keys),然后在初始化 agent 时传递 `use_e2b_executor=True`。 + +> [!TIP] +> 在 [该教程中](tutorials/secure_code_execution) 了解更多关于代码执行的内容。 + +我们还支持广泛使用的将动作编写为 JSON-like 块的方式:[`ToolCallingAgent`],它的工作方式与 [`CodeAgent`] 非常相似,当然没有 `additional_authorized_imports`,因为它不执行代码: + +```py +from smolagents import ToolCallingAgent + +agent = ToolCallingAgent(tools=[], model=model) +agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?") +``` + +### 检查 agent 运行 + +以下是一些有用的属性,用于检查运行后发生了什么: +- `agent.logs` 存储 agent 的细粒度日志。在 agent 运行的每一步,所有内容都会存储在一个字典中,然后附加到 `agent.logs` 中。 +- 运行 `agent.write_inner_memory_from_logs()` 会为 LLM 创建一个 agent 日志的内部内存,作为聊天消息列表。此方法会遍历日志的每一步,并仅存储它感兴趣的内容作为消息:例如,它会将系统提示和任务存储为单独的消息,然后对于每一步,它会将 LLM 输出存储为一条消息,工具调用输出存储为另一条消息。如果您想要更高层次的视图,可以使用此方法,但并非每条日志都会被它转录。 + +## 工具 + +工具是 agent 使用的原子函数。为了被 LLM 使用,它还需要一些构成其 API 的属性,这些属性将用于向 LLM 描述如何调用此工具: +- 名称 +- 描述 +- 输入类型和描述 +- 输出类型 + +例如,您可以查看 
[`PythonInterpreterTool`]:它有一个名称、描述、输入描述、输出类型和一个执行操作的 `forward` 方法。 + +当 agent 初始化时,工具属性用于生成工具描述,该描述被嵌入到 agent 的系统提示中。这让 agent 知道它可以使用哪些工具以及为什么。 + +### 默认工具箱 + +smolagents 附带了一个用于增强 agent 的默认工具箱,您可以在初始化时通过参数 `add_base_tools=True` 将其添加到您的 agent 中: + +- **DuckDuckGo 网页搜索**:使用 DuckDuckGo 搜索引擎执行网页搜索。 +- **Python 代码解释器**:在安全环境中运行 LLM 生成的 Python 代码。只有在使用 `add_base_tools=True` 初始化 [`ToolCallingAgent`] 时才会添加此工具,因为基于代码的 agent 已经可以原生执行 Python 代码 +- **转录器**:基于 Whisper-Turbo 构建的语音转文本管道,将音频转录为文本。 + +您可以通过调用 [`load_tool`] 函数和要执行的任务手动使用工具。 + +```python +from smolagents import load_tool + +search_tool = load_tool("web_search") +print(search_tool("Who's the current president of Russia?")) +``` + +### 创建一个新工具 + +您可以创建自己的工具,用于 Hugging Face 默认工具未涵盖的用例。 +例如,让我们创建一个工具,返回 Hub 上给定任务下载量最多的模型。 + +您将从以下代码开始。 + +```python +from huggingface_hub import list_models + +task = "text-classification" + +most_downloaded_model = next(iter(list_models(filter=task, sort="downloads", direction=-1))) +print(most_downloaded_model.id) +``` + +这段代码可以通过将其包装在一个函数中并添加 `tool` 装饰器快速转换为工具。 +这不是构建工具的唯一方法:您可以直接将其定义为 [`Tool`] 的子类,这为您提供了更多的灵活性,例如初始化重型类属性的可能性。 + +让我们看看这两种选项的工作原理: + +<hfoptions id="构建工具"> +<hfoption id="使用@tool装饰一个函数"> + +```py +from huggingface_hub import list_models +from smolagents import tool + +@tool +def model_download_tool(task: str) -> str: + """ + This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. + It returns the name of the checkpoint. + + Args: + task: The task for which to get the download count. + """ + most_downloaded_model = next(iter(list_models(filter=task, sort="downloads", direction=-1))) + return most_downloaded_model.id +``` + +该函数需要: +- 一个清晰的名称。名称应该足够描述此工具的功能,以帮助为 agent 提供动力的 LLM。由于此工具返回任务下载量最多的模型,我们将其命名为 `model_download_tool`。 +- 输入和输出的类型提示 +- 一个描述,其中包括一个 'Args:' 部分,其中每个参数都被描述(这次没有类型指示,它将从类型提示中提取)。与工具名称一样,此描述是为您的 agent 提供动力的 LLM 的说明书,所以不要忽视它。 +所有这些元素将在初始化时自动嵌入到 agent 的系统提示中:因此要努力使它们尽可能清晰! 
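为了说明为什么类型提示和 `Args:` 部分如此重要,下面是一个只用标准库的简化示意:它从普通函数中提取名称、描述、输入和输出类型。这只是一个假设性的草图,并不是 smolagents 中 `@tool` 的实际实现;`describe_function` 是为演示而虚构的名称,示例函数也不会真正访问 Hub。

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints


def describe_function(func: Callable[..., Any]) -> Dict[str, Any]:
    """从函数签名和 docstring 中收集类似工具的元数据(仅作示意)。

    假设:所有参数和返回值都带有类型提示。
    """
    hints = get_type_hints(func)
    docstring = inspect.getdoc(func) or ""
    # "Args:" 之前的部分作为工具描述,之后的部分逐行解析参数说明
    description, _, args_section = docstring.partition("Args:")
    arg_docs = {}
    for line in args_section.splitlines():
        name, sep, text = line.strip().partition(":")
        if sep and name in hints:
            arg_docs[name] = text.strip()
    inputs = {
        name: {"type": hints[name].__name__, "description": arg_docs.get(name, "")}
        for name in inspect.signature(func).parameters
    }
    return {
        "name": func.__name__,
        "description": description.strip(),
        "inputs": inputs,
        "output_type": hints["return"].__name__,
    }


def model_download_tool(task: str) -> str:
    """
    Returns the most downloaded model of a given task on the Hugging Face Hub.

    Args:
        task: The task for which to get the download count.
    """
    return "placeholder"  # 占位返回值,仅用于演示


print(describe_function(model_download_tool))
```

如果缺少类型提示或 `Args:` 说明,这样的提取就拿不到对应信息,LLM 也就无从得知该如何调用这个工具。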
+ +> [!TIP] +> 此定义格式与 `apply_chat_template` 中使用的工具模式相同,唯一的区别是添加了 `tool` 装饰器:[这里](https://huggingface.co/blog/unified-tool-use#passing-tools-to-a-chat-template) 了解更多关于我们的工具使用 API。 +</hfoption> +<hfoption id="子类化Tool"> + +```py +from smolagents import Tool + +class ModelDownloadTool(Tool): + name = "model_download_tool" + description = "This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub. It returns the name of the checkpoint." + inputs = {"task": {"type": "string", "description": "The task for which to get the download count."}} + output_type = "string" + + def forward(self, task: str) -> str: + most_downloaded_model = next(iter(list_models(filter=task, sort="downloads", direction=-1))) + return most_downloaded_model.id +``` + +子类需要以下属性: +- 一个清晰的 `name`。名称应该足够描述此工具的功能,以帮助为 agent 提供动力的 LLM。由于此工具返回任务下载量最多的模型,我们将其命名为 `model_download_tool`。 +- 一个 `description`。与 `name` 一样,此描述是为您的 agent 提供动力的 LLM 的说明书,所以不要忽视它。 +- 输入类型和描述 +- 输出类型 +所有这些属性将在初始化时自动嵌入到 agent 的系统提示中:因此要努力使它们尽可能清晰! +</hfoption> +</hfoptions> + + +然后您可以直接初始化您的 agent: +```py +from smolagents import CodeAgent, HfApiModel +agent = CodeAgent(tools=[model_download_tool], model=HfApiModel()) +agent.run( + "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?" +) +``` + +您将获得以下日志: +```text +╭──────────────────────────────────────── New run ─────────────────────────────────────────╮ +│ │ +│ Can you give me the name of the model that has the most downloads in the 'text-to-video' │ +│ task on the Hugging Face Hub? 
│ +│ │ +╰─ HfApiModel - Qwen/Qwen2.5-Coder-32B-Instruct ───────────────────────────────────────────╯ +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 0 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +╭─ Executing this code: ───────────────────────────────────────────────────────────────────╮ +│ 1 model_name = model_download_tool(task="text-to-video") │ +│ 2 print(model_name) │ +╰──────────────────────────────────────────────────────────────────────────────────────────╯ +Execution logs: +ByteDance/AnimateDiff-Lightning + +Out: None +[Step 0: Duration 0.27 seconds| Input tokens: 2,069 | Output tokens: 60] +━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ +╭─ Executing this code: ───────────────────────────────────────────────────────────────────╮ +│ 1 final_answer("ByteDance/AnimateDiff-Lightning") │ +╰──────────────────────────────────────────────────────────────────────────────────────────╯ +Out - Final answer: ByteDance/AnimateDiff-Lightning +[Step 1: Duration 0.10 seconds| Input tokens: 4,288 | Output tokens: 148] +Out[20]: 'ByteDance/AnimateDiff-Lightning' +``` + +> [!TIP] +> 在 [专用教程](./tutorials/tools#what-is-a-tool-and-how-to-build-one) 中了解更多关于工具的内容。 + +## 多 agent + +多 agent 系统是随着微软的框架 [Autogen](https://huggingface.co/papers/2308.08155) 引入的。 + +在这种类型的框架中,您有多个 agent 一起工作来解决您的任务,而不是只有一个。 +经验表明,这在大多数基准测试中表现更好。这种更好表现的原因在概念上很简单:对于许多任务,与其使用一个全能系统,您更愿意将单元专门用于子任务。在这里,拥有具有单独工具集和内存的 agent 可以实现高效的专业化。例如,为什么要用网页搜索 agent 访问的所有网页内容填充代码生成 agent 的内存?最好将它们分开。 + +您可以使用 `smolagents` 轻松构建分层多 agent 系统。 + +为此,将 agent 封装在 [`ManagedAgent`] 对象中。此对象需要参数 `agent`、`name` 和 `description`,这些参数将嵌入到管理 agent 的系统提示中,以让它知道如何调用此托管 agent,就像我们对工具所做的那样。 + +以下是一个使用我们的 [`DuckDuckGoSearchTool`] 制作一个管理特定网页搜索 agent 的 agent 的示例: + +```py +from smolagents import CodeAgent, HfApiModel, DuckDuckGoSearchTool, ManagedAgent + +model = HfApiModel() + +web_agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model) + +managed_web_agent = ManagedAgent( + agent=web_agent, + 
 name="web_search", + description="Runs web searches for you. Give it your query as an argument." +) + +manager_agent = CodeAgent( + tools=[], model=model, managed_agents=[managed_web_agent] +) + +manager_agent.run("Who is the CEO of Hugging Face?") +``` + +> [!TIP] +> 有关高效多 agent 实现的深入示例,请参阅 [我们如何将多 agent 系统推向 GAIA 排行榜的顶部](https://huggingface.co/blog/beating-gaia)。 + + +## 与您的 agent 交谈并在酷炫的 Gradio 界面中可视化其思考过程 + +您可以使用 `GradioUI` 交互式地向您的 agent 提交任务并观察其思考和执行过程,以下是一个示例: + +```py +from smolagents import ( + load_tool, + CodeAgent, + HfApiModel, + GradioUI +) + +# 从 Hub 导入工具 +image_generation_tool = load_tool("m-ric/text-to-image") + +# 使用默认模型;如需指定模型,可传入 model_id +model = HfApiModel() + +# 使用图像生成工具初始化 agent +agent = CodeAgent(tools=[image_generation_tool], model=model) + +GradioUI(agent).launch() +``` + +在底层,当用户输入新消息时,agent 会以 `agent.run(user_request, reset=False)` 启动。 +`reset=False` 标志意味着在启动此新任务之前不会刷新 agent 的内存,这使得对话可以继续。 + +您也可以在其他 agent 化应用程序中使用此 `reset=False` 参数来保持对话继续。 + +## 下一步 + +要更深入地使用,您将需要查看我们的教程: +- [我们的代码 agent 如何工作的解释](./tutorials/secure_code_execution) +- [关于如何构建好用 agent 的指南](./tutorials/building_good_agents)。 +- [工具使用的深入指南](./tutorials/tools)。 diff --git a/docs/source/zh/index.md b/docs/source/zh/index.md new file mode 100644 index 000000000..d79e8090c --- /dev/null +++ b/docs/source/zh/index.md @@ -0,0 +1,52 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. 
+ +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. +--> + +# `smolagents` + +这是构建强大 agent 的最简单框架!顺便问一下,什么是 "agent"?我们在[此页面](conceptual_guides/intro_agents)提供了我们的定义,您还可以找到关于何时使用或不使用它们的建议(剧透:通常不使用 agent 会更好)。 + +> [!TIP] +> 译者注:Agent 的业内术语是“智能体”。本译文将保留 agent,不作翻译,以带来更高效的阅读体验。(在中文为主的文章中,It's easier to 注意到英文。Attention Is All You Need!) + +本库提供: + +✨ **简洁性**:Agent 逻辑仅需约千行代码。我们将抽象保持在原始代码之上的最小形态! + +🌐 **支持任何 LLM**:支持通过 Hub 托管的模型,使用其 `transformers` 版本或通过我们的推理 API 加载,也支持 OpenAI、Anthropic 等模型。使用任何 LLM 为 agent 提供动力都非常容易。 + +🧑💻 **一流的代码 agent 支持**,即编写代码作为其操作的 agent(与"用于编写代码的 agent"相对),[在此了解更多](tutorials/secure_code_execution)。 + +🤗 **Hub 集成**:您可以在 Hub 上共享和加载工具,更多功能即将推出! + +<div class="mt-10"> + <div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5"> + <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./guided_tour" + ><div class="w-full text-center bg-gradient-to-br from-blue-400 to-blue-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">导览</div> + <p class="text-gray-700">学习基础知识并熟悉使用 agent。如果您是第一次使用 agent,请从这里开始!</p> + </a> + <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./examples/text_to_sql" + ><div class="w-full text-center bg-gradient-to-br from-indigo-400 to-indigo-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">操作指南</div> + <p class="text-gray-700">实用指南,帮助您实现特定目标:创建一个生成和测试 SQL 查询的 agent!</p> + </a> + <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./conceptual_guides/intro_agents" + ><div class="w-full text-center bg-gradient-to-br from-pink-400 to-pink-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">概念指南</div> + <p class="text-gray-700">高级解释,帮助您更好地理解重要主题。</p> + </a> + <a 
class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="./tutorials/building_good_agents" + ><div class="w-full text-center bg-gradient-to-br from-purple-400 to-purple-500 rounded-lg py-1.5 font-semibold mb-5 text-white text-lg leading-relaxed">教程</div> + <p class="text-gray-700">涵盖构建 agent 重要方面的横向教程。</p> + </a> + </div> +</div> diff --git a/docs/source/zh/reference/agents.md b/docs/source/zh/reference/agents.md new file mode 100644 index 000000000..9cdca7d0b --- /dev/null +++ b/docs/source/zh/reference/agents.md @@ -0,0 +1,143 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Agents + +<Tip warning={true}> + +Smolagents is an experimental API which is subject to change at any time. Results returned by the agents +can vary as the APIs or underlying models are prone to change. + +</Tip> + +To learn more about agents and tools make sure to read the [introductory guide](../index). This page +contains the API docs for the underlying classes. + +## Agents + +Our agents inherit from [`MultiStepAgent`], which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read more in [this conceptual guide](../conceptual_guides/react). 
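To make the "one thought, then one tool call and execution" cycle concrete, here is a minimal, library-free sketch of such a loop. It is not the `MultiStepAgent` implementation: `scripted_model`, `TOOLS`, and `run_multi_step` are hypothetical names invented for this illustration, and the "model" is a hard-coded script rather than an LLM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}

OBSERVATION_PREFIX = "Observation: "


def scripted_model(memory: List[str]) -> Tuple[str, str, str]:
    """Stand-in for an LLM: returns (thought, tool_name, tool_argument)."""
    if not any(entry.startswith(OBSERVATION_PREFIX) for entry in memory):
        # No observation yet: call the tool on the task (first memory entry).
        return ("I should count the words first.", "word_count", memory[0])
    # The latest memory entry is the observation from the previous step.
    last_observation = memory[-1][len(OBSERVATION_PREFIX):]
    return ("I have the count, so I can answer.", "final_answer", last_observation)


def run_multi_step(task: str, max_steps: int = 5) -> str:
    """Alternate thought -> tool call -> observation until a final answer."""
    memory = [task]
    for _ in range(max_steps):
        thought, tool_name, argument = scripted_model(memory)
        memory.append(f"Thought: {thought}")
        if tool_name == "final_answer":
            return argument
        observation = TOOLS[tool_name](argument)
        memory.append(f"{OBSERVATION_PREFIX}{observation}")
    raise RuntimeError("Reached max_steps without a final answer")


print(run_multi_step("How many words are in this sentence"))
```

The point of the sketch is the shape of the loop: each step appends a thought and an observation to memory, and the next step decides what to do based on that memory.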
+ +We provide two types of agents, based on the main [`MultiStepAgent`] class. + - [`CodeAgent`] is the default agent; it writes its tool calls in Python code. + - [`ToolCallingAgent`] writes its tool calls in JSON. + +Both require the arguments `model` and `tools` (a list of tools) at initialization. + + +### Classes of agents + +[[autodoc]] MultiStepAgent + +[[autodoc]] CodeAgent + +[[autodoc]] ToolCallingAgent + + +### ManagedAgent + +[[autodoc]] ManagedAgent + +### stream_to_gradio + +[[autodoc]] stream_to_gradio + +### GradioUI + +[[autodoc]] GradioUI + +## Models + +You're free to create and use your own models to power your agent. + +You could use any `model` callable for your agent, as long as: +1. It follows the [messages format](./chat_templating) (`List[Dict[str, str]]`) for its input `messages`, and it returns a `str`. +2. It stops generating outputs *before* the sequences passed in the argument `stop_sequences`. + +To define your LLM, you can write a `custom_model` function that accepts a list of [messages](./chat_templating) and returns text. This callable also needs to accept a `stop_sequences` argument that indicates when to stop generating. + +```python +from huggingface_hub import login, InferenceClient + +login("<YOUR_HUGGINGFACEHUB_API_TOKEN>") + +model_id = "meta-llama/Llama-3.3-70B-Instruct" + +client = InferenceClient(model=model_id) + +def custom_model(messages, stop_sequences=["Task"]) -> str: + response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000) + answer = response.choices[0].message.content + return answer +``` + +Additionally, `custom_model` can take a `grammar` argument. If you specify a `grammar` upon agent initialization, it will be passed along in the calls to the model to allow [constrained generation](https://huggingface.co/docs/text-generation-inference/conceptual/guidance), which forces properly-formatted agent outputs. 
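As a rough illustration of the second requirement above, the sketch below shows one way a custom model callable might cut its output just before the first stop sequence. `truncate_at_stop_sequences` and `fake_generate` are hypothetical names used only for this example and are not part of smolagents; a real backend would usually forward `stop_sequences` to the inference API rather than trimming afterwards.

```python
from typing import Dict, List, Optional


def truncate_at_stop_sequences(text: str, stop_sequences: Optional[List[str]]) -> str:
    """Return `text` cut just before the earliest stop sequence it contains."""
    cut = len(text)
    for stop in stop_sequences or []:
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


def fake_generate(messages: List[Dict[str, str]]) -> str:
    # Stand-in for a real LLM call (hypothetical): echoes the last message
    # and then keeps "generating" past the point where it should stop.
    return "Thought: " + messages[-1]["content"] + "\nTask: unrelated follow-up"


def custom_model(
    messages: List[Dict[str, str]], stop_sequences: Optional[List[str]] = None
) -> str:
    raw_output = fake_generate(messages)
    return truncate_at_stop_sequences(raw_output, stop_sequences)


print(custom_model([{"role": "user", "content": "say hi"}], stop_sequences=["Task"]))
```

Trimming after generation keeps the contract simple to verify, at the cost of paying for tokens that are then thrown away.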
+ +### TransformersModel + +For convenience, we have added a `TransformersModel` that implements the points above by building a local `transformers` pipeline for the model_id given at initialization. + +```python +from smolagents import TransformersModel + +model = TransformersModel(model_id="HuggingFaceTB/SmolLM-135M-Instruct") + +print(model([{"role": "user", "content": "Ok!"}], stop_sequences=["great"])) +``` +```text +>>> What a +``` + +[[autodoc]] TransformersModel + +### HfApiModel + +The `HfApiModel` wraps an [HF Inference API](https://huggingface.co/docs/api-inference/index) client for the execution of the LLM. + +```python +from smolagents import HfApiModel + +messages = [ + {"role": "user", "content": "Hello, how are you?"}, + {"role": "assistant", "content": "I'm doing great. How can I help you today?"}, + {"role": "user", "content": "No need to help, take it easy."}, +] + +model = HfApiModel() +print(model(messages)) +``` +```text +>>> Of course! If you change your mind, feel free to reach out. Take care! +``` +[[autodoc]] HfApiModel + +### LiteLLMModel + +The `LiteLLMModel` leverages [LiteLLM](https://www.litellm.ai/) to support 100+ LLMs from various providers. +You can pass kwargs upon model initialization that will then be used whenever using the model, for instance below we pass `temperature`. + +```python +from smolagents import LiteLLMModel + +messages = [ + {"role": "user", "content": "Hello, how are you?"}, + {"role": "assistant", "content": "I'm doing great. 
How can I help you today?"}, + {"role": "user", "content": "No need to help, take it easy."}, +] + +model = LiteLLMModel("anthropic/claude-3-5-sonnet-latest", temperature=0.2) +print(model(messages, max_tokens=10)) +``` + +[[autodoc]] LiteLLMModel \ No newline at end of file diff --git a/docs/source/zh/reference/tools.md b/docs/source/zh/reference/tools.md new file mode 100644 index 000000000..022ad35d2 --- /dev/null +++ b/docs/source/zh/reference/tools.md @@ -0,0 +1,91 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# Tools + +<Tip warning={true}> + +Smolagents is an experimental API which is subject to change at any time. Results returned by the agents +can vary as the APIs or underlying models are prone to change. + +</Tip> + +To learn more about agents and tools make sure to read the [introductory guide](../index). This page +contains the API docs for the underlying classes. 
+ +## Tools + +### load_tool + +[[autodoc]] load_tool + +### tool + +[[autodoc]] tool + +### Tool + +[[autodoc]] Tool + +### launch_gradio_demo + +[[autodoc]] launch_gradio_demo + +## Default tools + +### PythonInterpreterTool + +[[autodoc]] PythonInterpreterTool + +### DuckDuckGoSearchTool + +[[autodoc]] DuckDuckGoSearchTool + +### VisitWebpageTool + +[[autodoc]] VisitWebpageTool + +## ToolCollection + +[[autodoc]] ToolCollection + +## Agent Types + +Agents can handle any type of object in-between tools; tools, being completely multimodal, can accept and return +text, image, audio, video, among other types. In order to increase compatibility between tools, as well as to +correctly render these returns in ipython (jupyter, colab, ipython notebooks, ...), we implement wrapper classes +around these types. + +The wrapped objects should continue behaving as initially; a text object should still behave as a string, an image +object should still behave as a `PIL.Image`. + +These types have three specific purposes: + +- Calling `to_raw` on the type should return the underlying object +- Calling `to_string` on the type should return the object as a string: that can be the string in case of an `AgentText` + but will be the path of the serialized version of the object in other instances +- Displaying it in an ipython kernel should display the object correctly + +### AgentText + +[[autodoc]] smolagents.types.AgentText + +### AgentImage + +[[autodoc]] smolagents.types.AgentImage + +### AgentAudio + +[[autodoc]] smolagents.types.AgentAudio diff --git a/docs/source/zh/tutorials/building_good_agents.md b/docs/source/zh/tutorials/building_good_agents.md new file mode 100644 index 000000000..47cd202a0 --- /dev/null +++ b/docs/source/zh/tutorials/building_good_agents.md @@ -0,0 +1,284 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. 
You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. + +--> +# 构建好用的 agent + +[[open-in-colab]] + +能良好工作的 agent 和不能工作的 agent 之间,有天壤之别。 +我们怎么样才能构建出属于前者的 agent 呢? +在本指南中,我们将看到构建 agent 的最佳实践。 + +> [!TIP] +> 如果你是 agent 构建的新手,请确保首先阅读 [agent 介绍](../conceptual_guides/intro_agents) 和 [smolagents 导览](../guided_tour)。 + +### 最好的 agent 系统是最简单的:尽可能简化工作流 + +在你的工作流中赋予 LLM 一些自主权,会引入一些错误风险。 + +经过良好编程的 agent 系统,通常具有良好的错误日志记录和重试机制,因此 LLM 引擎有机会自我纠错。但为了最大限度地降低 LLM 错误的风险,你应该简化你的工作流! + +让我们回顾一下 [agent 介绍](../conceptual_guides/intro_agents) 中的例子:一个为冲浪旅行公司回答用户咨询的机器人。 +与其让 agent 每次被问及新的冲浪地点时,都分别调用 "旅行距离 API" 和 "天气 API",你可以只创建一个统一的工具 "return_spot_information",一个同时调用这两个 API,并返回它们连接输出的函数。 + +这可以降低成本、延迟和错误风险! + +主要的指导原则是:尽可能减少 LLM 调用的次数。 + +这可以带来一些启发: +- 尽可能把两个工具合并为一个,就像我们两个 API 的例子。 +- 尽可能基于确定性函数,而不是 agent 决策,来实现逻辑。 + +### 改善流向 LLM 引擎的信息流 + +记住,你的 LLM 引擎就像一个 ~智能~ 机器人,被关在一个房间里,与外界唯一的交流方式是通过门缝传递的纸条。 + +如果你没有明确地将信息放入其提示中,它将不知道发生的任何事情。 + +所以首先要让你的任务非常清晰! +由于 agent 由 LLM 驱动,任务表述的微小变化可能会产生完全不同的结果。 + +然后,改善工具使用中流向 agent 的信息流。 + +需要遵循的具体指南: +- 每个工具都应该记录(只需在工具的 `forward` 方法中使用 `print` 语句)对 LLM 引擎可能有用的所有信息。 + - 特别是,记录工具执行错误的详细信息会很有帮助! 
+ +例如,这里有一个根据位置和日期时间检索天气数据的工具: + +首先,这是一个糟糕的版本: +```python +from datetime import datetime +from smolagents import tool + +def get_weather_report_at_coordinates(coordinates, date_time): + # 虚拟函数,返回 [温度(°C),降雨风险(0-1),浪高(m)] + return [28.0, 0.35, 0.85] + +def convert_location_to_coordinates(location): + # 返回虚拟坐标 + return [3.3, -42.0] + +@tool +def get_weather_api(location: str, date_time: str) -> str: + """ + Returns the weather report. + + Args: + location: the name of the place that you want the weather for. + date_time: the date and time for which you want the report. + """ + lon, lat = convert_location_to_coordinates(location) + date_time = datetime.strptime(date_time, '%m/%d/%y %H:%M:%S') + return str(get_weather_report_at_coordinates((lon, lat), date_time)) +``` + +为什么它不好? +- 没有说明 `date_time` 应该使用的格式 +- 没有说明位置应该如何指定 +- 没有日志机制来明确指出失败情况,如位置格式不正确或 date_time 格式不正确 +- 输出格式难以理解 + +如果工具调用失败,内存中记录的错误跟踪,可以帮助 LLM 逆向工程工具来修复错误。但为什么要让它做这么多繁重的工作呢? + +构建这个工具的更好方式如下: +```python +@tool +def get_weather_api(location: str, date_time: str) -> str: + """ + Returns the weather report. + + Args: + location: the name of the place that you want the weather for. Should be a place name, followed by possibly a city name, then a country, like "Anchor Point, Taghazout, Morocco". + date_time: the date and time for which you want the report, formatted as '%m/%d/%y %H:%M:%S'. + """ + lon, lat = convert_location_to_coordinates(location) + try: + date_time = datetime.strptime(date_time, '%m/%d/%y %H:%M:%S') + except Exception as e: + raise ValueError("Conversion of `date_time` to datetime format failed, make sure to provide a string in format '%m/%d/%y %H:%M:%S'. Full trace: " + str(e)) + temperature_celsius, risk_of_rain, wave_height = get_weather_report_at_coordinates((lon, lat), date_time) + return f"Weather report for {location}, {date_time}: Temperature will be {temperature_celsius}°C, risk of rain is {risk_of_rain*100:.0f}%, wave height is {wave_height}m." 
+``` + +一般来说,为了减轻 LLM 的负担,要问自己的好问题是:"如果我是一个第一次使用这个工具的傻瓜,使用这个工具编程并纠正自己的错误有多容易?"。 + +### 给 agent 更多参数 + +除了简单的任务描述字符串外,你还可以使用 `additional_args` 参数传递任何类型的对象: + +```py +from smolagents import CodeAgent, HfApiModel + +model_id = "meta-llama/Llama-3.3-70B-Instruct" + +agent = CodeAgent(tools=[], model=HfApiModel(model_id=model_id), add_base_tools=True) + +agent.run( + "Why does Mike not know many people in New York?", + additional_args={"mp3_sound_file_url":'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/recording.mp3'} +) +``` +例如,你可以使用这个 `additional_args` 参数传递你希望 agent 利用的图像或字符串。 + + +## 如何调试你的 agent + +### 1. 使用更强大的 LLM + +在 agent 工作流中,有些错误是实际错误,有些则是你的 LLM 引擎没有正确推理的结果。 +例如,参考这个我要求创建一个汽车图片的 `CodeAgent` 的运行记录: +```text +==================================================================================================== New task ==================================================================================================== +Make me a cool car picture +──────────────────────────────────────────────────────────────────────────────────────────────────── New step ───────────────────────────────────────────────────────────────────────────────────────────────────── +Agent is executing the code below: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +image_generator(prompt="A cool, futuristic sports car with LED headlights, aerodynamic design, and vibrant color, high-res, photorealistic") +────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── + +Last output from code snippet: 
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png +Step 1: + +- Time taken: 16.35 seconds +- Input tokens: 1,383 +- Output tokens: 77 +──────────────────────────────────────────────────────────────────────────────────────────────────── New step ───────────────────────────────────────────────────────────────────────────────────────────────────── +Agent is executing the code below: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +final_answer("/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png") +────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +Print outputs: + +Last output from code snippet: ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── +/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png +Final answer: +/var/folders/6m/9b1tts6d5w960j80wbw9tx3m0000gn/T/tmpx09qfsdd/652f0007-3ee9-44e2-94ac-90dae6bb89a4.png +``` +用户看到的是返回了一个路径,而不是图像。 +这看起来像是系统的错误,但实际上 agent 系统并没有导致错误:只是 LLM 大脑犯了一个错误,没有把图像输出,保存到变量中。 +因此,它无法再次访问图像,只能利用保存图像时记录的路径,所以它返回的是路径,而不是图像。 + +调试 agent 的第一步是"使用更强大的 LLM"。像 `Qwen2.5-72B-Instruct` 这样的替代方案不会犯这种错误。 + +### 2. 提供更多指导/更多信息 + +你也可以使用不太强大的模型,只要你更有效地指导它们。 + +站在模型的角度思考:如果你是模型在解决任务,你会因为系统提示+任务表述+工具描述中提供的信息而挣扎吗? + +你需要一些额外的说明吗? 
+ +为了提供额外信息,我们不建议立即更改系统提示:默认系统提示有许多调整,除非你非常了解提示,否则你很容易翻车。 +更好的指导 LLM 引擎的方法是: +- 如果是关于要解决的任务:把所有细节添加到任务中。任务可以有几百页长。 +- 如果是关于如何使用工具:你的工具的 description 属性。 + + +### 3. 更改系统提示(通常不建议) + +如果上述说明不够,你可以更改系统提示。 + +让我们看看它是如何工作的。例如,让我们检查 [`CodeAgent`] 的默认系统提示(下面的版本通过跳过零样本示例进行了缩短)。 + +```python +print(agent.system_prompt_template) +``` +你会得到: +```text +You are an expert assistant who can solve any task using code blobs. You will be given a task to solve as best you can. +To do so, you have been given access to a list of tools: these tools are basically Python functions which you can call with code. +To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences. + +At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task and the tools that you want to use. +Then in the 'Code:' sequence, you should write the code in simple Python. The code sequence must end with '<end_code>' sequence. +During each intermediate step, you can use 'print()' to save whatever important information you will then need. +These print outputs will then appear in the 'Observation:' field, which will be available as input for the next step. +In the end you have to return a final answer using the `final_answer` tool. + +Here are a few examples using notional tools: +--- +{examples} + +Above example were using notional tools that might not exist for you. On top of performing computations in the Python code snippets that you create, you only have access to these tools: + +{{tool_descriptions}} + +{{managed_agents_descriptions}} + +Here are the rules you should always follow to solve your task: +1. Always provide a 'Thought:' sequence, and a 'Code:\n```py' sequence ending with '```<end_code>' sequence, else you will fail. +2. Use only variables that you have defined! +3. Always use the right arguments for the tools. 
 DO NOT pass the arguments as a dict as in 'answer = wiki({'query': "What is the place where James Bond lives?"})', but use the arguments directly as in 'answer = wiki(query="What is the place where James Bond lives?")'. +4. Take care to not chain too many sequential tool calls in the same code block, especially when the output format is unpredictable. For instance, a call to search has an unpredictable return format, so do not have another tool call that depends on its output in the same block: rather output results with print() to use them in the next block. +5. Call a tool only when needed, and never re-do a tool call that you previously did with the exact same parameters. +6. Don't name any new variable with the same name as a tool: for instance don't name a variable 'final_answer'. +7. Never create any notional variables in our code, as having these in your logs might derail you from the true variables. +8. You can use imports in your code, but only from the following list of modules: {{authorized_imports}} +9. The state persists between code executions: so if in one step you've created variables or imported modules, these will all persist. +10. Don't give up! You're in charge of solving the task, not providing directions to solve it. + +Now Begin! If you solve the task correctly, you will receive a reward of $1,000,000. +``` + +如你所见,有一些占位符,如 `"{{tool_descriptions}}"`:这些将在 agent 初始化时用于插入某些自动生成的工具或管理 agent 的描述。 + +因此,虽然你可以通过将自定义提示作为参数传递给 `system_prompt` 参数来覆盖此系统提示模板,但你的新系统提示必须包含以下占位符: +- `"{{tool_descriptions}}"` 用于插入工具描述。 +- `"{{managed_agents_descriptions}}"` 用于插入 managed agent 的描述(如果有)。 +- 仅限 `CodeAgent`:`"{{authorized_imports}}"` 用于插入授权导入列表。 + +然后你可以按如下方式更改系统提示: + +```py +from smolagents import CodeAgent, HfApiModel +from smolagents.prompts import CODE_SYSTEM_PROMPT + +modified_system_prompt = CODE_SYSTEM_PROMPT + "\nHere you go!" # 在此更改系统提示 + +agent = CodeAgent( + tools=[], + model=HfApiModel(), + system_prompt=modified_system_prompt +) +``` + +这也适用于 [`ToolCallingAgent`]。 + + +### 4. 
额外规划 + +我们提供了一个用于补充规划步骤的模型,agent 可以在正常操作步骤之间定期运行。在此步骤中,没有工具调用,LLM 只是被要求更新它知道的事实列表,并根据这些事实反推它应该采取的下一步。 + +```py +from smolagents import load_tool, CodeAgent, HfApiModel, DuckDuckGoSearchTool +from dotenv import load_dotenv + +load_dotenv() + +# 从 Hub 导入工具 +image_generation_tool = load_tool("m-ric/text-to-image", trust_remote_code=True) + +search_tool = DuckDuckGoSearchTool() + +agent = CodeAgent( + tools=[search_tool], + model=HfApiModel("Qwen/Qwen2.5-72B-Instruct"), + planning_interval=3 # 这是你激活规划的地方! +) + +# 运行它! +result = agent.run( + "How long would a cheetah at full speed take to run the length of Pont Alexandre III?", +) +``` \ No newline at end of file diff --git a/docs/source/zh/tutorials/secure_code_execution.md b/docs/source/zh/tutorials/secure_code_execution.md new file mode 100644 index 000000000..6017aefb9 --- /dev/null +++ b/docs/source/zh/tutorials/secure_code_execution.md @@ -0,0 +1,82 @@ +<!--Copyright 2024 The HuggingFace Team. All rights reserved. + +Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + +http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on +an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the +specific language governing permissions and limitations under the License. + +⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be +rendered properly in your Markdown viewer. 
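+
+-->
+# Secure code execution
+
+[[open-in-colab]]
+
+> [!TIP]
+> If you're new to building agents, make sure to first read the [intro to agents](../conceptual_guides/intro_agents) and the [guided tour of smolagents](../guided_tour).
+
+### Code agents
+
+[Multiple](https://huggingface.co/papers/2402.01030) [research](https://huggingface.co/papers/2411.01747) [papers](https://huggingface.co/papers/2401.00812) have shown that having the LLM write its actions (the tool calls) in code works much better than the current standard tool-calling format, which across the industry is some variation of "writing actions as a JSON of tool names and arguments to use".
+
+Why is code better? Because we designed programming languages specifically to express actions performed by a computer. If JSON snippets were a better way, this package would have been written in JSON snippets and the devil would be laughing at us.
+
+Code is simply a better way to express actions on a computer. It has better:
+- **Composability**: could you nest JSON actions within each other, or define a set of JSON actions to reuse later, the same way you can define a Python function?
+- **Object management**: how do you store the output of an action like `generate_image` in JSON?
+- **Generality**: code is built to express simply anything you can have a computer do.
+- **Representation in LLM training corpora**: why not leverage the large amount of quality actions already included in LLM training corpora?
+
+The figure below, taken from [Executable Code Actions Elicit Better LLM Agents](https://huggingface.co/papers/2402.01030), illustrates this.
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/code_vs_json_actions.png">
+
+This is why we put the emphasis on code agents, in this case Python agents, which meant putting more effort into building a secure Python interpreter.
+
+### Local Python interpreter
+
+By default, the `CodeAgent` runs LLM-generated code in your environment.
+This execution is not done by the vanilla Python interpreter: we have rebuilt a more secure `LocalPythonInterpreter` from the ground up.
+This interpreter is designed for security by:
+ - Restricting imports to a list explicitly passed by the user
+ - Capping the number of operations to prevent infinite loops and resource bloat
+ - Refusing to perform any operation that is not pre-defined
+
+We have used this interpreter on many use cases without ever observing any damage to the environment.
+
+However, this solution is not watertight: one could imagine an LLM fine-tuned for malicious actions still hurting your environment. For instance, if you have allowed an innocuous package like `Pillow` to process images, the LLM could generate thousands of image saves to bloat your hard drive.
+This is certainly unlikely if you have chosen the LLM engine yourself, but it could happen.
+
+So if you want to be extra cautious, you can use the remote code execution option described below.
+
+### E2B code executor
+
+For maximum security, you can use our integration with E2B to run code in a sandboxed environment. This is a remote execution service that runs your code in an isolated container, so that the code cannot affect your local environment.
+
+For this, you need to set up your E2B account and set your `E2B_API_KEY` in your environment variables. Head to [E2B's quickstart documentation](https://e2b.dev/docs/quickstart) for more information.
+
+Then you can install it with `pip install e2b-code-interpreter python-dotenv`.
+
+Now you're set!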
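Since the setup above depends on `E2B_API_KEY` being present in the environment, it can help to fail early with a clear message when it is missing. The snippet below is only a minimal sketch: `require_env` is a hypothetical helper written for this illustration, not part of smolagents or of the E2B SDK, and it uses nothing beyond the standard library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not a smolagents or E2B function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it or put it in a .env file "
            "before using the E2B executor."
        )
    return value
```

If you keep the key in a `.env` file, calling `load_dotenv()` from the `python-dotenv` package before `require_env("E2B_API_KEY")` should make it visible through `os.environ`.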
+
+To set the code executor to E2B, simply pass the flag `use_e2b_executor=True` when initializing your `CodeAgent`.
+Note that you should add all of your tools' dependencies to `additional_authorized_imports`, so that the executor installs them.
+
+```py
+from smolagents import CodeAgent, VisitWebpageTool, HfApiModel
+agent = CodeAgent(
+    tools=[VisitWebpageTool()],
+    model=HfApiModel(),
+    additional_authorized_imports=["requests", "markdownify"],
+    use_e2b_executor=True
+)
+
+agent.run("What was Abraham Lincoln's preferred pet?")
+```
+
+E2B code execution is not compatible with multi-agents at the moment, because having an agent call inside a code blob that should be executed remotely is a mess. But we are working on it!
diff --git a/docs/source/zh/tutorials/tools.md b/docs/source/zh/tutorials/tools.md
new file mode 100644
index 000000000..216d93b96
--- /dev/null
+++ b/docs/source/zh/tutorials/tools.md
@@ -0,0 +1,221 @@
+<!--Copyright 2024 The HuggingFace Team. All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
+an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
+specific language governing permissions and limitations under the License.
+
+⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
+rendered properly in your Markdown viewer.
+
+-->
+# Tools
+
+[[open-in-colab]]
+
+Here, we're going to look at advanced tool usage.
+
+> [!TIP]
+> If you're new to building agents, make sure to first read the [intro to agents](../conceptual_guides/intro_agents) and the [guided tour of smolagents](../guided_tour).
+
+- [Tools](#tools)
+    - [What is a tool, and how to build one?](#what-is-a-tool-and-how-to-build-one)
+    - [Share your tool to the Hub](#share-your-tool-to-the-hub)
+    - [Import a Space as a tool](#import-a-space-as-a-tool)
+    - [Use LangChain tools](#use-langchain-tools)
+    - [Manage your agent's toolbox](#manage-your-agents-toolbox)
+    - [Use a collection of tools](#use-a-collection-of-tools)
+
+### What is a tool, and how to build one?
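+
+A tool is mostly a function that an LLM can use in an agentic system.
+
+But to use it, the LLM needs to be given an API: a name, a tool description, input types and descriptions, and an output type.
+
+So it cannot be only a function. It should be a class.
+
+At its core, then, a tool is a class that wraps a function with metadata that helps the LLM understand how to use it.
+
+Here's how it looks:
+
+```python
+from smolagents import Tool
+
+class HFModelDownloadsTool(Tool):
+    name = "model_download_counter"
+    description = """
+    This is a tool that returns the most downloaded model of a given task on the Hugging Face Hub.
+    It returns the name of the checkpoint."""
+    inputs = {
+        "task": {
+            "type": "string",
+            "description": "the task category (such as text-classification, depth-estimation, etc)",
+        }
+    }
+    output_type = "string"
+
+    def forward(self, task: str):
+        # Imported inside the method so that the tool stays self-contained
+        from huggingface_hub import list_models
+
+        model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
+        return model.id
+
+model_downloads_tool = HFModelDownloadsTool()
+```
+
+The custom tool subclasses [`Tool`] to inherit useful methods. The child class also defines:
+- An attribute `name`, which corresponds to the name of the tool itself. The name usually describes what the tool does. Since the code returns the model with the most downloads for a task, let's name it `model_download_counter`.
+- An attribute `description`, which is used to populate the agent's system prompt.
+- An `inputs` attribute, which is a dictionary with keys `"type"` and `"description"`. It contains information that helps the Python interpreter make educated choices about the input.
+- An `output_type` attribute, which specifies the output type. The types for both `inputs` and `output_type` should be [Pydantic formats](https://docs.pydantic.dev/latest/concepts/json_schema/#generating-json-schema), and they can be any of these: [`~AUTHORIZED_TYPES`].
+- A `forward` method, which contains the inference code to be executed.
+
+And that's all it needs to be used in an agent!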
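To make the role of this metadata more concrete, here is a rough sketch of how a name, description, inputs, and output type could be rendered into a line of prompt text. It is not how smolagents actually builds its system prompt: `ToolSpec` and `render_tool_description` are hypothetical names introduced for this illustration, and the output layout is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    """Hypothetical stand-in for the metadata a tool exposes to the LLM."""
    name: str
    description: str
    inputs: dict = field(default_factory=dict)
    output_type: str = "string"


def render_tool_description(spec: ToolSpec) -> str:
    """Format the metadata as one prompt-ready block of text (assumed layout)."""
    args = ", ".join(
        f"{arg}: {meta['type']} ({meta['description']})"
        for arg, meta in spec.inputs.items()
    )
    return f"- {spec.name}({args}) -> {spec.output_type}\n  {spec.description.strip()}"


spec = ToolSpec(
    name="model_download_counter",
    description="Returns the most downloaded model of a given task on the Hugging Face Hub.",
    inputs={"task": {"type": "string", "description": "the task category"}},
)
print(render_tool_description(spec))
```

The point of the sketch is only that the LLM never sees the body of `forward`: it sees text derived from the metadata, which is why a precise `description` and accurate `inputs` matter.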
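+
+There is another way to build a tool. In the [guided_tour](../guided_tour), we implemented a tool using the `@tool` decorator. The [`tool`] decorator is the recommended way to define simple tools, but sometimes you need more than this: several methods in a class for clearer code, or additional class attributes.
+
+In this case, you can build your tool by subclassing [`Tool`] as described above.
+
+### Share your tool to the Hub
+
+You can share your custom tool to the Hub by calling [`~Tool.push_to_hub`] on the tool. Make sure you have created a repository for it on the Hub and are using a token with write access.
+
+```python
+model_downloads_tool.push_to_hub("{your_username}/hf-model-downloads", token="<YOUR_HUGGINGFACEHUB_API_TOKEN>")
+```
+
+For the push to the Hub to work, your tool needs to respect some rules:
+- All methods are self-contained, e.g. they use variables that come from their arguments.
+- As per the point above, **all imports should be defined directly within the tool's functions**, otherwise you will get an error when trying to call [`~Tool.save`] or [`~Tool.push_to_hub`] with your custom tool.
+- If you override the `__init__` method, you cannot give it any argument other than `self`. This is because arguments set during the initialization of a specific tool instance are hard to track, which prevents sharing them properly to the Hub. In any case, the idea of making a specific class is that you can already set class attributes for anything you need to hard-code (just set `your_variable=(...)` directly under the `class YourTool(Tool):` line). And of course you can still create an attribute anywhere in your code by assigning to `self.your_variable`.
+
+Once your tool is pushed to the Hub, you can visualize it. [Here](https://huggingface.co/spaces/m-ric/hf-model-downloads) is the `model_downloads_tool` that I pushed. It has a nice Gradio interface.
+
+When diving into the tool files, you will find that all of the tool's logic is under [tool.py](https://huggingface.co/spaces/m-ric/hf-model-downloads/blob/main/tool.py). That is where you can inspect a tool shared by someone else.
+
+Then you can load the tool with [`load_tool`] or create it with [`~Tool.from_hub`], and pass it to the `tools` parameter of your agent.
+Since running tools means running custom code, you need to make sure you trust the repository, which is why `trust_remote_code=True` must be passed to load a tool from the Hub.
+
+```python
+from smolagents import load_tool, CodeAgent
+
+model_download_tool = load_tool(
+    "{your_username}/hf-model-downloads",
+    trust_remote_code=True
+)
+```
+
+### Import a Space as a tool
+
+You can directly import a Space from the Hub as a tool using the [`Tool.from_space`] method!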
+
+You only need to provide the id of the Space on the Hub, its name, and a description that will help your agent understand what the tool does. Under the hood, this uses the [`gradio-client`](https://pypi.org/project/gradio-client/) library to call the Space.
+
+For instance, let's import the `black-forest-labs/FLUX.1-schnell` Space from the Hub (from the same model family as [FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)) and use it to generate an image.
+
+```python
+from smolagents import Tool
+
+image_generation_tool = Tool.from_space(
+    "black-forest-labs/FLUX.1-schnell",
+    name="image_generator",
+    description="Generate an image from a prompt"
+)
+
+image_generation_tool("A sunny beach")
+```
+And voilà, here's your image! 🏖️
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/sunny_beach.webp">
+
+Then you can use this tool just like any other tool. For example, let's improve the prompt `A rabbit wearing a space suit` and generate an image of it.
+
+```python
+from smolagents import CodeAgent, HfApiModel
+
+model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
+agent = CodeAgent(tools=[image_generation_tool], model=model)
+
+agent.run(
+    "Improve this prompt, then generate an image of it.", prompt='A rabbit wearing a space suit'
+)
+```
+
+```text
+=== Agent thoughts:
+improved_prompt could be "A bright blue space suit wearing rabbit, on the surface of the moon, under a bright orange sunset, with the Earth visible in the background"
+
+Now that I have improved the prompt, I can use the image generator tool to generate an image based on this prompt.
+>>> Agent is executing the code below:
+image = image_generator(prompt="A bright blue space suit wearing rabbit, on the surface of the moon, under a bright orange sunset, with the Earth visible in the background")
+final_answer(image)
+```
+
+<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit_spacesuit_flux.webp">
+
+How cool is this? 🤩
+
+### Use LangChain tools
+
+We love LangChain and think it has a very compelling suite of tools.
+To import a tool from LangChain, use the `from_langchain()` method.
+
+Here is how you can use it to recreate the intro's search result using a LangChain web search tool.
+This tool needs `pip install langchain google-search-results -q` to work properly.
+```python
+from langchain.agents import load_tools
+from smolagents import CodeAgent, Tool
+
+search_tool = Tool.from_langchain(load_tools(["serpapi"])[0])
+
+# `model` is the HfApiModel instance created in the previous section
+agent = CodeAgent(tools=[search_tool], model=model)
+
+agent.run("How many more blocks (also denoted as layers) are in BERT base encoder compared to the encoder from the architecture proposed in Attention is All You Need?")
+```
+
+### Manage your agent's toolbox
+
+You can manage an agent's toolbox by adding or replacing a tool.
+
+Let's add the `model_download_tool` to an existing agent initialized with only the default toolbox.
+
+```python
+from smolagents import HfApiModel
+
+model = HfApiModel("Qwen/Qwen2.5-Coder-32B-Instruct")
+
+agent = CodeAgent(tools=[], model=model, add_base_tools=True)
+agent.tools.append(model_download_tool)
+```
+Now we can leverage the new tool:
+
+```python
+agent.run(
+    "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub but reverse the letters?"
+)
+```
+
+
+> [!TIP]
+> Be careful not to add too many tools to an agent: this can overwhelm weaker LLM engines.
+
+
+### Use a collection of tools
+
+You can leverage tool collections by using the `ToolCollection` object with the slug of the collection you want to use.
+Then pass them as a list when initializing your agent, and start using them!
+
+```py
+from smolagents import ToolCollection, CodeAgent
+
+image_tool_collection = ToolCollection(
+    collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f",
+    token="<YOUR_HUGGINGFACEHUB_API_TOKEN>"
+)
+# `model` is the HfApiModel instance created earlier in this tutorial
+agent = CodeAgent(tools=[*image_tool_collection.tools], model=model, add_base_tools=True)
+
+agent.run("Please draw me a picture of rivers and lakes.")
+```
+
+To speed up startup, tools are loaded only when the agent calls them.
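The deferred loading mentioned above can be pictured with a small wrapper that builds the underlying tool on its first call only. This is a generic sketch of the lazy-loading idea, not the mechanism smolagents uses internally: `LazyTool` and `load_uppercase_tool` are hypothetical names made up for this example.

```python
from typing import Any, Callable, Optional


class LazyTool:
    """Hypothetical wrapper that defers building a tool until its first call."""

    def __init__(self, name: str, loader: Callable[[], Callable[..., Any]]):
        self.name = name
        self._loader = loader
        self._tool: Optional[Callable[..., Any]] = None

    @property
    def is_loaded(self) -> bool:
        return self._tool is not None

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._tool is None:
            # The expensive setup happens here, once, on first use
            self._tool = self._loader()
        return self._tool(*args, **kwargs)


def load_uppercase_tool() -> Callable[[str], str]:
    # Stand-in for an expensive load, such as fetching a tool from the Hub
    return lambda text: text.upper()


lazy_tool = LazyTool("uppercase", load_uppercase_tool)
print(lazy_tool.is_loaded)  # False: nothing has been loaded yet
print(lazy_tool("rivers"))  # RIVERS
print(lazy_tool.is_loaded)  # True: the first call triggered the load
```

With this pattern, creating an agent with many tools stays cheap, and the loading cost is paid only for the tools the agent actually ends up calling.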