Changes from all commits

80 commits
6bcf72b
Update dependencies in pyproject.toml and uv.lock
dnandakumar-nv Jul 6, 2025
5a4ba40
Add API type validation and configuration for LLMs
dnandakumar-nv Jul 6, 2025
42e6937
Add API type validation for NVIDIA, OpenAI, and AWS Bedrock
dnandakumar-nv Jul 6, 2025
64b2847
Add OpenAI MCP schema data model
dnandakumar-nv Jul 6, 2025
e2a8012
Add Responses API Agent with support for tool use and error handling
dnandakumar-nv Jul 7, 2025
08bef3a
Refactor LLM config handling and async graph initialization
dnandakumar-nv Jul 7, 2025
1398b34
Refine response content extraction logic in register.py
dnandakumar-nv Jul 7, 2025
ce998e0
Add error handling and logging for tool schema extraction
dnandakumar-nv Jul 7, 2025
80986b3
Refactor tool usage and add default for MCP headers
dnandakumar-nv Jul 7, 2025
934124a
Refactor logging and code style in agent modules
dnandakumar-nv Jul 7, 2025
9c7d2ce
Add config file for response API in simple calculator example
dnandakumar-nv Jul 7, 2025
a0109c2
Add Apache 2.0 license header to `__init__.py`
dnandakumar-nv Jul 7, 2025
2d801fb
Refine llm_config to exclude "api_type" across components
dnandakumar-nv Jul 7, 2025
0151638
Refactor field handling and update model key formatting.
dnandakumar-nv Jul 7, 2025
ece25e3
Improve handling of `api_type` in metadata documentation generation
dnandakumar-nv Jul 7, 2025
b40edee
Add tool output schema and parsing for server-side tools
dnandakumar-nv Jul 7, 2025
48bc4c1
Add logging and handle unsupported streaming in Responses API
dnandakumar-nv Jul 7, 2025
af0a3ff
Add logging to Llama Index and warn about callback limitations
dnandakumar-nv Jul 7, 2025
1ceeacd
Add support for server-side tool usage tracking
dnandakumar-nv Jul 7, 2025
c204422
Refactor `ServerToolUseSchema` to centralize its definition.
dnandakumar-nv Jul 7, 2025
11f5b75
Add validation to disallow Responses API in LLM configs
dnandakumar-nv Jul 7, 2025
2348b8c
Add test cases for LLM client wrappers across frameworks
dnandakumar-nv Jul 8, 2025
a492ece
Add test cases for LLM client wrappers across frameworks
dnandakumar-nv Jul 8, 2025
62df9b1
Merge branch 'refs/heads/develop' into responses-api-support
dnandakumar-nv Jul 8, 2025
570279c
Remove deprecated config_responses_api.yml file
dnandakumar-nv Jul 8, 2025
e934eac
Update Llama-Index dependencies to latest versions
dnandakumar-nv Jul 8, 2025
508c738
Fix logger formatting issue in register.py
dnandakumar-nv Jul 8, 2025
ef887e4
Rename test files for improved clarity and consistency.
dnandakumar-nv Jul 8, 2025
5fc0f68
Fix test assertion and update test discovery paths
dnandakumar-nv Jul 8, 2025
a2faaa7
Refactor path handling to resolve repo-relative paths
dnandakumar-nv Jul 8, 2025
056954e
Refactor test to improve file handling and loop clarity.
dnandakumar-nv Jul 8, 2025
74c988e
Streamline tool schema handling and fix minor issues
dnandakumar-nv Jul 9, 2025
b42d5e0
Refactor "validate_no_responses_api" into shared utility module
dnandakumar-nv Jul 9, 2025
3f10b89
Remove unused APITypeEnum imports across modules
dnandakumar-nv Jul 9, 2025
3f4bb54
Merge branch 'develop' into responses-api-support
dnandakumar-nv Jul 9, 2025
940c3d8
Refactor ValueError messages for Responses API consistency
dnandakumar-nv Jul 9, 2025
bef190d
Merge remote-tracking branch 'origin/responses-api-support' into resp…
dnandakumar-nv Jul 9, 2025
b7d1c62
Simplify tool schema parsing and reduce logging level.
dnandakumar-nv Jul 9, 2025
0e735c2
Add support for tracking cached and reasoning token usage
dnandakumar-nv Jul 10, 2025
184f66f
Add token usage extraction and update MCP dependency version
dnandakumar-nv Jul 14, 2025
8e904a3
Merge branch 'refs/heads/develop' into responses-api-support
dnandakumar-nv Jul 14, 2025
0193123
Update metadata in lockfile for package distributions
dnandakumar-nv Jul 14, 2025
b74958c
Rename ITS tool functions to TTC tool functions
dnandakumar-nv Aug 8, 2025
b8e8b48
Update external/aiqtoolkit-opensource-ui dependencies
dnandakumar-nv Aug 8, 2025
a14acc5
Update AIQ Toolkit UI dependency
dnandakumar-nv Aug 8, 2025
f9919b3
Refactor file path resolution using `importlib.resources`.
dnandakumar-nv Aug 8, 2025
caefb0f
Refactor file path resolution using `importlib.resources`.
dnandakumar-nv Aug 8, 2025
e31c6b6
Remove unit tests for LLM client wrapper plugins
dnandakumar-nv Aug 8, 2025
66a90c0
Update dependencies and resolve compatibility issues
dnandakumar-nv Aug 8, 2025
a56a35a
Update dependencies and simplify platform-specific markers
dnandakumar-nv Aug 8, 2025
0222b38
Fix wildcard pattern in testpaths configuration
dnandakumar-nv Aug 8, 2025
e5f880c
Add unit tests for LLM client wrappers across frameworks
dnandakumar-nv Aug 8, 2025
48b53a0
Update uv.lock to reflect dependency and metadata changes
dnandakumar-nv Aug 8, 2025
c0af5ff
Update weave dependency to use version range
dnandakumar-nv Aug 8, 2025
6876d52
Remove `google-search-results` dependency
dnandakumar-nv Aug 8, 2025
d3a7368
Remove `google-search-results` dependency
dnandakumar-nv Aug 8, 2025
e3d4bda
Merge remote-tracking branch 'upstream/develop' into responses-api-su…
dnandakumar-nv Aug 11, 2025
a46457f
Update dependencies for various packages and libraries
dnandakumar-nv Aug 11, 2025
e6da879
Add pylint disables and update dependencies versions
dnandakumar-nv Aug 11, 2025
4ab26fd
Merge remote-tracking branch 'origin/develop' into responses-api-support
dnandakumar-nv Aug 13, 2025
d20b108
Reset `uv.lock` file revision and remove upload-time metadata
dnandakumar-nv Aug 13, 2025
fb97ed2
Add configuration file for responses API tool-calling agent
dnandakumar-nv Aug 13, 2025
68f7664
Support OpenAI Responses API and refine tool configs
dnandakumar-nv Aug 13, 2025
897ebb8
Merge branch 'develop' into responses-api-support
dnandakumar-nv Aug 13, 2025
06370c7
Disable specific pylint warnings to improve code clarity
dnandakumar-nv Aug 13, 2025
9149a2d
Update README to reflect new CLI commands and formatting
dnandakumar-nv Aug 13, 2025
605de7b
Update README to reflect new CLI commands and formatting
dnandakumar-nv Aug 13, 2025
49697e6
Add documentation for Responses API and agent configuration
dnandakumar-nv Aug 13, 2025
e786c12
Add unit tests for Responses API agent workflow
dnandakumar-nv Aug 13, 2025
66ff758
Add Responses API and Agent documentation and test adjustments
dnandakumar-nv Aug 13, 2025
839164b
Add Responses API and Agent documentation and test adjustments
dnandakumar-nv Aug 13, 2025
98336e7
Merge remote-tracking branch 'upstream/develop' into responses-api-su…
dnandakumar-nv Aug 13, 2025
c10cbe1
Merge branch 'develop' into responses-api-support
dnandakumar-nv Aug 14, 2025
3549c03
Merge branch 'develop' into responses-api-support
dnandakumar-nv Aug 14, 2025
23f93b8
Merge remote-tracking branch 'upstream/develop' into responses-api-su…
dnandakumar-nv Aug 20, 2025
2e07577
Update LLM error messages and dependency versions
dnandakumar-nv Aug 20, 2025
4fc8080
Merge branch 'develop' into responses-api-support
dnandakumar-nv Oct 16, 2025
d2563ee
Update dependencies and improve LLM configuration logic
dnandakumar-nv Oct 16, 2025
12f57ed
Fix typo in file name "reponses" to "responses"
dnandakumar-nv Oct 16, 2025
9910b29
Update tool_outputs to accept a list of ServerToolUseSchema
dnandakumar-nv Oct 16, 2025
1 change: 1 addition & 0 deletions docs/source/workflows/about/index.md
@@ -69,6 +69,7 @@ For details on workflow configuration, including sections not utilized in the ab
ReAct Agent <./react-agent.md>
Reasoning Agent <./reasoning-agent.md>
ReWOO Agent <./rewoo-agent.md>
Responses API and Agent <./responses-api-and-agent.md>
Router Agent <./router-agent.md>
Sequential Executor <./sequential-executor.md>
Tool Calling Agent <./tool-calling-agent.md>
131 changes: 131 additions & 0 deletions docs/source/workflows/about/responses-api-and-agent.md
@@ -0,0 +1,131 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Responses API and Agent

The NVIDIA NeMo Agent toolkit supports OpenAI's Responses API through two complementary pieces:

1. LLM client configuration via the `api_type` field.
2. A dedicated workflow agent, `_type: responses_api_agent`, designed for tool use with the Responses API.

Unlike standard chat-based integrations, the Responses API enables models to use built-in tools (for example, Code Interpreter) and connect to remote tools using the Model Context Protocol (MCP). This page explains how to configure an LLM for Responses and how to use the dedicated agent.
Comment on lines +20 to +24

⚠️ Potential issue | 🟠 Major

Fix initial toolkit naming.

Per the docs guideline, the first mention must read "NVIDIA NeMo Agent toolkit"; switch the sentence to "The NVIDIA NeMo Agent toolkit supports…". Later references can remain "NeMo Agent toolkit".

πŸ€– Prompt for AI Agents
In docs/source/workflows/about/responses-api-and-agent.md around lines 20 to 24,
the initial toolkit name is incorrect; change the first sentence so the very
first mention reads "The NVIDIA NeMo Agent toolkit supports OpenAI's Responses
API..." (subsequent mentions can remain "NeMo Agent toolkit"). Ensure only the
first occurrence is updated to include "NVIDIA" and keep the rest of the
paragraph unchanged.



## Features

- **LLM Client Switch**: Select the LLM client mode using `api_type`.
- **Built-in Tools**: Bind Responses built-ins such as Code Interpreter via `builtin_tools`.
- **MCP Tools**: Connect remote tools using `mcp_tools` with fields like `server_label` and `server_url`.
- **NAT Tools**: Continue to use toolkit tools through `nat_tools` (executed by the agent graph).
- **Agentic Workflow**: The `_type: responses_api_agent` integrates tool binding with the NeMo Agent dual-node graph.


## Requirements

- A model that supports the Responses API and any enabled built-in tools.
- For MCP usage, a reachable MCP server and any necessary credentials.


## LLM Configuration: `api_type`

LLM clients support an `api_type` selector. By default, `api_type` is `chat_completions`. To use the Responses API, set `api_type` to `responses` in your LLM configuration.

### Example

```yaml
llms:
  openai_llm:
    _type: openai
    model_name: gpt-5-mini-2025-08-07
    # Default is `chat_completions`; set to `responses` to enable the Responses API
    api_type: responses
```
Notes:
- If `api_type` is omitted, the client uses `chat_completions`.
- The Responses API unlocks built-in tools and MCP integration.
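
For completeness, an equivalent configuration that stays on the default Chat Completions path would either omit the field or set it explicitly, as in this minimal sketch:

```yaml
llms:
  openai_llm:
    _type: openai
    model_name: gpt-5-mini-2025-08-07
    api_type: chat_completions  # the default; omitting the field has the same effect
```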

## Agent Configuration: `_type: responses_api_agent`

The Responses API agent binds tools directly to the LLM for execution under the Responses API, while NAT tools run via the agent graph. This preserves the familiar flow of the NeMo Agent toolkit with added tool capabilities.

### Example `config.yml`

```yaml
functions:
  current_datetime:
    _type: current_datetime
llms:
  openai_llm:
    _type: openai
    model_name: gpt-5-mini-2025-08-07
    api_type: responses
workflow:
  _type: responses_api_agent
  llm_name: openai_llm
  verbose: true
  handle_tool_errors: true
  # NAT tools are executed by the agent graph
  nat_tools: [current_datetime]
  # Built-in tools are bound to the LLM (for example, Code Interpreter)
  builtin_tools:
    - type: code_interpreter
      container:
        type: "auto"
  # Optional: Remote tools via Model Context Protocol
  mcp_tools:
    - type: mcp
      server_label: deepwiki
      server_url: https://mcp.deepwiki.com/mcp
      allowed_tools: [read_wiki_structure, read_wiki_contents]
      require_approval: never
```

## Configurable Options

- `llm_name`: The LLM to use. Must refer to an entry under `llms`.
- `verbose`: Defaults to `false`. When `true`, the agent logs input, output, and intermediate steps.
- `handle_tool_errors`: Defaults to `true`. When enabled, tool errors are returned to the model (instead of raising) so it can recover.
- `nat_tools`: A list of toolkit tools (by function ref) that run in the agent graph.
- `builtin_tools`: A list of built-in tools to bind on the LLM. Availability depends on the selected model.
- `mcp_tools`: A list of MCP tool descriptors bound on the LLM, with fields `server_label`, `server_url`, `allowed_tools`, and `require_approval`.
- `max_iterations`: Defaults to `15`. Maximum number of tool invocations the agent may perform.
- `description`: Defaults to `Agent Workflow`. Used when the workflow is exported as a function.
- `parallel_tool_calls`: Defaults to `false`. If supported, allows the model runtime to schedule multiple tool calls in parallel.
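
Putting these options together, a fully spelled-out workflow block might look like the following sketch (values shown are the defaults unless noted; `current_datetime` is a placeholder tool):

```yaml
workflow:
  _type: responses_api_agent
  llm_name: openai_llm            # must refer to an entry under `llms`
  nat_tools: [current_datetime]   # toolkit tools, run by the agent graph
  verbose: false                  # set true to log input, output, and intermediate steps
  handle_tool_errors: true        # return tool errors to the model instead of raising
  max_iterations: 15              # cap on tool invocations
  description: Agent Workflow     # used when the workflow is exported as a function
  parallel_tool_calls: false      # enable only if the model supports it
```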

## Running the Agent

Run from the repository root with a sample prompt:

```bash
nat run --config_file=examples/agents/tool_calling/configs/config-responses-api.yml --input "How many 0s are in the current time?"
```

## MCP Field Reference

When adding entries to `mcp_tools`, each object supports the following fields:

- `type`: Must be `mcp`.
- `server_label`: Short label for the server.
- `server_url`: URL of the MCP endpoint.
- `allowed_tools`: Optional allowlist of tool names the model may call.
- `require_approval`: One of `never`, `always`, or `auto`.
- `headers`: Optional map of HTTP headers to include when calling the server.
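
For example, a single `mcp_tools` entry exercising every field might look like this sketch (the server and token values are placeholders):

```yaml
mcp_tools:
  - type: mcp
    server_label: deepwiki
    server_url: https://mcp.deepwiki.com/mcp
    allowed_tools: [read_wiki_structure, read_wiki_contents]
    require_approval: never
    headers:
      Authorization: Bearer <TOKEN_IF_REQUIRED>
```
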
103 changes: 103 additions & 0 deletions examples/agents/tool_calling/README.md
@@ -35,6 +35,7 @@ A configurable Tool Calling agent. This agent leverages the NeMo Agent toolkit p
- [Starting the NeMo Agent Toolkit Server](#starting-the-nemo-agent-toolkit-server)
- [Making Requests to the NeMo Agent Toolkit Server](#making-requests-to-the-nemo-agent-toolkit-server)
- [Evaluating the Tool Calling Agent Workflow](#evaluating-the-tool-calling-agent-workflow)
- [Using Tool Calling with the OpenAI Responses API](#using-tool-calling-with-the-openai-responses-api)

## Key Features

@@ -177,3 +178,105 @@ curl --request POST \
```bash
nat eval --config_file=examples/agents/tool_calling/configs/config.yml
```

### Using Tool Calling with the OpenAI Responses API
The NeMo Agent toolkit also provides an agent implementation that uses OpenAI's Responses API to enable built-in tools (such as Code Interpreter) and remote tools via Model Context Protocol (MCP).

#### What is the Responses API?
OpenAI's Responses API is a unified endpoint that supports built-in tools and external tool integrations. Compared to Chat Completions, the Responses API focuses on agentic behaviors such as multi-step tool use, background tasks, and streaming of intermediate items. With Responses, models can:
- Use built-in tools such as Code Interpreter; some models also support file search and image generation.
- Connect to remote tools exposed over the Model Context Protocol (MCP).

For current capabilities and model support, see OpenAI's documentation for the Responses API.

#### Prerequisites
- Set your OpenAI API key in the environment:
```bash
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
```
- Use an OpenAI model that supports the Responses API and the tools you enable (for example, `o3`, `o4-mini`, or a compatible `gpt-*` model that advertises Responses support).

#### Run the Responses API agent
An example configuration is provided at `examples/agents/tool_calling/configs/config-responses-api.yml`. Run it from the NeMo Agent toolkit repo root:

```bash
nat run --config_file=examples/agents/tool_calling/configs/config-responses-api.yml --input "How many 0s are in the current time?"
```

#### Configure the agent for Responses
Key fields in `config-responses-api.yml`:

```yaml
llms:
  openai_llm:
    _type: openai
    model_name: gpt-5-mini
    # Setting `api_type` to `responses` selects the Responses API
    api_type: responses

workflow:
  _type: responses_api_agent
  llm_name: openai_llm
  verbose: true
  handle_tool_errors: true
  # Tools exposed to the agent:
  nat_tools: [current_datetime]  # NAT tools executed by the agent graph
  builtin_tools:                 # Built-in OpenAI tools bound directly to the LLM
    - type: code_interpreter
      container:
        type: "auto"
  mcp_tools: []                  # Optional: remote tools over MCP (see below)
```

- **`nat_tools`**: Tools implemented in NeMo Agent toolkit (for example, `current_datetime`). These run via the tool node in the agent graph.
- **`builtin_tools`**: Tools provided by OpenAI's Responses API and executed by the model runtime. The agent binds them to the LLM; the graph does not run them directly.
- **`mcp_tools`**: Remote tools exposed via MCP. The agent passes the schema to the LLM; the model orchestrates calls to the remote server.

#### Built-in tools for OpenAI models
Built-in tool availability depends on model and account features. Common built-ins include:
- **Code Interpreter**: Execute Python for data analysis, math, and code execution. In this repo, configure it as:
```yaml
builtin_tools:
  - type: code_interpreter
    container:
      type: "auto"
```
- **File search** and **image generation** may be supported by some models in Responses. Refer to OpenAI docs for the latest tool names and required parameters if you choose to add them to `builtin_tools`.

Notes:
- This agent enforces that the selected LLM uses the Responses API.
- When `builtin_tools` or `mcp_tools` are provided, they are bound on the LLM with `strict=True` and optional `parallel_tool_calls` support.
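
For instance, to opt into parallel tool execution where the model supports it, the workflow block could be extended as in this sketch (`parallel_tool_calls` defaults to `false`):

```yaml
workflow:
  _type: responses_api_agent
  llm_name: openai_llm
  parallel_tool_calls: true   # passed through when tools are bound to the LLM
  builtin_tools:
    - type: code_interpreter
      container:
        type: "auto"
```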

#### Configure MCP tools
You can allow the model to call tools from a remote MCP server by adding entries under `mcp_tools`. The schema is defined in `src/nat/data_models/openai_mcp.py`.

Example:

```yaml
workflow:
  _type: responses_api_agent
  llm_name: openai_llm
  # ...
  mcp_tools:
    - type: mcp
      server_label: deepwiki
      server_url: https://mcp.deepwiki.com/mcp
      allowed_tools: [read_wiki_structure, read_wiki_contents]
      require_approval: never  # one of: never, always, auto
      headers:
        Authorization: Bearer <TOKEN_IF_REQUIRED>
```

Field reference (MCP):
- **`type`**: Must be `mcp`.
- **`server_label`**: A short label for the server. Used in model outputs and logs.
- **`server_url`**: The MCP server endpoint URL.
- **`allowed_tools`**: Optional allowlist of tool names the model may call. Omit or set empty to allow all server tools.
- **`require_approval`**: `never`, `always`, or `auto` (defaults to `never`). Controls whether tool invocations require approval.
- **`headers`**: Optional HTTP headers to include on MCP requests.

#### Tips and troubleshooting
- Ensure your model supports the specific built-in tools you enable.
- Some built-ins (for example, file search) may require separate setup in your OpenAI account (vector stores, file uploads). Consult OpenAI documentation for current requirements.
- If a tool call fails and `handle_tool_errors` is `true`, the agent surfaces an informative message to the model instead of raising an exception.

39 changes: 39 additions & 0 deletions examples/agents/tool_calling/configs/config-responses-api.yml
@@ -0,0 +1,39 @@
# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


general:
  use_uvloop: true

llms:
  openai_llm:
    _type: openai
    model_name: gpt-5-mini-2025-08-07
    api_type: responses

functions:
  current_datetime:
    _type: current_datetime

workflow:
  _type: responses_api_agent
  nat_tools: [current_datetime]
  builtin_tools:
    - type: code_interpreter
      container:
        type: "auto"
  llm_name: openai_llm
  verbose: true
  handle_tool_errors: true
2 changes: 1 addition & 1 deletion examples/frameworks/multi_frameworks/pyproject.toml
@@ -13,7 +13,7 @@ dependencies = [
"nvidia-nat[langchain,llama-index,openai,nvidia_haystack]~=1.4",
"arxiv~=2.1.3",
"bs4==0.0.2",
"markdown-it-py~=3.0",
"markdown-it-py~=3.0"
]
requires-python = ">=3.11,<3.14"
description = "Custom NeMo Agent toolkit Workflow"
12 changes: 11 additions & 1 deletion packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py
@@ -18,6 +18,7 @@
from nat.builder.builder import Builder
from nat.builder.framework_enum import LLMFrameworkEnum
from nat.cli.register_workflow import register_llm_client
from nat.data_models.llm import APITypeEnum
from nat.data_models.llm import LLMBaseConfig
from nat.data_models.retry_mixin import RetryMixin
from nat.data_models.thinking_mixin import ThinkingMixin
@@ -28,6 +29,7 @@
from nat.llm.utils.thinking import FunctionArgumentWrapper
from nat.llm.utils.thinking import patch_with_thinking
from nat.utils.exception_handlers.automatic_retries import patch_with_retry
from nat.utils.responses_api import validate_no_responses_api
from nat.utils.type_utils import override

ModelType = TypeVar("ModelType")
@@ -80,6 +82,8 @@ async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):

    from agno.models.nvidia import Nvidia

    validate_no_responses_api(llm_config)

    config_obj = {
        **llm_config.model_dump(
            exclude={"type", "model_name", "thinking"},
@@ -97,6 +101,7 @@ async def nim_agno(llm_config: NIMModelConfig, _builder: Builder):
async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):

    from agno.models.openai import OpenAIChat
    from agno.models.openai import OpenAIResponses

    config_obj = {
        **llm_config.model_dump(
@@ -106,7 +111,10 @@ async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder):
        ),
    }

    client = OpenAIChat(**config_obj, id=llm_config.model_name)
    if llm_config.api_type == APITypeEnum.RESPONSES:
        client = OpenAIResponses(**config_obj, id=llm_config.model_name)
    else:
        client = OpenAIChat(**config_obj, id=llm_config.model_name)

    yield _patch_llm_based_on_config(client, llm_config)

@@ -116,6 +124,8 @@ async def litellm_agno(llm_config: LiteLlmModelConfig, _builder: Builder):

    from agno.models.litellm.chat import LiteLLM

    validate_no_responses_api(llm_config)

    client = LiteLLM(
        **llm_config.model_dump(
            exclude={"type", "thinking", "model_name"},