diff --git a/docs/source/workflows/about/index.md b/docs/source/workflows/about/index.md index eee27cb95..1035de4f2 100644 --- a/docs/source/workflows/about/index.md +++ b/docs/source/workflows/about/index.md @@ -69,6 +69,7 @@ For details on workflow configuration, including sections not utilized in the ab ReAct Agent <./react-agent.md> Reasoning Agent <./reasoning-agent.md> ReWOO Agent <./rewoo-agent.md> +Responses API and Agent <./responses-api-and-agent.md> Router Agent <./router-agent.md> Sequential Executor <./sequential-executor.md> Tool Calling Agent <./tool-calling-agent.md> diff --git a/docs/source/workflows/about/responses-api-and-agent.md b/docs/source/workflows/about/responses-api-and-agent.md new file mode 100644 index 000000000..7ec646209 --- /dev/null +++ b/docs/source/workflows/about/responses-api-and-agent.md @@ -0,0 +1,131 @@ + + +# Responses API and Agent + +The NeMo Agent toolkit supports OpenAI's Responses API through two complementary pieces: + +1) LLM client configuration via the `api_type` field, and 2) a dedicated workflow agent `_type: responses_api_agent` designed for tool use with the Responses API. + +Unlike standard chat-based integrations, the Responses API enables models to use built-in tools (for example, Code Interpreter) and connect to remote tools using the Model Context Protocol (MCP). This page explains how to configure an LLM for Responses and how to use the dedicated agent. + + +## Features + +- **LLM Client Switch**: Select the LLM client mode using `api_type`. +- **Built-in Tools**: Bind Responses built-ins such as Code Interpreter via `builtin_tools`. +- **MCP Tools**: Connect remote tools using `mcp_tools` with fields like `server_label` and `server_url`. +- **NAT Tools**: Continue to use toolkit tools through `nat_tools` (executed by the agent graph). +- **Agentic Workflow**: The `_type: responses_api_agent` integrates tool binding with the NeMo Agent dual-node graph. + + +## Requirements + +- A model that supports the Responses API and any enabled built-in tools. +- For MCP usage, a reachable MCP server and any necessary credentials. + + +## LLM Configuration: `api_type` + +LLM clients support an `api_type` selector. By default, `api_type` is `chat_completions`. To use the Responses API, set `api_type` to `responses` in your LLM configuration. + +### Example + +```yaml +llms: + openai_llm: + _type: openai + model_name: gpt-5-mini-2025-08-07 + # Default is `chat_completions`; set to `responses` to enable the Responses API + api_type: responses +``` + +Notes: +- If `api_type` is omitted, the client uses `chat_completions`. +- The Responses API unlocks built-in tools and MCP integration. + +## Agent Configuration: `_type: responses_api_agent` + +The Responses API agent binds tools directly to the LLM for execution under the Responses API, while NAT tools run via the agent graph. This preserves the familiar flow of the NeMo Agent toolkit with added tool capabilities. 
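+
+Under the hood, when `api_type: responses` is selected, the toolkit's LangChain integration builds a `ChatOpenAI` client with the Responses API enabled. The snippet below is a simplified sketch of that behavior for illustration only; the actual wrapper also applies retry and thinking patches and passes through the remaining fields from your LLM configuration.
+
+```python
+from langchain_openai import ChatOpenAI
+
+# Simplified sketch: the approximate client constructed when `api_type: responses`
+# is set (model name taken from the example above).
+llm = ChatOpenAI(
+    model="gpt-5-mini-2025-08-07",
+    stream_usage=True,              # usage metadata is streamed by default
+    use_responses_api=True,         # route calls through the Responses API
+    use_previous_response_id=True,  # chain turns using the previous response id
+)
+```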
+ +### Example `config.yml` + +```yaml +functions: + current_datetime: + _type: current_datetime + +llms: + openai_llm: + _type: openai + model_name: gpt-5-mini-2025-08-07 + api_type: responses + +workflow: + _type: responses_api_agent + llm_name: openai_llm + verbose: true + handle_tool_errors: true + + # NAT tools are executed by the agent graph + nat_tools: [current_datetime] + + # Built-in tools are bound to the LLM (for example, Code Interpreter) + builtin_tools: + - type: code_interpreter + container: + type: "auto" + + # Optional: Remote tools via Model Context Protocol + mcp_tools: + - type: mcp + server_label: deepwiki + server_url: https://mcp.deepwiki.com/mcp + allowed_tools: [read_wiki_structure, read_wiki_contents] + require_approval: never +``` + +## Configurable Options + +- `llm_name`: The LLM to use. Must refer to an entry under `llms`. +- `verbose`: Defaults to `false`. When `true`, the agent logs input, output, and intermediate steps. +- `handle_tool_errors`: Defaults to `true`. When enabled, tool errors are returned to the model (instead of raising) so it can recover. +- `nat_tools`: A list of toolkit tools (by function ref) that run in the agent graph. +- `builtin_tools`: A list of built-in tools to bind on the LLM. Availability depends on the selected model. +- `mcp_tools`: A list of MCP tool descriptors bound on the LLM, with fields `server_label`, `server_url`, `allowed_tools`, and `require_approval`. +- `max_iterations`: Defaults to `15`. Maximum number of tool invocations the agent may perform. +- `description`: Defaults to `Agent Workflow`. Used when the workflow is exported as a function. +- `parallel_tool_calls`: Defaults to `false`. If supported, allows the model runtime to schedule multiple tool calls in parallel. + +## Running the Agent + +Run from the repository root with a sample prompt: + +```bash +nat run --config_file=examples/agents/tool_calling/configs/config-responses-api.yml --input "How many 0s are in the current time?" +``` + +## MCP Field Reference + +When adding entries to `mcp_tools`, each object supports the following fields: + +- `type`: Must be `mcp`. +- `server_label`: Short label for the server. +- `server_url`: URL of the MCP endpoint. +- `allowed_tools`: Optional allowlist of tool names the model may call. +- `require_approval`: One of `never`, `always`, or `auto`. +- `headers`: Optional map of HTTP headers to include when calling the server. diff --git a/examples/agents/tool_calling/README.md b/examples/agents/tool_calling/README.md index 20fd00ea8..1e4ed9255 100644 --- a/examples/agents/tool_calling/README.md +++ b/examples/agents/tool_calling/README.md @@ -35,6 +35,7 @@ A configurable Tool Calling agent. This agent leverages the NeMo Agent toolkit p - [Starting the NeMo Agent Toolkit Server](#starting-the-nemo-agent-toolkit-server) - [Making Requests to the NeMo Agent Toolkit Server](#making-requests-to-the-nemo-agent-toolkit-server) - [Evaluating the Tool Calling Agent Workflow](#evaluating-the-tool-calling-agent-workflow) +- [Using Tool Calling with the OpenAI Responses API](#using-tool-calling-with-the-openai-responses-api) ## Key Features @@ -86,6 +87,13 @@ If you have not already done so, follow the [Obtaining API Keys](../../../docs/s ```bash export NVIDIA_API_KEY= ``` + +If you will be using the Responses API, also export your model's API key as the `OPENAI_API_KEY` as shown below. 
+ +```bash +export OPENAI_API_KEY= +``` + --- ## Run the Workflow @@ -177,3 +185,98 @@ curl --request POST \ ```bash nat eval --config_file=examples/agents/tool_calling/configs/config.yml ``` + +### Using Tool Calling with the OpenAI Responses API +The NeMo Agent toolkit also provides an agent implementation that uses OpenAI's Responses API to enable built-in tools (such as Code Interpreter) and remote tools via Model Context Protocol (MCP). + +#### What is the Responses API? +OpenAI's Responses API is a unified endpoint for reasoning models that supports built-in tools and external tool integrations. Compared to Chat Completions, Responses focuses on agentic behaviors like multi-step tool use, background tasks, and streaming of intermediate items. With Responses, models can: +- Use built-in tools such as Code Interpreter; some models also support file search and image generation. +- Connect to remote tools exposed over the Model Context Protocol (MCP). + +For current capabilities and model support, see OpenAI's documentation for the Responses API. + +#### Run the Responses API agent +An example configuration is provided at `examples/agents/tool_calling/configs/config-responses-api.yml`. Run it from the NeMo Agent toolkit repo root: + +```bash +nat run --config_file=examples/agents/tool_calling/configs/config-responses-api.yml --input "How many 0s are in the current time?" +``` + +#### Configure the agent for Responses +Key fields in `config-responses-api.yml`: + +```yaml +llms: + openai_llm: + _type: openai + model_name: gpt-5-mini + # Setting the `api_type` to responses uses the Responses API + api_type: responses + +workflow: + _type: responses_api_agent + llm_name: openai_llm + verbose: true + handle_tool_errors: true + # Tools exposed to the agent: + nat_tools: [current_datetime] # NAT tools executed by the agent graph + builtin_tools: # Built-in OpenAI tools bound directly to the LLM + - type: code_interpreter + container: + type: "auto" + mcp_tools: [] # Optional: remote tools over MCP (see below) +``` + +- **`nat_tools`**: Tools implemented in NeMo Agent toolkit (for example, `current_datetime`). These run via the tool node in the agent graph. +- **`builtin_tools`**: Tools provided by OpenAI's Responses API and executed by the model runtime. The agent binds them to the LLM; the graph does not run them directly. +- **`mcp_tools`**: Remote tools exposed via MCP. The agent passes the schema to the LLM; the model orchestrates calls to the remote server. + +#### Built-in tools for OpenAI models +Built-in tool availability depends on model and account features. Common built-ins include: +- **Code Interpreter**: Execute Python for data analysis, math, and code execution. In this repo, configure it as: + ```yaml + builtin_tools: + - type: code_interpreter + container: + type: "auto" + ``` +- **File search** and **image generation** may be supported by some models in Responses. Refer to OpenAI docs for the latest tool names and required parameters if you choose to add them to `builtin_tools`. + +Notes: +- This agent enforces that the selected LLM uses the Responses API. +- When `builtin_tools` or `mcp_tools` are provided, they are bound on the LLM with `strict=True` and optional `parallel_tool_calls` support. + +#### Configure MCP tools +You can allow the model to call tools from a remote MCP server by adding entries under `mcp_tools`. The schema is defined in `src/nat/data_models/openai_mcp.py`. + +Example: + +```yaml +workflow: + _type: responses_api_agent + llm_name: openai_llm + # ... 
+ mcp_tools: + - type: mcp + server_label: deepwiki + server_url: https://mcp.deepwiki.com/mcp + allowed_tools: [read_wiki_structure, read_wiki_contents] + require_approval: never # one of: never, always, auto + headers: + Authorization: Bearer +``` + +Field reference (MCP): +- **type**: Must be `mcp`. +- **`server_label`**: A short label for the server. Used in model outputs and logs. +- **`server_url`**: The MCP server endpoint URL. +- **`allowed_tools`**: Optional allowlist of tool names the model may call. Omit or set empty to allow all server tools. +- **`require_approval`**: `never`, `always`, or `auto` (defaults to `never`). Controls whether tool invocations require approval. +- **headers**: Optional HTTP headers to include on MCP requests. + +#### Tips and troubleshooting +- Ensure your model supports the specific built-in tools you enable. +- Some built-ins (for example, file search) may require separate setup in your OpenAI account (vector stores, file uploads). Consult OpenAI documentation for current requirements. +- If tool calls error and `handle_tool_errors` is `true`, the agent will surface an informative message instead of raising. + diff --git a/examples/agents/tool_calling/configs/config-responses-api.yml b/examples/agents/tool_calling/configs/config-responses-api.yml new file mode 100644 index 000000000..b87fd2722 --- /dev/null +++ b/examples/agents/tool_calling/configs/config-responses-api.yml @@ -0,0 +1,39 @@ +# SPDX-FileCopyrightText: Copyright (c) 2024-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
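+
+# Example configuration for the Responses API agent: the `openai_llm` entry selects the
+# Responses API via `api_type: responses`, the built-in Code Interpreter tool is bound to
+# the LLM, and the `current_datetime` NAT tool runs through the agent graph.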
+ + +general: + use_uvloop: true + +llms: + openai_llm: + _type: openai + model_name: gpt-5-mini-2025-08-07 + api_type: responses + +functions: + current_datetime: + _type: current_datetime + +workflow: + _type: responses_api_agent + nat_tools: [current_datetime] + builtin_tools: + - type: code_interpreter + container: + type: "auto" + llm_name: openai_llm + verbose: true + handle_tool_errors: true diff --git a/examples/frameworks/multi_frameworks/pyproject.toml b/examples/frameworks/multi_frameworks/pyproject.toml index 9ba717fa4..2c83d104a 100644 --- a/examples/frameworks/multi_frameworks/pyproject.toml +++ b/examples/frameworks/multi_frameworks/pyproject.toml @@ -13,7 +13,7 @@ dependencies = [ "nvidia-nat[langchain,llama-index,openai,nvidia_haystack]~=1.4", "arxiv~=2.1.3", "bs4==0.0.2", - "markdown-it-py~=3.0", + "markdown-it-py~=3.0" ] requires-python = ">=3.11,<3.14" description = "Custom NeMo Agent toolkit Workflow" diff --git a/packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py b/packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py index fa2c44857..b8ab16453 100644 --- a/packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py +++ b/packages/nvidia_nat_adk/src/nat/plugins/adk/llm.py @@ -23,6 +23,7 @@ from nat.llm.litellm_llm import LiteLlmModelConfig from nat.llm.nim_llm import NIMModelConfig from nat.llm.openai_llm import OpenAIModelConfig +from nat.utils.responses_api import validate_no_responses_api logger = logging.getLogger(__name__) @@ -37,8 +38,12 @@ async def azure_openai_adk(config: AzureOpenAIModelConfig, _builder: Builder): """ from google.adk.models.lite_llm import LiteLlm + validate_no_responses_api(config, LLMFrameworkEnum.ADK) + config_dict = config.model_dump( - exclude={"type", "max_retries", "thinking", "azure_endpoint", "azure_deployment", "model_name", "model"}, + exclude={ + "type", "max_retries", "thinking", "azure_endpoint", "azure_deployment", "model_name", "model", "api_type" + }, by_alias=True, exclude_none=True, ) @@ -51,8 +56,11 @@ async def azure_openai_adk(config: AzureOpenAIModelConfig, _builder: Builder): @register_llm_client(config_type=LiteLlmModelConfig, wrapper_type=LLMFrameworkEnum.ADK) async def litellm_adk(litellm_config: LiteLlmModelConfig, _builder: Builder): from google.adk.models.lite_llm import LiteLlm + + validate_no_responses_api(litellm_config, LLMFrameworkEnum.ADK) + yield LiteLlm(**litellm_config.model_dump( - exclude={"type", "max_retries", "thinking"}, + exclude={"type", "max_retries", "thinking", "api_type"}, by_alias=True, exclude_none=True, )) @@ -69,6 +77,8 @@ async def nim_adk(config: NIMModelConfig, _builder: Builder): import litellm from google.adk.models.lite_llm import LiteLlm + validate_no_responses_api(config, LLMFrameworkEnum.ADK) + logger.warning("NIMs do not currently support tools with ADK. 
Tools will be ignored.") litellm.add_function_to_prompt = True litellm.drop_params = True @@ -77,7 +87,7 @@ async def nim_adk(config: NIMModelConfig, _builder: Builder): os.environ["NVIDIA_NIM_API_KEY"] = api_key config_dict = config.model_dump( - exclude={"type", "max_retries", "thinking", "model_name", "model", "base_url"}, + exclude={"type", "max_retries", "thinking", "model_name", "model", "base_url", "api_type"}, by_alias=True, exclude_none=True, ) @@ -97,8 +107,10 @@ async def openai_adk(config: OpenAIModelConfig, _builder: Builder): """ from google.adk.models.lite_llm import LiteLlm + validate_no_responses_api(config, LLMFrameworkEnum.ADK) + config_dict = config.model_dump( - exclude={"type", "max_retries", "thinking", "model_name", "model", "base_url"}, + exclude={"type", "max_retries", "thinking", "model_name", "model", "base_url", "api_type"}, by_alias=True, exclude_none=True, ) diff --git a/packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py b/packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py index 1f513aed0..9987168c7 100644 --- a/packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py +++ b/packages/nvidia_nat_agno/src/nat/plugins/agno/llm.py @@ -18,6 +18,7 @@ from nat.builder.builder import Builder from nat.builder.framework_enum import LLMFrameworkEnum from nat.cli.register_workflow import register_llm_client +from nat.data_models.llm import APITypeEnum from nat.data_models.llm import LLMBaseConfig from nat.data_models.retry_mixin import RetryMixin from nat.data_models.thinking_mixin import ThinkingMixin @@ -28,6 +29,7 @@ from nat.llm.utils.thinking import FunctionArgumentWrapper from nat.llm.utils.thinking import patch_with_thinking from nat.utils.exception_handlers.automatic_retries import patch_with_retry +from nat.utils.responses_api import validate_no_responses_api from nat.utils.type_utils import override ModelType = TypeVar("ModelType") @@ -80,9 +82,11 @@ async def nim_agno(llm_config: NIMModelConfig, _builder: Builder): from agno.models.nvidia import Nvidia + validate_no_responses_api(llm_config, LLMFrameworkEnum.AGNO) + config_obj = { **llm_config.model_dump( - exclude={"type", "model_name", "thinking"}, + exclude={"type", "model_name", "thinking", "api_type"}, by_alias=True, exclude_none=True, ), @@ -97,16 +101,20 @@ async def nim_agno(llm_config: NIMModelConfig, _builder: Builder): async def openai_agno(llm_config: OpenAIModelConfig, _builder: Builder): from agno.models.openai import OpenAIChat + from agno.models.openai import OpenAIResponses config_obj = { **llm_config.model_dump( - exclude={"type", "model_name", "thinking"}, + exclude={"type", "model_name", "thinking", "api_type"}, by_alias=True, exclude_none=True, ), } - client = OpenAIChat(**config_obj, id=llm_config.model_name) + if llm_config.api_type == APITypeEnum.RESPONSES: + client = OpenAIResponses(**config_obj, id=llm_config.model_name) + else: + client = OpenAIChat(**config_obj, id=llm_config.model_name) yield _patch_llm_based_on_config(client, llm_config) @@ -116,9 +124,11 @@ async def litellm_agno(llm_config: LiteLlmModelConfig, _builder: Builder): from agno.models.litellm.chat import LiteLLM + validate_no_responses_api(llm_config, LLMFrameworkEnum.AGNO) + client = LiteLLM( **llm_config.model_dump( - exclude={"type", "thinking", "model_name"}, + exclude={"type", "thinking", "model_name", "api_type"}, by_alias=True, exclude_none=True, ), diff --git a/packages/nvidia_nat_agno/tests/test_llm.py b/packages/nvidia_nat_agno/tests/test_llm_agno.py similarity index 83% rename from 
packages/nvidia_nat_agno/tests/test_llm.py rename to packages/nvidia_nat_agno/tests/test_llm_agno.py index fb690a8c7..5a8f45916 100644 --- a/packages/nvidia_nat_agno/tests/test_llm.py +++ b/packages/nvidia_nat_agno/tests/test_llm_agno.py @@ -21,6 +21,7 @@ from nat.builder.builder import Builder from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum from nat.llm.nim_llm import NIMModelConfig from nat.llm.openai_llm import OpenAIModelConfig from nat.plugins.agno.llm import nim_agno @@ -40,6 +41,11 @@ def nim_config(self): """Create a NIMModelConfig instance.""" return NIMModelConfig(model_name="test-model") + @pytest.fixture + def nim_config_responses(self): + """Create a NIMModelConfig instance.""" + return NIMModelConfig(model_name="test-model", api_type=APITypeEnum.RESPONSES) + @patch("agno.models.nvidia.Nvidia") async def test_nim_agno_basic(self, mock_nvidia, nim_config, mock_builder): """Test that nim_agno creates a Nvidia instance with the correct parameters.""" @@ -53,6 +59,17 @@ async def test_nim_agno_basic(self, mock_nvidia, nim_config, mock_builder): # Verify that the returned object is the mock Nvidia instance assert nvidia_instance == mock_nvidia.return_value + @patch("agno.models.nvidia.Nvidia") + async def test_nim_agno_responses(self, mock_nvidia, nim_config_responses, mock_builder): + """Test that nim_agno raises ValueError for NIMModelConfig with Responses API.""" + # Use the context manager properly + with pytest.raises(ValueError, match="Responses API is not supported"): + async with nim_agno(nim_config_responses, mock_builder): + pass + + # Verify that Nvidia was not created + mock_nvidia.assert_not_called() + @patch("agno.models.nvidia.Nvidia") async def test_nim_agno_with_base_url(self, mock_nvidia, nim_config, mock_builder): """Test that nim_agno creates a Nvidia instance with base_url when provided.""" @@ -105,7 +122,7 @@ async def test_nim_agno_with_existing_env_var(self, mock_nvidia, nim_config, moc mock_nvidia.assert_called_once() # Verify that the returned object is the mock Nvidia instance - assert nvidia_instance == mock_nvidia.return_value + assert nvidia_instance == mock_nvidia.return_value @patch("agno.models.nvidia.Nvidia") async def test_nim_agno_without_api_key(self, mock_nvidia, nim_config, mock_builder): @@ -133,6 +150,11 @@ def openai_config(self): """Create an OpenAIModelConfig instance.""" return OpenAIModelConfig(model_name="gpt-4") + @pytest.fixture + def openai_responses_config(self): + """Create an OpenAIModelConfig instance for responses.""" + return OpenAIModelConfig(model_name="gpt-4", api_type=APITypeEnum.RESPONSES) + @patch("agno.models.openai.OpenAIChat") async def test_openai_agno(self, mock_openai_chat, openai_config, mock_builder): """Test that openai_agno creates an OpenAIChat instance with the correct parameters.""" @@ -148,6 +170,21 @@ async def test_openai_agno(self, mock_openai_chat, openai_config, mock_builder): # Verify that the returned object is the mock OpenAIChat instance assert openai_instance == mock_openai_chat.return_value + @patch("agno.models.openai.OpenAIResponses") + async def test_openai_agno_responses(self, mock_openai_responses, openai_responses_config, mock_builder): + """Test that openai_agno creates an OpenAIResponses instance with the correct parameters.""" + # Use the context manager properly + async with openai_agno(openai_responses_config, mock_builder) as openai_instance: + # Verify that OpenAIResponses was created with the correct parameters + 
mock_openai_responses.assert_called_once() + call_kwargs = mock_openai_responses.call_args[1] + + # Check that model is set correctly + assert call_kwargs["id"] == "gpt-4" + + # Verify that the returned object is the mock OpenAIResponses instance + assert openai_instance == mock_openai_responses.return_value + @patch("agno.models.openai.OpenAIChat") async def test_openai_agno_with_additional_params(self, mock_openai_chat, openai_config, mock_builder): """Test that openai_agno passes additional params to OpenAIChat.""" diff --git a/packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py b/packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py index 54106e870..c26baa491 100644 --- a/packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py +++ b/packages/nvidia_nat_crewai/src/nat/plugins/crewai/llm.py @@ -30,6 +30,7 @@ from nat.llm.utils.thinking import FunctionArgumentWrapper from nat.llm.utils.thinking import patch_with_thinking from nat.utils.exception_handlers.automatic_retries import patch_with_retry +from nat.utils.responses_api import validate_no_responses_api from nat.utils.type_utils import override ModelType = TypeVar("ModelType") @@ -74,6 +75,8 @@ async def azure_openai_crewai(llm_config: AzureOpenAIModelConfig, _builder: Buil from crewai import LLM + validate_no_responses_api(llm_config, LLMFrameworkEnum.CREWAI) + # https://docs.crewai.com/en/concepts/llms#azure api_key = llm_config.api_key or os.environ.get("AZURE_OPENAI_API_KEY") or os.environ.get("AZURE_API_KEY") @@ -93,13 +96,7 @@ async def azure_openai_crewai(llm_config: AzureOpenAIModelConfig, _builder: Buil client = LLM( **llm_config.model_dump( - exclude={ - "type", - "api_key", - "azure_endpoint", - "azure_deployment", - "thinking", - }, + exclude={"type", "api_key", "azure_endpoint", "azure_deployment", "thinking", "api_type"}, by_alias=True, exclude_none=True, ), @@ -114,6 +111,8 @@ async def nim_crewai(llm_config: NIMModelConfig, _builder: Builder): from crewai import LLM + validate_no_responses_api(llm_config, LLMFrameworkEnum.CREWAI) + # Because CrewAI uses a different environment variable for the API key, we need to set it here manually if llm_config.api_key is None and "NVIDIA_NIM_API_KEY" not in os.environ: nvidia_api_key = os.getenv("NVIDIA_API_KEY") @@ -121,7 +120,9 @@ async def nim_crewai(llm_config: NIMModelConfig, _builder: Builder): os.environ["NVIDIA_NIM_API_KEY"] = nvidia_api_key client = LLM( - **llm_config.model_dump(exclude={"type", "model_name", "thinking"}, by_alias=True, exclude_none=True), + **llm_config.model_dump(exclude={"type", "model_name", "thinking", "api_type"}, + by_alias=True, + exclude_none=True), model=f"nvidia_nim/{llm_config.model_name}", ) @@ -133,7 +134,9 @@ async def openai_crewai(llm_config: OpenAIModelConfig, _builder: Builder): from crewai import LLM - client = LLM(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.CREWAI) + + client = LLM(**llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(client, llm_config) @@ -143,6 +146,8 @@ async def litellm_crewai(llm_config: LiteLlmModelConfig, _builder: Builder): from crewai import LLM - client = LLM(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.CREWAI) + + client = LLM(**llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, 
exclude_none=True)) yield _patch_llm_based_on_config(client, llm_config) diff --git a/packages/nvidia_nat_crewai/tests/test_llm_crewai.py b/packages/nvidia_nat_crewai/tests/test_llm_crewai.py new file mode 100644 index 000000000..51706b1ad --- /dev/null +++ b/packages/nvidia_nat_crewai/tests/test_llm_crewai.py @@ -0,0 +1,147 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=unused-argument, not-async-context-manager + +import os +from unittest.mock import MagicMock +from unittest.mock import patch + +import pytest + +from nat.builder.builder import Builder +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum +from nat.llm.nim_llm import NIMModelConfig +from nat.llm.openai_llm import OpenAIModelConfig +from nat.plugins.crewai.llm import nim_crewai +from nat.plugins.crewai.llm import openai_crewai + +# --------------------------------------------------------------------------- +# NIM → CrewAI wrapper tests +# --------------------------------------------------------------------------- + + +class TestNimCrewAI: + """Tests for the nim_crewai wrapper.""" + + @pytest.fixture + def mock_builder(self) -> Builder: + return MagicMock(spec=Builder) + + @pytest.fixture + def nim_cfg(self): + return NIMModelConfig(model_name="test-nim") + + @pytest.fixture + def nim_cfg_responses(self): + return NIMModelConfig(model_name="test-nim", api_type=APITypeEnum.RESPONSES) + + @patch("crewai.LLM") + async def test_basic_creation(self, mock_llm, nim_cfg, mock_builder): + """Wrapper should yield a crewai.LLM configured for the NIM model.""" + async with nim_crewai(nim_cfg, mock_builder) as llm_obj: + mock_llm.assert_called_once() + kwargs = mock_llm.call_args.kwargs + assert kwargs["model"] == "nvidia_nim/test-nim" + assert llm_obj is mock_llm.return_value + + @patch("crewai.LLM") + async def test_responses_api_blocked(self, mock_llm, nim_cfg_responses, mock_builder): + """Selecting the Responses API must raise a ValueError.""" + with pytest.raises(ValueError, match="Responses API is not supported"): + async with nim_crewai(nim_cfg_responses, mock_builder): + pass + mock_llm.assert_not_called() + + @patch("crewai.LLM") + @patch.dict(os.environ, {"NVIDIA_API_KEY": "legacy-key"}, clear=True) + async def test_env_key_transfer(self, mock_llm, nim_cfg, mock_builder): + """ + If NVIDIA_NIM_API_KEY is not set but NVIDIA_API_KEY is, + the wrapper should copy it for LiteLLM compatibility. 
+ """ + assert "NVIDIA_NIM_API_KEY" not in os.environ + async with nim_crewai(nim_cfg, mock_builder): + pass + assert os.environ["NVIDIA_NIM_API_KEY"] == "legacy-key" + mock_llm.assert_called_once() + + +# --------------------------------------------------------------------------- +# OpenAI → CrewAI wrapper tests +# --------------------------------------------------------------------------- + + +class TestOpenAICrewAI: + """Tests for the openai_crewai wrapper.""" + + @pytest.fixture + def mock_builder(self) -> Builder: + return MagicMock(spec=Builder) + + @pytest.fixture + def openai_cfg(self): + return OpenAIModelConfig(model_name="gpt-4o") + + @pytest.fixture + def openai_cfg_responses(self): + return OpenAIModelConfig(model_name="gpt-4o", api_type=APITypeEnum.RESPONSES) + + @patch("crewai.LLM") + async def test_basic_creation(self, mock_llm, openai_cfg, mock_builder): + """Wrapper should yield a crewai.LLM for OpenAI models.""" + async with openai_crewai(openai_cfg, mock_builder) as llm_obj: + mock_llm.assert_called_once() + assert mock_llm.call_args.kwargs["model"] == "gpt-4o" + assert llm_obj is mock_llm.return_value + + @patch("crewai.LLM") + async def test_param_passthrough(self, mock_llm, openai_cfg, mock_builder): + """Arbitrary config kwargs must reach crewai.LLM unchanged.""" + openai_cfg.temperature = 0.3 + openai_cfg.api_key = "sk-abc123" + async with openai_crewai(openai_cfg, mock_builder): + pass + kwargs = mock_llm.call_args.kwargs + assert kwargs["temperature"] == 0.3 + assert kwargs["api_key"] == "sk-abc123" + + @patch("crewai.LLM") + async def test_responses_api_blocked(self, mock_llm, openai_cfg_responses, mock_builder): + with pytest.raises(ValueError, match="Responses API is not supported"): + async with openai_crewai(openai_cfg_responses, mock_builder): + pass + mock_llm.assert_not_called() + + +# --------------------------------------------------------------------------- +# Registration decorator sanity check +# --------------------------------------------------------------------------- + + +@patch("nat.cli.type_registry.GlobalTypeRegistry") +def test_decorator_registration(mock_global_registry): + """Verify that register_llm_client decorators registered the CrewAI wrappers.""" + registry = MagicMock() + mock_global_registry.get.return_value = registry + + # Pretend the decorators already executed. + registry._llm_client_map = { + (NIMModelConfig, LLMFrameworkEnum.CREWAI): nim_crewai, + (OpenAIModelConfig, LLMFrameworkEnum.CREWAI): openai_crewai, + } + + assert registry._llm_client_map[(NIMModelConfig, LLMFrameworkEnum.CREWAI)] is nim_crewai + assert registry._llm_client_map[(OpenAIModelConfig, LLMFrameworkEnum.CREWAI)] is openai_crewai diff --git a/packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py b/packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py index 98ae381ac..751b656f0 100644 --- a/packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py +++ b/packages/nvidia_nat_langchain/src/nat/plugins/langchain/llm.py @@ -12,13 +12,16 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
+# pylint: disable=unused-argument +import logging from collections.abc import Sequence from typing import TypeVar from nat.builder.builder import Builder from nat.builder.framework_enum import LLMFrameworkEnum from nat.cli.register_workflow import register_llm_client +from nat.data_models.llm import APITypeEnum from nat.data_models.llm import LLMBaseConfig from nat.data_models.retry_mixin import RetryMixin from nat.data_models.thinking_mixin import ThinkingMixin @@ -31,8 +34,11 @@ from nat.llm.utils.thinking import FunctionArgumentWrapper from nat.llm.utils.thinking import patch_with_thinking from nat.utils.exception_handlers.automatic_retries import patch_with_retry +from nat.utils.responses_api import validate_no_responses_api from nat.utils.type_utils import override +logger = logging.getLogger(__name__) + ModelType = TypeVar("ModelType") @@ -110,8 +116,10 @@ async def aws_bedrock_langchain(llm_config: AWSBedrockModelConfig, _builder: Bui from langchain_aws import ChatBedrockConverse + validate_no_responses_api(llm_config, LLMFrameworkEnum.LANGCHAIN) + client = ChatBedrockConverse(**llm_config.model_dump( - exclude={"type", "context_size", "thinking"}, + exclude={"type", "context_size", "thinking", "api_type"}, by_alias=True, exclude_none=True, )) @@ -124,7 +132,10 @@ async def azure_openai_langchain(llm_config: AzureOpenAIModelConfig, _builder: B from langchain_openai import AzureChatOpenAI - client = AzureChatOpenAI(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.LANGCHAIN) + + client = AzureChatOpenAI( + **llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(client, llm_config) @@ -134,9 +145,13 @@ async def nim_langchain(llm_config: NIMModelConfig, _builder: Builder): from langchain_nvidia_ai_endpoints import ChatNVIDIA + validate_no_responses_api(llm_config, LLMFrameworkEnum.LANGCHAIN) + # prefer max_completion_tokens over max_tokens client = ChatNVIDIA( - **llm_config.model_dump(exclude={"type", "max_tokens", "thinking"}, by_alias=True, exclude_none=True), + **llm_config.model_dump(exclude={"type", "max_tokens", "thinking", "api_type"}, + by_alias=True, + exclude_none=True), max_completion_tokens=llm_config.max_tokens, ) @@ -148,13 +163,23 @@ async def openai_langchain(llm_config: OpenAIModelConfig, _builder: Builder): from langchain_openai import ChatOpenAI - # If stream_usage is specified, it will override the default value of True. - client = ChatOpenAI(stream_usage=True, - **llm_config.model_dump( - exclude={"type", "thinking"}, - by_alias=True, - exclude_none=True, - )) + if llm_config.api_type == APITypeEnum.RESPONSES: + client = ChatOpenAI(stream_usage=True, + use_responses_api=True, + use_previous_response_id=True, + **llm_config.model_dump( + exclude={"type", "thinking", "api_type"}, + by_alias=True, + exclude_none=True, + )) + else: + # If stream_usage is specified, it will override the default value of True. 
+ client = ChatOpenAI(stream_usage=True, + **llm_config.model_dump( + exclude={"type", "thinking", "api_type"}, + by_alias=True, + exclude_none=True, + )) yield _patch_llm_based_on_config(client, llm_config) @@ -164,6 +189,9 @@ async def litellm_langchain(llm_config: LiteLlmModelConfig, _builder: Builder): from langchain_litellm import ChatLiteLLM - client = ChatLiteLLM(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.LANGCHAIN) + + client = ChatLiteLLM( + **llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(client, llm_config) diff --git a/packages/nvidia_nat_langchain/tests/test_llm_langchain.py b/packages/nvidia_nat_langchain/tests/test_llm_langchain.py new file mode 100644 index 000000000..2b0a84ba2 --- /dev/null +++ b/packages/nvidia_nat_langchain/tests/test_llm_langchain.py @@ -0,0 +1,182 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=unused-argument, not-async-context-manager + +import logging +from unittest.mock import MagicMock +from unittest.mock import patch + +import pytest + +from nat.builder.builder import Builder +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum +from nat.llm.aws_bedrock_llm import AWSBedrockModelConfig +from nat.llm.nim_llm import NIMModelConfig +from nat.llm.openai_llm import OpenAIModelConfig +from nat.plugins.langchain.llm import aws_bedrock_langchain +from nat.plugins.langchain.llm import nim_langchain +from nat.plugins.langchain.llm import openai_langchain + +# --------------------------------------------------------------------------- +# NIM → LangChain wrapper tests +# --------------------------------------------------------------------------- + + +class TestNimLangChain: + """Tests for the nim_langchain wrapper.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def nim_cfg(self): + # Default API type is CHAT_COMPLETION + return NIMModelConfig(model_name="nemotron-3b-chat") + + @pytest.fixture + def nim_cfg_wrong_api(self): + # Purposely create a config that violates the API-type requirement + return NIMModelConfig(model_name="nemotron-3b-chat", api_type=APITypeEnum.RESPONSES) + + @patch("langchain_nvidia_ai_endpoints.ChatNVIDIA") + async def test_basic_creation(self, mock_chat, nim_cfg, mock_builder): + """Wrapper should yield a ChatNVIDIA client with the dumped kwargs.""" + async with nim_langchain(nim_cfg, mock_builder) as client: + mock_chat.assert_called_once() + kwargs = mock_chat.call_args.kwargs + print(kwargs) + assert kwargs["model"] == "nemotron-3b-chat" + assert client is mock_chat.return_value + + @patch("langchain_nvidia_ai_endpoints.ChatNVIDIA") + async def test_api_type_validation(self, mock_chat, 
nim_cfg_wrong_api, mock_builder): + """Non-chat-completion API types must raise a ValueError.""" + with pytest.raises(ValueError): + async with nim_langchain(nim_cfg_wrong_api, mock_builder): + pass + mock_chat.assert_not_called() + + +# --------------------------------------------------------------------------- +# OpenAI → LangChain wrapper tests +# --------------------------------------------------------------------------- + + +class TestOpenAILangChain: + """Tests for the openai_langchain wrapper.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def oa_cfg(self): + return OpenAIModelConfig(model_name="gpt-4o-mini") + + @pytest.fixture + def oa_cfg_responses(self): + # Explicitly set RESPONSES API and stream=True to test the branch logic. + return OpenAIModelConfig( + model_name="gpt-4o-mini", + api_type=APITypeEnum.RESPONSES, + stream=True, + temperature=0.2, + ) + + @patch("langchain_openai.ChatOpenAI") + async def test_basic_creation(self, mock_chat, oa_cfg, mock_builder): + """Default kwargs (stream_usage=True) and config kwargs must reach ChatOpenAI.""" + async with openai_langchain(oa_cfg, mock_builder) as client: + mock_chat.assert_called_once() + kwargs = mock_chat.call_args.kwargs + assert kwargs["model"] == "gpt-4o-mini" + # default injected by wrapper: + assert kwargs["stream_usage"] is True + assert client is mock_chat.return_value + + @patch("langchain_openai.ChatOpenAI") + async def test_responses_branch(self, mock_chat, oa_cfg_responses, mock_builder): + """When APIType==RESPONSES, special flags are added and stream is forced False.""" + # Silence the warning that the wrapper logs when it toggles stream. + with patch.object(logging.getLogger("nat.plugins.langchain.llm"), "warning"): + async with openai_langchain(oa_cfg_responses, mock_builder): + pass + + kwargs = mock_chat.call_args.kwargs + assert kwargs["use_responses_api"] is True + assert kwargs["use_previous_response_id"] is True + # Other original kwargs remain unchanged + assert kwargs["temperature"] == 0.2 + assert kwargs["stream_usage"] is True + + +# --------------------------------------------------------------------------- +# AWS Bedrock → LangChain wrapper tests +# --------------------------------------------------------------------------- + + +class TestBedrockLangChain: + """Tests for the aws_bedrock_langchain wrapper.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def bedrock_cfg(self): + return AWSBedrockModelConfig(model_name="ai21.j2-ultra") + + @pytest.fixture + def bedrock_cfg_wrong_api(self): + return AWSBedrockModelConfig(model_name="ai21.j2-ultra", api_type=APITypeEnum.RESPONSES) + + @patch("langchain_aws.ChatBedrockConverse") + async def test_basic_creation(self, mock_chat, bedrock_cfg, mock_builder): + async with aws_bedrock_langchain(bedrock_cfg, mock_builder) as client: + mock_chat.assert_called_once() + kwargs = mock_chat.call_args.kwargs + assert kwargs["model"] == "ai21.j2-ultra" + assert client is mock_chat.return_value + + @patch("langchain_aws.ChatBedrockConverse") + async def test_api_type_validation(self, mock_chat, bedrock_cfg_wrong_api, mock_builder): + with pytest.raises(ValueError): + async with aws_bedrock_langchain(bedrock_cfg_wrong_api, mock_builder): + pass + mock_chat.assert_not_called() + + +# --------------------------------------------------------------------------- +# Registration decorator sanity check +# 
--------------------------------------------------------------------------- + + +@patch("nat.cli.type_registry.GlobalTypeRegistry") +def test_decorator_registration(mock_global_registry): + """Ensure register_llm_client decorators registered the LangChain wrappers.""" + registry = MagicMock() + mock_global_registry.get.return_value = registry + + registry._llm_client_map = { + (NIMModelConfig, LLMFrameworkEnum.LANGCHAIN): nim_langchain, + (OpenAIModelConfig, LLMFrameworkEnum.LANGCHAIN): openai_langchain, + (AWSBedrockModelConfig, LLMFrameworkEnum.LANGCHAIN): aws_bedrock_langchain, + } + + assert registry._llm_client_map[(NIMModelConfig, LLMFrameworkEnum.LANGCHAIN)] is nim_langchain + assert registry._llm_client_map[(OpenAIModelConfig, LLMFrameworkEnum.LANGCHAIN)] is openai_langchain + assert registry._llm_client_map[(AWSBedrockModelConfig, LLMFrameworkEnum.LANGCHAIN)] is aws_bedrock_langchain diff --git a/packages/nvidia_nat_llama_index/pyproject.toml b/packages/nvidia_nat_llama_index/pyproject.toml index 203a607c8..767fd54b7 100644 --- a/packages/nvidia_nat_llama_index/pyproject.toml +++ b/packages/nvidia_nat_llama_index/pyproject.toml @@ -23,17 +23,17 @@ dependencies = [ "nvidia-nat~=1.4", # We ran into pydantic validation errors with newer versions of llama-index, not sure which version introduced the # error - "llama-index-core~=0.12.21", + "llama-index-core~=0.12.40", "llama-index-embeddings-azure-openai~=0.3.9", "llama-index-embeddings-nvidia~=0.3.1", "llama-index-embeddings-openai~=0.3.1", "llama-index-llms-azure-openai~=0.3.2", "llama-index-llms-bedrock~=0.3.8", "llama-index-llms-litellm~=0.5.1", - "llama-index-llms-nvidia~=0.3.1", - "llama-index-llms-openai~=0.3.42", + "llama-index-llms-nvidia~=0.3.4", + "llama-index-llms-openai>=0.4.2,<1.0.0", "llama-index-readers-file~=0.4.4", - "llama-index~=0.12.21", + "llama-index~=0.12.40", ] requires-python = ">=3.11,<3.14" description = "Subpackage for Llama-Index integration in NeMo Agent toolkit" diff --git a/packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py b/packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py index d024a1701..36a2c8d33 100644 --- a/packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py +++ b/packages/nvidia_nat_llama_index/src/nat/plugins/llama_index/llm.py @@ -19,6 +19,7 @@ from nat.builder.builder import Builder from nat.builder.framework_enum import LLMFrameworkEnum from nat.cli.register_workflow import register_llm_client +from nat.data_models.llm import APITypeEnum from nat.data_models.llm import LLMBaseConfig from nat.data_models.retry_mixin import RetryMixin from nat.data_models.thinking_mixin import ThinkingMixin @@ -31,6 +32,7 @@ from nat.llm.utils.thinking import FunctionArgumentWrapper from nat.llm.utils.thinking import patch_with_thinking from nat.utils.exception_handlers.automatic_retries import patch_with_retry +from nat.utils.responses_api import validate_no_responses_api from nat.utils.type_utils import override ModelType = TypeVar("ModelType") @@ -82,8 +84,10 @@ async def aws_bedrock_llama_index(llm_config: AWSBedrockModelConfig, _builder: B from llama_index.llms.bedrock import Bedrock + validate_no_responses_api(llm_config, LLMFrameworkEnum.LLAMA_INDEX) + # LlamaIndex uses context_size instead of max_tokens - llm = Bedrock(**llm_config.model_dump(exclude={"type", "top_p", "thinking"}, by_alias=True)) + llm = Bedrock(**llm_config.model_dump(exclude={"type", "top_p", "thinking", "api_type"}, by_alias=True)) yield _patch_llm_based_on_config(llm, llm_config) @@ 
-93,7 +97,9 @@ async def azure_openai_llama_index(llm_config: AzureOpenAIModelConfig, _builder: from llama_index.llms.azure_openai import AzureOpenAI - llm = AzureOpenAI(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.LLAMA_INDEX) + + llm = AzureOpenAI(**llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True)) yield _patch_llm_based_on_config(llm, llm_config) @@ -103,7 +109,9 @@ async def nim_llama_index(llm_config: NIMModelConfig, _builder: Builder): from llama_index.llms.nvidia import NVIDIA - llm = NVIDIA(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.LLAMA_INDEX) + + llm = NVIDIA(**llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(llm, llm_config) @@ -112,8 +120,14 @@ async def nim_llama_index(llm_config: NIMModelConfig, _builder: Builder): async def openai_llama_index(llm_config: OpenAIModelConfig, _builder: Builder): from llama_index.llms.openai import OpenAI + from llama_index.llms.openai import OpenAIResponses - llm = OpenAI(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + if llm_config.api_type == APITypeEnum.RESPONSES: + llm = OpenAIResponses( + **llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) + else: + llm = OpenAI( + **llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(llm, llm_config) @@ -123,6 +137,8 @@ async def litellm_llama_index(llm_config: LiteLlmModelConfig, _builder: Builder) from llama_index.llms.litellm import LiteLLM - llm = LiteLLM(**llm_config.model_dump(exclude={"type", "thinking"}, by_alias=True, exclude_none=True)) + validate_no_responses_api(llm_config, LLMFrameworkEnum.LLAMA_INDEX) + + llm = LiteLLM(**llm_config.model_dump(exclude={"type", "thinking", "api_type"}, by_alias=True, exclude_none=True)) yield _patch_llm_based_on_config(llm, llm_config) diff --git a/packages/nvidia_nat_llama_index/tests/test_llm_llama_index.py b/packages/nvidia_nat_llama_index/tests/test_llm_llama_index.py new file mode 100644 index 000000000..c70a0d70a --- /dev/null +++ b/packages/nvidia_nat_llama_index/tests/test_llm_llama_index.py @@ -0,0 +1,165 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# pylint: disable=unused-argument, not-async-context-manager + +from unittest.mock import MagicMock +from unittest.mock import patch + +import pytest + +from nat.builder.builder import Builder +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum +from nat.llm.aws_bedrock_llm import AWSBedrockModelConfig +from nat.llm.nim_llm import NIMModelConfig +from nat.llm.openai_llm import OpenAIModelConfig +from nat.plugins.llama_index.llm import aws_bedrock_llama_index +from nat.plugins.llama_index.llm import nim_llama_index +from nat.plugins.llama_index.llm import openai_llama_index + +# --------------------------------------------------------------------------- +# NIM → Llama-Index wrapper tests +# --------------------------------------------------------------------------- + + +class TestNimLlamaIndex: + """Tests for nim_llama_index.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def nim_cfg(self): + return NIMModelConfig(model_name="nemotron-3b") + + @pytest.fixture + def nim_cfg_bad_api(self): + return NIMModelConfig(model_name="nemotron-3b", api_type=APITypeEnum.RESPONSES) + + @patch("llama_index.llms.nvidia.NVIDIA") + async def test_basic_creation(self, mock_nv, nim_cfg, mock_builder): + """Wrapper should instantiate llama_index.llms.nvidia.NVIDIA.""" + async with nim_llama_index(nim_cfg, mock_builder) as llm: + mock_nv.assert_called_once() + kwargs = mock_nv.call_args.kwargs + assert kwargs["model"] == "nemotron-3b" + assert llm is mock_nv.return_value + + @patch("llama_index.llms.nvidia.NVIDIA") + async def test_api_type_validation(self, mock_nv, nim_cfg_bad_api, mock_builder): + """Non-chat API types must raise.""" + with pytest.raises(ValueError): + async with nim_llama_index(nim_cfg_bad_api, mock_builder): + pass + mock_nv.assert_not_called() + + +# --------------------------------------------------------------------------- +# OpenAI → Llama-Index wrapper tests +# --------------------------------------------------------------------------- + + +class TestOpenAILlamaIndex: + """Tests for openai_llama_index.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def oa_cfg_chat(self): + return OpenAIModelConfig(model_name="gpt-4o", base_url=None) + + @pytest.fixture + def oa_cfg_responses(self): + return OpenAIModelConfig(model_name="gpt-4o", api_type=APITypeEnum.RESPONSES, temperature=0.1) + + @patch("llama_index.llms.openai.OpenAI") + async def test_chat_completion_branch(self, mock_openai, oa_cfg_chat, mock_builder): + """CHAT_COMPLETION should create an OpenAI client, omitting base_url when None.""" + async with openai_llama_index(oa_cfg_chat, mock_builder) as llm: + mock_openai.assert_called_once() + kwargs = mock_openai.call_args.kwargs + assert kwargs["model"] == "gpt-4o" + assert "base_url" not in kwargs + assert llm is mock_openai.return_value + + @patch("llama_index.llms.openai.OpenAIResponses") + async def test_responses_branch(self, mock_resp, oa_cfg_responses, mock_builder): + """RESPONSES API type should instantiate OpenAIResponses.""" + async with openai_llama_index(oa_cfg_responses, mock_builder) as llm: + mock_resp.assert_called_once() + kwargs = mock_resp.call_args.kwargs + assert kwargs["model"] == "gpt-4o" + assert kwargs["temperature"] == 0.1 + assert llm is mock_resp.return_value + + +# --------------------------------------------------------------------------- +# AWS Bedrock → Llama-Index wrapper tests +# 
--------------------------------------------------------------------------- + + +class TestBedrockLlamaIndex: + """Tests for aws_bedrock_llama_index.""" + + @pytest.fixture + def mock_builder(self): + return MagicMock(spec=Builder) + + @pytest.fixture + def br_cfg(self): + return AWSBedrockModelConfig(model_name="ai21.j2-ultra") + + @pytest.fixture + def br_cfg_bad_api(self): + return AWSBedrockModelConfig(model_name="ai21.j2-ultra", api_type=APITypeEnum.RESPONSES) + + @patch("llama_index.llms.bedrock.Bedrock") + async def test_basic_creation(self, mock_bedrock, br_cfg, mock_builder): + async with aws_bedrock_llama_index(br_cfg, mock_builder) as llm: + mock_bedrock.assert_called_once() + assert mock_bedrock.call_args.kwargs["model"] == "ai21.j2-ultra" + assert llm is mock_bedrock.return_value + + @patch("llama_index.llms.bedrock.Bedrock") + async def test_api_type_validation(self, mock_bedrock, br_cfg_bad_api, mock_builder): + with pytest.raises(ValueError): + async with aws_bedrock_llama_index(br_cfg_bad_api, mock_builder): + pass + mock_bedrock.assert_not_called() + + +# --------------------------------------------------------------------------- +# Registration decorator sanity check +# --------------------------------------------------------------------------- + + +@patch("nat.cli.type_registry.GlobalTypeRegistry") +def test_decorator_registration(mock_global_registry): + """Ensure register_llm_client decorators registered the Llama-Index wrappers.""" + registry = MagicMock() + mock_global_registry.get.return_value = registry + + registry._llm_client_map = { + (NIMModelConfig, LLMFrameworkEnum.LLAMA_INDEX): nim_llama_index, + (OpenAIModelConfig, LLMFrameworkEnum.LLAMA_INDEX): openai_llama_index, + (AWSBedrockModelConfig, LLMFrameworkEnum.LLAMA_INDEX): aws_bedrock_llama_index, + } + + assert registry._llm_client_map[(NIMModelConfig, LLMFrameworkEnum.LLAMA_INDEX)] is nim_llama_index + assert registry._llm_client_map[(OpenAIModelConfig, LLMFrameworkEnum.LLAMA_INDEX)] is openai_llama_index + assert registry._llm_client_map[(AWSBedrockModelConfig, LLMFrameworkEnum.LLAMA_INDEX)] is aws_bedrock_llama_index diff --git a/packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py b/packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py index 5825db4b7..a030546f8 100644 --- a/packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py +++ b/packages/nvidia_nat_semantic_kernel/src/nat/plugins/semantic_kernel/llm.py @@ -27,6 +27,7 @@ from nat.llm.utils.thinking import FunctionArgumentWrapper from nat.llm.utils.thinking import patch_with_thinking from nat.utils.exception_handlers.automatic_retries import patch_with_retry +from nat.utils.responses_api import validate_no_responses_api from nat.utils.type_utils import override ModelType = TypeVar("ModelType") @@ -89,6 +90,8 @@ async def azure_openai_semantic_kernel(llm_config: AzureOpenAIModelConfig, _buil from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion + validate_no_responses_api(llm_config, LLMFrameworkEnum.SEMANTIC_KERNEL) + llm = AzureChatCompletion( api_key=llm_config.api_key, api_version=llm_config.api_version, @@ -104,6 +107,8 @@ async def openai_semantic_kernel(llm_config: OpenAIModelConfig, _builder: Builde from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion + validate_no_responses_api(llm_config, LLMFrameworkEnum.SEMANTIC_KERNEL) + llm = OpenAIChatCompletion(ai_model_id=llm_config.model_name) yield _patch_llm_based_on_config(llm, llm_config) diff --git 
a/packages/nvidia_nat_semantic_kernel/tests/test_llm_sk.py b/packages/nvidia_nat_semantic_kernel/tests/test_llm_sk.py new file mode 100644 index 000000000..c2ae8983a --- /dev/null +++ b/packages/nvidia_nat_semantic_kernel/tests/test_llm_sk.py @@ -0,0 +1,82 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=unused-argument, not-async-context-manager + +from unittest.mock import MagicMock +from unittest.mock import patch + +import pytest + +from nat.builder.builder import Builder +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum +from nat.llm.openai_llm import OpenAIModelConfig +from nat.plugins.semantic_kernel.llm import openai_semantic_kernel + +# --------------------------------------------------------------------------- +# OpenAI → Semantic-Kernel wrapper tests +# --------------------------------------------------------------------------- + + +class TestOpenAISemanticKernel: + """Tests for the openai_semantic_kernel wrapper.""" + + @pytest.fixture + def mock_builder(self) -> Builder: + return MagicMock(spec=Builder) + + @pytest.fixture + def oa_cfg(self): + return OpenAIModelConfig(model_name="gpt-4o") + + @pytest.fixture + def oa_cfg_responses(self): + # Using the RESPONSES API must be rejected by the wrapper. + return OpenAIModelConfig(model_name="gpt-4o", api_type=APITypeEnum.RESPONSES) + + @patch("semantic_kernel.connectors.ai.open_ai.OpenAIChatCompletion") + async def test_basic_creation(self, mock_sk, oa_cfg, mock_builder): + """Ensure the wrapper instantiates OpenAIChatCompletion with the right model id.""" + async with openai_semantic_kernel(oa_cfg, mock_builder) as llm_obj: + mock_sk.assert_called_once() + assert mock_sk.call_args.kwargs["ai_model_id"] == "gpt-4o" + assert llm_obj is mock_sk.return_value + + @patch("semantic_kernel.connectors.ai.open_ai.OpenAIChatCompletion") + async def test_responses_api_blocked(self, mock_sk, oa_cfg_responses, mock_builder): + """Selecting APIType.RESPONSES must raise a ValueError.""" + with pytest.raises(ValueError, match="Responses API is not supported"): + async with openai_semantic_kernel(oa_cfg_responses, mock_builder): + pass + mock_sk.assert_not_called() + + +# --------------------------------------------------------------------------- +# Registration decorator sanity check +# --------------------------------------------------------------------------- + + +@patch("nat.cli.type_registry.GlobalTypeRegistry") +def test_decorator_registration(mock_global_registry): + """Verify that register_llm_client decorated the Semantic-Kernel wrapper.""" + registry = MagicMock() + mock_global_registry.get.return_value = registry + + # Pretend decorator execution populated the map. 
+ registry._llm_client_map = { + (OpenAIModelConfig, LLMFrameworkEnum.SEMANTIC_KERNEL): openai_semantic_kernel, + } + + assert (registry._llm_client_map[(OpenAIModelConfig, LLMFrameworkEnum.SEMANTIC_KERNEL)] is openai_semantic_kernel) diff --git a/src/nat/agent/register.py b/src/nat/agent/register.py index 3c219d11e..c1204402f 100644 --- a/src/nat/agent/register.py +++ b/src/nat/agent/register.py @@ -19,5 +19,6 @@ from .prompt_optimizer import register as prompt_optimizer from .react_agent import register as react_agent from .reasoning_agent import reasoning_agent +from .responses_api_agent import register as responses_api_agent from .rewoo_agent import register as rewoo_agent from .tool_calling_agent import register as tool_calling_agent diff --git a/src/nat/agent/responses_api_agent/__init__.py b/src/nat/agent/responses_api_agent/__init__.py new file mode 100644 index 000000000..cf7c586a5 --- /dev/null +++ b/src/nat/agent/responses_api_agent/__init__.py @@ -0,0 +1,14 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/src/nat/agent/responses_api_agent/register.py b/src/nat/agent/responses_api_agent/register.py new file mode 100644 index 000000000..08a1f751c --- /dev/null +++ b/src/nat/agent/responses_api_agent/register.py @@ -0,0 +1,126 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import logging +import typing + +from pydantic import Field + +from nat.agent.base import AGENT_LOG_PREFIX +from nat.builder.builder import Builder +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.builder.function_info import FunctionInfo +from nat.cli.register_workflow import register_function +from nat.data_models.component_ref import FunctionRef +from nat.data_models.component_ref import LLMRef +from nat.data_models.function import FunctionBaseConfig +from nat.data_models.openai_mcp import OpenAIMCPSchemaTool + +logger = logging.getLogger(__name__) + + +class ResponsesAPIAgentWorkflowConfig(FunctionBaseConfig, name="responses_api_agent"): + """ + Defines a NeMo Agent Toolkit function that uses a Responses API + Agent. The agent performs reasoning between tool calls and utilizes the + tool names and descriptions to select the optimal tool.
+ """ + + llm_name: LLMRef = Field(description="The LLM model to use with the agent.") + verbose: bool = Field(default=False, description="Set the verbosity of the agent's logging.") + nat_tools: list[FunctionRef] = Field(default_factory=list, description="The list of tools to provide to the agent.") + mcp_tools: list[OpenAIMCPSchemaTool] = Field( + default_factory=list, + description="List of MCP tools to use with the agent. If empty, no MCP tools will be used.") + builtin_tools: list[dict[str, typing.Any]] = Field( + default_factory=list, + description="List of built-in tools to use with the agent. If empty, no built-in tools will be used.") + + max_iterations: int = Field(default=15, description="Number of tool calls before stoping the agent.") + description: str = Field(default="Agent Workflow", description="The description of this functions use.") + parallel_tool_calls: bool = Field(default=False, + description="Specify whether to allow parallel tool calls in the agent.") + handle_tool_errors: bool = Field( + default=True, + description="Specify ability to handle tool calling errors. If False, tool errors will raise an exception.") + + +@register_function(config_type=ResponsesAPIAgentWorkflowConfig, framework_wrappers=[LLMFrameworkEnum.LANGCHAIN]) +async def responses_api_agent_workflow(config: ResponsesAPIAgentWorkflowConfig, builder: Builder): + from langchain_core.messages.human import HumanMessage + from langchain_core.runnables import Runnable + from langchain_openai import ChatOpenAI + + from nat.agent.tool_calling_agent.agent import ToolCallAgentGraph + from nat.agent.tool_calling_agent.agent import ToolCallAgentGraphState + + llm: ChatOpenAI = await builder.get_llm(config.llm_name, wrapper_type=LLMFrameworkEnum.LANGCHAIN) + assert llm.use_responses_api, "Responses API Agent requires an LLM that supports the Responses API." 
+ + # Get tools + tools = [] + nat_tools = await builder.get_tools(tool_names=config.nat_tools, wrapper_type=LLMFrameworkEnum.LANGCHAIN) + tools.extend(nat_tools) + # MCP tools are optional, if provided they will be used by the agent + tools.extend([m.model_dump() for m in config.mcp_tools]) + # Built-in tools are optional, if provided they will be used by the agent + tools.extend(config.builtin_tools) + + # Bind tools to LLM + if tools: + llm: Runnable = llm.bind_tools(tools=tools, parallel_tool_calls=config.parallel_tool_calls, strict=True) + + if config.verbose: + logger.info("%s Using LLM: %s with tools: %s", AGENT_LOG_PREFIX, llm.model_name, tools) + + agent = ToolCallAgentGraph( + llm=llm, + tools=nat_tools, # MCP and built-in tools are already bound to the LLM and need not be handled by graph + detailed_logs=config.verbose, + handle_tool_errors=config.handle_tool_errors) + + graph = await agent.build_graph() + + async def _response_fn(input_message: str) -> str: + try: + # initialize the starting state with the user query + input_message = HumanMessage(content=input_message) + state = ToolCallAgentGraphState(messages=[input_message]) + + # run the Tool Calling Agent Graph + state = await graph.ainvoke(state, config={'recursion_limit': (config.max_iterations + 1) * 2}) + # each tool call consumes two graph steps (one agent step and one tool step), so a + # recursion_limit of (max_iterations + 1) * 2 permits up to max_iterations tool calls; + # for example, a recursion_limit of 4 allows a single tool call before the agent is stopped + + # get and return the output from the state + state = ToolCallAgentGraphState(**state) + output_message = state.messages[-1] # pylint: disable=E1136 + content = output_message.content[-1]['text'] if output_message.content and isinstance( + output_message.content[-1], dict) and 'text' in output_message.content[-1] else str( + output_message.content) + return content + except Exception as ex: + logger.exception("%s Tool Calling Agent failed with exception: %s", AGENT_LOG_PREFIX, ex, exc_info=ex) + if config.verbose: + return str(ex) + return "I seem to be having a problem." + + try: + yield FunctionInfo.from_fn(_response_fn, description=config.description) + except GeneratorExit: + logger.exception("%s Workflow exited early!", AGENT_LOG_PREFIX, exc_info=True) + finally: + logger.debug("%s Cleaning up responses_api_agent workflow.", AGENT_LOG_PREFIX) diff --git a/src/nat/data_models/intermediate_step.py b/src/nat/data_models/intermediate_step.py index 0ee2d25cc..bc89bb216 100644 --- a/src/nat/data_models/intermediate_step.py +++ b/src/nat/data_models/intermediate_step.py @@ -103,11 +103,19 @@ class ToolSchema(BaseModel): function: ToolDetails = Field(..., description="The function details.") +class ServerToolUseSchema(BaseModel): + name: str + arguments: str | dict[str, typing.Any] | typing.Any + output: typing.Any + + model_config = ConfigDict(extra="ignore") + + class TraceMetadata(BaseModel): chat_responses: typing.Any | None = None chat_inputs: typing.Any | None = None tool_inputs: typing.Any | None = None - tool_outputs: typing.Any | None = None + tool_outputs: list[ServerToolUseSchema] | typing.Any | None = None tool_info: typing.Any | None = None span_inputs: typing.Any | None = None span_outputs: typing.Any | None = None diff --git a/src/nat/data_models/llm.py b/src/nat/data_models/llm.py index df0cb0200..6c5467822 100644 --- a/src/nat/data_models/llm.py +++ b/src/nat/data_models/llm.py @@ -14,14 +14,28 @@ # limitations under the License.
import typing +from enum import Enum + +from pydantic import Field from .common import BaseModelRegistryTag from .common import TypedBaseModel +class APITypeEnum(str, Enum): + CHAT_COMPLETION = "chat_completion" + RESPONSES = "responses" + + class LLMBaseConfig(TypedBaseModel, BaseModelRegistryTag): """Base configuration for LLM providers.""" - pass + + api_type: APITypeEnum = Field(default=APITypeEnum.CHAT_COMPLETION, + description="The type of API to use for the LLM provider.", + json_schema_extra={ + "enum": [e.value for e in APITypeEnum], + "examples": [e.value for e in APITypeEnum], + }) LLMBaseConfigT = typing.TypeVar("LLMBaseConfigT", bound=LLMBaseConfig) diff --git a/src/nat/data_models/openai_mcp.py b/src/nat/data_models/openai_mcp.py new file mode 100644 index 000000000..a8da5aab6 --- /dev/null +++ b/src/nat/data_models/openai_mcp.py @@ -0,0 +1,46 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +from enum import Enum + +from pydantic import BaseModel +from pydantic import ConfigDict +from pydantic import Field + + +class MCPApprovalRequiredEnum(str, Enum): + """ + Enum to specify if approval is required for tool usage in the OpenAI MCP schema. + """ + NEVER = "never" + ALWAYS = "always" + AUTO = "auto" + + +class OpenAIMCPSchemaTool(BaseModel): + """ + Represents a tool in the OpenAI MCP schema. 
+ """ + type: str = "mcp" + server_label: str = Field(description="Label for the server where the tool is hosted.") + server_url: str = Field(description="URL of the server hosting the tool.") + allowed_tools: list[str] | None = Field(default=None, + description="List of allowed tool names that can be used by the agent.") + require_approval: MCPApprovalRequiredEnum = Field(default=MCPApprovalRequiredEnum.NEVER, + description="Specifies if approval is required for tool usage.") + headers: dict[str, str] | None = Field(default=None, + description="Optional headers to include in requests to the tool server.") + + model_config = ConfigDict(use_enum_values=True) diff --git a/src/nat/eval/evaluate.py b/src/nat/eval/evaluate.py index 106739a4b..3685fc847 100644 --- a/src/nat/eval/evaluate.py +++ b/src/nat/eval/evaluate.py @@ -104,6 +104,8 @@ def _compute_usage_stats(self, item: EvalInputItem): usage_stats_per_llm[llm_name].prompt_tokens += step.token_usage.prompt_tokens usage_stats_per_llm[llm_name].completion_tokens += step.token_usage.completion_tokens usage_stats_per_llm[llm_name].total_tokens += step.token_usage.total_tokens + usage_stats_per_llm[llm_name].reasoning_tokens += step.token_usage.reasoning_tokens + usage_stats_per_llm[llm_name].cached_tokens += step.token_usage.cached_tokens total_tokens += step.token_usage.total_tokens # find min and max event timestamps diff --git a/src/nat/eval/usage_stats.py b/src/nat/eval/usage_stats.py index da9588cdc..ad46afc31 100644 --- a/src/nat/eval/usage_stats.py +++ b/src/nat/eval/usage_stats.py @@ -21,6 +21,8 @@ class UsageStatsLLM(BaseModel): prompt_tokens: int = 0 completion_tokens: int = 0 + cached_tokens: int = 0 + reasoning_tokens: int = 0 total_tokens: int = 0 diff --git a/src/nat/profiler/callbacks/langchain_callback_handler.py b/src/nat/profiler/callbacks/langchain_callback_handler.py index 4adb847b7..bb10f08b3 100644 --- a/src/nat/profiler/callbacks/langchain_callback_handler.py +++ b/src/nat/profiler/callbacks/langchain_callback_handler.py @@ -33,6 +33,7 @@ from nat.builder.framework_enum import LLMFrameworkEnum from nat.data_models.intermediate_step import IntermediateStepPayload from nat.data_models.intermediate_step import IntermediateStepType +from nat.data_models.intermediate_step import ServerToolUseSchema from nat.data_models.intermediate_step import StreamEventData from nat.data_models.intermediate_step import ToolSchema from nat.data_models.intermediate_step import TraceMetadata @@ -48,7 +49,14 @@ def _extract_tools_schema(invocation_params: dict) -> list: tools_schema = [] if invocation_params is not None: for tool in invocation_params.get("tools", []): - tools_schema.append(ToolSchema(**tool)) + try: + tools_schema.append(ToolSchema(**tool)) + except Exception: + logger.debug( + "Failed to parse tool schema from invocation params: %s. 
\n This " + "can occur when the LLM server has native tools and can be ignored if " + "using the responses API.", + tool) return tools_schema @@ -93,11 +101,15 @@ def _extract_token_base_model(self, usage_metadata: dict[str, Any]) -> TokenUsag completion_tokens = usage_metadata.get("output_tokens", 0) total_tokens = usage_metadata.get("total_tokens", 0) - return TokenUsageBaseModel( - prompt_tokens=prompt_tokens, - completion_tokens=completion_tokens, - total_tokens=total_tokens, - ) + cache_tokens = usage_metadata.get("input_token_details", {}).get("cache_read", 0) + + reasoning_tokens = usage_metadata.get("output_token_details", {}).get("reasoning", 0) + + return TokenUsageBaseModel(prompt_tokens=prompt_tokens, + completion_tokens=completion_tokens, + total_tokens=total_tokens, + cached_tokens=cache_tokens, + reasoning_tokens=reasoning_tokens) return TokenUsageBaseModel() async def on_llm_start(self, serialized: dict[str, Any], prompts: list[str], **kwargs: Any) -> None: @@ -213,6 +225,7 @@ async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None: except IndexError: generation = None + message = None if isinstance(generation, ChatGeneration): try: message = generation.message @@ -232,6 +245,17 @@ async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None: else: llm_text_output = "" + tool_outputs_list = [] + # Check if message.additional_kwargs has tool_outputs, indicative of server-side tool calling + if message and message.additional_kwargs and "tool_outputs" in message.additional_kwargs: + tools_outputs = message.additional_kwargs["tool_outputs"] + if isinstance(tools_outputs, list): + for tool in tools_outputs: + try: + tool_outputs_list.append(ServerToolUseSchema(**tool)) + except Exception: + pass + # update shared state behind lock with self._lock: usage_stat = IntermediateStepPayload( @@ -243,7 +267,8 @@ async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None: data=StreamEventData(input=self._run_id_to_llm_input.get(str(kwargs.get("run_id", "")), ""), output=llm_text_output), usage_info=UsageInfo(token_usage=self._extract_token_base_model(usage_metadata)), - metadata=TraceMetadata(chat_responses=[generation] if generation else [])) + metadata=TraceMetadata(chat_responses=[generation] if generation else [], + tool_outputs=tool_outputs_list if tool_outputs_list else [])) self.step_manager.push_intermediate_step(usage_stat) diff --git a/src/nat/profiler/callbacks/llama_index_callback_handler.py b/src/nat/profiler/callbacks/llama_index_callback_handler.py index 5e16284c6..5f5f36e08 100644 --- a/src/nat/profiler/callbacks/llama_index_callback_handler.py +++ b/src/nat/profiler/callbacks/llama_index_callback_handler.py @@ -30,6 +30,7 @@ from nat.builder.framework_enum import LLMFrameworkEnum from nat.data_models.intermediate_step import IntermediateStepPayload from nat.data_models.intermediate_step import IntermediateStepType +from nat.data_models.intermediate_step import ServerToolUseSchema from nat.data_models.intermediate_step import StreamEventData from nat.data_models.intermediate_step import TraceMetadata from nat.data_models.intermediate_step import UsageInfo @@ -64,6 +65,26 @@ def __init__(self) -> None: self._run_id_to_tool_input = {} self._run_id_to_timestamp = {} + @staticmethod + def _extract_token_usage(response: ChatResponse) -> TokenUsageBaseModel: + token_usage = TokenUsageBaseModel() + try: + if response and response.additional_kwargs and "usage" in response.additional_kwargs: + usage = response.additional_kwargs["usage"] if
"usage" in response.additional_kwargs else {} + token_usage.prompt_tokens = usage.input_tokens if hasattr(usage, "input_tokens") else 0 + token_usage.completion_tokens = usage.output_tokens if hasattr(usage, "output_tokens") else 0 + + if hasattr(usage, "input_tokens_details") and hasattr(usage.input_tokens_details, "cached_tokens"): + token_usage.cached_tokens = usage.input_tokens_details.cached_tokens + + if hasattr(usage, "output_tokens_details") and hasattr(usage.output_tokens_details, "reasoning_tokens"): + token_usage.reasoning_tokens = usage.output_tokens_details.reasoning_tokens + + except Exception as e: + logger.debug("Error extracting token usage: %s", e, exc_info=True) + + return token_usage + def on_event_start( self, event_type: CBEventType, @@ -167,6 +188,18 @@ def on_event_end( except Exception as e: logger.exception("Error getting model name: %s", e) + # Append usage data to NAT usage stats + tool_outputs_list = [] + # Check if message.additional_kwargs as tool_outputs indicative of server side tool calling + if response and response.additional_kwargs and "built_in_tool_calls" in response.additional_kwargs: + tools_outputs = response.additional_kwargs["built_in_tool_calls"] + if isinstance(tools_outputs, list): + for tool in tools_outputs: + try: + tool_outputs_list.append(ServerToolUseSchema(**tool.model_dump())) + except Exception: + pass + # Append usage data to NAT usage stats with self._lock: stats = IntermediateStepPayload( @@ -176,8 +209,9 @@ def on_event_end( name=model_name, UUID=event_id, data=StreamEventData(input=self._run_id_to_llm_input.get(event_id), output=llm_text_output), - metadata=TraceMetadata(chat_responses=response.message if response.message else None), - usage_info=UsageInfo(token_usage=TokenUsageBaseModel(**response.additional_kwargs))) + metadata=TraceMetadata(chat_responses=response.message if response.message else None, + tool_outputs=tool_outputs_list if tool_outputs_list else []), + usage_info=UsageInfo(token_usage=self._extract_token_usage(response))) self.step_manager.push_intermediate_step(stats) elif event_type == CBEventType.FUNCTION_CALL and payload: diff --git a/src/nat/profiler/callbacks/token_usage_base_model.py b/src/nat/profiler/callbacks/token_usage_base_model.py index f1d1562bc..22980a0f2 100644 --- a/src/nat/profiler/callbacks/token_usage_base_model.py +++ b/src/nat/profiler/callbacks/token_usage_base_model.py @@ -24,4 +24,6 @@ class TokenUsageBaseModel(BaseModel): prompt_tokens: int = Field(default=0, description="Number of tokens in the prompt.") completion_tokens: int = Field(default=0, description="Number of tokens in the completion.") + cached_tokens: int = Field(default=0, description="Number of tokens read from cache.") + reasoning_tokens: int = Field(default=0, description="Number of tokens used for reasoning.") total_tokens: int = Field(default=0, description="Number of tokens total.") diff --git a/src/nat/utils/responses_api.py b/src/nat/utils/responses_api.py new file mode 100644 index 000000000..ab060ea0e --- /dev/null +++ b/src/nat/utils/responses_api.py @@ -0,0 +1,26 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# pylint: disable=raising-format-tuple + +from nat.builder.framework_enum import LLMFrameworkEnum +from nat.data_models.llm import APITypeEnum + + +def validate_no_responses_api(llm_config, framework: LLMFrameworkEnum): + """Validate that the LLM config does not use the Responses API.""" + + if llm_config.api_type == APITypeEnum.RESPONSES: + raise ValueError(f"Responses API is not supported for config {str(type(llm_config))} in framework {framework}. " + f"Please use a different API type.") diff --git a/tests/nat/agent/test_responses_api_agent.py b/tests/nat/agent/test_responses_api_agent.py new file mode 100644 index 000000000..959b19d1a --- /dev/null +++ b/tests/nat/agent/test_responses_api_agent.py @@ -0,0 +1,132 @@ +# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved. +# SPDX-License-Identifier: Apache-2.0 +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# pylint: disable=not-async-context-manager,unused-argument + +import pytest + +from nat.agent.responses_api_agent.register import ResponsesAPIAgentWorkflowConfig +from nat.agent.responses_api_agent.register import responses_api_agent_workflow +from nat.data_models.openai_mcp import OpenAIMCPSchemaTool + + +class _MockBuilder: + + def __init__(self, llm, tools): + self._llm = llm + self._tools = tools + + async def get_llm(self, llm_name, wrapper_type): + # match interface and avoid unused warnings + return self._llm + + async def get_tools(self, tool_names, wrapper_type): + # match interface and avoid unused warnings + return self._tools + + +def _augment_llm_for_responses(llm): + """Augment the mock LLM class with Responses API properties/methods.""" + + klass = type(llm) + setattr(klass, "use_responses_api", True) + setattr(klass, "model_name", "mock-openai") + + def bind_tools(self, tools, parallel_tool_calls=False, strict=True): # noqa: D401 + # Store on class to avoid Pydantic instance attribute restrictions + klass = type(self) + # Preserve previously bound tools and merge with new ones + existing_tools = getattr(klass, "bound_tools", []) + # Create a set to track tool identity (by id for objects, by value for dicts) + all_tools = list(existing_tools) + for tool in tools: + if tool not in all_tools: + all_tools.append(tool) + setattr(klass, "bound_tools", all_tools) + # Preserve True values for parallel_tool_calls and strict (once True, stays True) + existing_parallel = getattr(klass, "bound_parallel", False) + existing_strict = getattr(klass, "bound_strict", False) + setattr(klass, "bound_parallel", existing_parallel or parallel_tool_calls) + setattr(klass, "bound_strict", existing_strict or strict) + return self + + setattr(klass, "bind_tools", bind_tools) + return llm + + +def _augment_llm_without_responses(llm): + """Augment the mock LLM class but mark it as not Responses-capable.""" + klass = type(llm) + setattr(klass, "use_responses_api", False) + setattr(klass, "model_name", "mock-openai") + return llm + + +@pytest.fixture(name="nat_tool") +def nat_tool_fixture(mock_tool): + return mock_tool("Tool A") + + +async def _consume_function_info(gen): + """Helper to consume a single yield from the async generator and return FunctionInfo.""" + function_info = None + async for function_info in gen: + break + assert function_info is not None + return function_info + + +async def test_llm_requires_responses_api(mock_llm, nat_tool): + llm = _augment_llm_without_responses(mock_llm) + builder = _MockBuilder(llm=llm, tools=[nat_tool]) + config = ResponsesAPIAgentWorkflowConfig(llm_name="openai_llm", nat_tools=["tool_a"]) # type: ignore[list-item] + + with pytest.raises(AssertionError): + # The assertion occurs before yielding, when validating the LLM + async with responses_api_agent_workflow(config, builder): + pass + + +async def test_binds_tools_and_runs(mock_llm, nat_tool): + llm = _augment_llm_for_responses(mock_llm) + mcp = OpenAIMCPSchemaTool(server_label="deepwiki", server_url="https://mcp.deepwiki.com/mcp") + builtin = {"type": "code_interpreter", "container": {"type": "auto"}} + + builder = _MockBuilder(llm=llm, tools=[nat_tool]) + + config = ResponsesAPIAgentWorkflowConfig( + llm_name="openai_llm", + nat_tools=["tool_a"], # type: ignore[list-item] + builtin_tools=[builtin], + mcp_tools=[mcp], + verbose=True, + parallel_tool_calls=True, + ) + + async with responses_api_agent_workflow(config, builder) as function_info: + # Ensure tools were bound on the LLM (nat tool + mcp + 
builtin) + assert hasattr(type(llm), "bound_tools") + bound = type(llm).bound_tools + assert any(getattr(t, "name", None) == "Tool A" for t in bound) # NAT tool instance + assert builtin in bound # Built-in tool dict + assert mcp.model_dump() in bound # MCP tool dict + + # Parallel flag propagated + assert getattr(type(llm), "bound_parallel", False) is True + assert getattr(type(llm), "bound_strict", False) is True + + # Invoke the produced function and verify output path works end-to-end + result = await function_info.single_fn("please, mock tool call!") + assert isinstance(result, str) + assert result == "mock query" diff --git a/uv.lock b/uv.lock index a95412711..9818e5d77 100644 --- a/uv.lock +++ b/uv.lock @@ -3909,7 +3909,7 @@ wheels = [ [[package]] name = "llama-index" -version = "0.12.38" +version = "0.12.52" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-agent-openai" }, @@ -3925,37 +3925,37 @@ dependencies = [ { name = "llama-index-readers-llama-parse" }, { name = "nltk" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/81/27/4f25f8bb095941d84d8f59e30983142e342a9b369be06a885f32c6ff260e/llama_index-0.12.38.tar.gz", hash = "sha256:97a19b92aaae54f559d4252f609c74dce4463d1ee312e3bcbc6a7d70d98a1bf8", size = 8063, upload-time = "2025-05-29T04:24:22.832Z" } +sdist = { url = "https://files.pythonhosted.org/packages/a3/33/496f90bc77536c89b0b1266063977d99cf38e6a3458fd62a88c846f2c4f2/llama_index-0.12.52.tar.gz", hash = "sha256:3a81fa4fbf1a36e30502d2fb7da26d53bc1a1ab02db1db12e62f06bb014d5ad9", size = 8092, upload-time = "2025-07-23T18:11:59.26Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/a0/cb/b396f5dd6d4eb06b6af936b6fed41c3229872ffd3b4a9e182e3928bb38e2/llama_index-0.12.38-py3-none-any.whl", hash = "sha256:b8662c770298856f1c76712b0eb527724e11ee27c0d34120679d50844db9c392", size = 7083, upload-time = "2025-05-29T04:24:21.579Z" }, + { url = "https://files.pythonhosted.org/packages/e0/ca/b1bb3edca7140b8d9e8957c95c6c59f2596071e89d5a10b0814976de9450/llama_index-0.12.52-py3-none-any.whl", hash = "sha256:21e05e5a02b3601e18358eeed8748384eac8d35d384fdcbe16d03f0ffb09ea61", size = 7090, upload-time = "2025-07-23T18:11:57.548Z" }, ] [[package]] name = "llama-index-agent-openai" -version = "0.4.8" +version = "0.4.12" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, { name = "openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/2c/10/34454bd6563ff7fb63dec264a34e2749486194f9b4fb1ea8c2e4b9f8e2e9/llama_index_agent_openai-0.4.8.tar.gz", hash = "sha256:ba76f21e1b7f0f66e326dc419c2cc403cbb614ae28f7904540b1103695965f68", size = 12230, upload-time = "2025-05-20T15:43:17.219Z" } +sdist = { url = "https://files.pythonhosted.org/packages/0e/94/69decc46d11e954c6a8c64999cc237af5932d116eeb7a06515856641a6d4/llama_index_agent_openai-0.4.12.tar.gz", hash = "sha256:d2fe53feb69cfe45752edb7328bf0d25f6a9071b3c056787e661b93e5b748a28", size = 12443, upload-time = "2025-06-29T00:52:03.606Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ce/b8/1d7f50b6471fd73ff7309e6abd808c935a8d9d8547b192ce56ed3a05c142/llama_index_agent_openai-0.4.8-py3-none-any.whl", hash = "sha256:a03e8609ada0355b408d4173cd7663708f826f23328f9719fba00ea20b6851b6", size = 14212, upload-time = "2025-05-20T15:43:15.866Z" }, + { url = 
"https://files.pythonhosted.org/packages/89/f5/857ea1c136f422234e298e868af74094a71bf98687be40a365ad6551a660/llama_index_agent_openai-0.4.12-py3-none-any.whl", hash = "sha256:6dbb6276b2e5330032a726b28d5eef5140825f36d72d472b231f08ad3af99665", size = 14704, upload-time = "2025-06-29T00:52:02.528Z" }, ] [[package]] name = "llama-index-cli" -version = "0.4.1" +version = "0.4.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "llama-index-embeddings-openai" }, { name = "llama-index-llms-openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/4e/01/2155f7b830b84d09b98e6fd8094b333d39b0a0e4d2d28c9d2b0b6262757d/llama_index_cli-0.4.1.tar.gz", hash = "sha256:3f97f1f8f5f401dfb5b6bc7170717c176dcd981538017430073ef12ffdcbddfa", size = 25054, upload-time = "2025-02-27T21:13:56.189Z" } +sdist = { url = "https://files.pythonhosted.org/packages/d6/44/6acba0b8425d15682def89a4dbba68c782fd74ce6e74a4fa48beb08632f6/llama_index_cli-0.4.4.tar.gz", hash = "sha256:c3af0cf1e2a7e5ef44d0bae5aa8e8872b54c5dd6b731afbae9f13ffeb4997be0", size = 25308, upload-time = "2025-07-07T05:17:40.556Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ae/fa/2ee58764d733e9b5d61036ba6c8c96adcdb567ea16a62c247519fbf34c13/llama_index_cli-0.4.1-py3-none-any.whl", hash = "sha256:6dfc931aea5b90c256e476b48dfac76f48fb2308fdf656bb02ee1e4f2cab8b06", size = 28493, upload-time = "2025-02-27T21:13:53.183Z" }, + { url = "https://files.pythonhosted.org/packages/cc/21/89989b7fa8ce4b9bc6f0f7326ae7f959a887c1281f007a718eafd6ef614f/llama_index_cli-0.4.4-py3-none-any.whl", hash = "sha256:1070593cf79407054735ab7a23c5a65a26fc18d264661e42ef38fc549b4b7658", size = 28598, upload-time = "2025-07-07T05:17:39.522Z" }, ] [[package]] @@ -4077,7 +4077,7 @@ wheels = [ [[package]] name = "llama-index-llms-azure-openai" -version = "0.3.2" +version = "0.3.4" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "azure-identity" }, @@ -4085,9 +4085,9 @@ dependencies = [ { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/0c/cf/23c516c5a61c9b7a481c383862ebd99cc5e6a35f820dab871bb12b453b71/llama_index_llms_azure_openai-0.3.2.tar.gz", hash = "sha256:c6ae4e6d896abc784a1d60e02a537c91e019317de69d02256424eab80c988646", size = 6287, upload-time = "2025-03-06T19:31:06.818Z" } +sdist = { url = "https://files.pythonhosted.org/packages/53/a3/963da6260d74189fdf89a694883101bd7932a9ac22c433a0e475d528258f/llama_index_llms_azure_openai-0.3.4.tar.gz", hash = "sha256:ca6019c7a1721be19ccbdb993bf36fa0d24c0a0eaa3378601c5d6f8e47ebf1d9", size = 7057, upload-time = "2025-06-07T22:05:23.046Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ce/2f/efab9bd63f7f3dd9ec83c945c9516beffcff14ffd4a8187150b46e69124b/llama_index_llms_azure_openai-0.3.2-py3-none-any.whl", hash = "sha256:1a831035129042327f50d243a17918c481dfae39fd5a7ddaaaa0a712fb18ab8e", size = 7283, upload-time = "2025-03-06T19:31:05.92Z" }, + { url = "https://files.pythonhosted.org/packages/3e/40/ea622bf89b014d28c29200b12478cbf3d6331c231e6a62cc7c9424dd12ae/llama_index_llms_azure_openai-0.3.4-py3-none-any.whl", hash = "sha256:296a9dfd2d7ee2af9151e18f0df9801db376d4a7bfd5d260eda5ddf36adcbac2", size = 7258, upload-time = "2025-06-07T22:05:20.772Z" }, ] [[package]] @@ -4119,84 +4119,84 @@ wheels = [ [[package]] name = "llama-index-llms-nvidia" -version = "0.3.3" +version = "0.3.6" source = { registry = "https://pypi.org/simple" } dependencies = [ { name 
= "llama-index-core" }, { name = "llama-index-llms-openai" }, { name = "llama-index-llms-openai-like" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/d0/9d/f389096a8001116d9c2fed6be3ee3d662a19e5ce0933f8d25769a15c08f2/llama_index_llms_nvidia-0.3.3.tar.gz", hash = "sha256:a6b8d6651b592a894eac1bab0fbe6fc8c2e9419a5f881b093a7e3352da4e5f15", size = 10870, upload-time = "2025-03-06T19:23:12.591Z" } +sdist = { url = "https://files.pythonhosted.org/packages/af/89/50a9946adc326f6636471466450b182357acf541666b998e22223e940d42/llama_index_llms_nvidia-0.3.6.tar.gz", hash = "sha256:58991f2b6ae39d8c13e6626ddb3f0e28d8af4b8df63ad3592716f4800f7ef9df", size = 10481, upload-time = "2025-07-28T11:37:06.583Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/22/10/13213a9951d4eada2ef4c0a9c899ce559ebdc28e3bf9f9742f3e7d9a218d/llama_index_llms_nvidia-0.3.3-py3-none-any.whl", hash = "sha256:cd27e324b62e4103153d4765e27fd6e1ec63c659279f17a19a3418bc6a7693d6", size = 10277, upload-time = "2025-03-06T19:23:10.986Z" }, + { url = "https://files.pythonhosted.org/packages/d0/dc/f4998b5408de2ca631f9af086ac5a1fbd9106141d2423fdf1cc05b45904a/llama_index_llms_nvidia-0.3.6-py3-none-any.whl", hash = "sha256:bf734423e1c0dd1ef6fc6daa01caa5928c54a7b3f32f121c66f851557d9b3fe2", size = 10280, upload-time = "2025-07-28T11:37:05.534Z" }, ] [[package]] name = "llama-index-llms-openai" -version = "0.3.44" +version = "0.4.7" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/95/35/4528119c4a42e772bbd5b20c076a36419f671311adb4018e7592554df351/llama_index_llms_openai-0.3.44.tar.gz", hash = "sha256:049506a584188b6c565d871aeb6e11a6c522967d4b4f5f1913eb46fd7bb3d731", size = 23282, upload-time = "2025-05-23T03:14:06.229Z" } +sdist = { url = "https://files.pythonhosted.org/packages/d9/39/a7ce514fb500951e9edb713ed918a9ffe49f1a76fccfc531a4ec5c7fe15a/llama_index_llms_openai-0.4.7.tar.gz", hash = "sha256:564af8ab39fb3f3adfeae73a59c0dca46c099ab844a28e725eee0c551d4869f8", size = 24251, upload-time = "2025-06-16T03:38:47.175Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/cc/da/c555b8b90014bdf771d6afd856e30c03fb58791c9df748bfbab2a933a3bf/llama_index_llms_openai-0.3.44-py3-none-any.whl", hash = "sha256:88358536b52d0a779fc20e3ade961e8900c048ff9877103816393f5f13670a37", size = 24477, upload-time = "2025-05-23T03:14:04.942Z" }, + { url = "https://files.pythonhosted.org/packages/61/e9/391926dad180ced6bb37a62edddb8483fbecde411239bd5e726841bb77b4/llama_index_llms_openai-0.4.7-py3-none-any.whl", hash = "sha256:3b8d9d3c1bcadc2cff09724de70f074f43eafd5b7048a91247c9a41b7cd6216d", size = 25365, upload-time = "2025-06-16T03:38:45.72Z" }, ] [[package]] name = "llama-index-llms-openai-like" -version = "0.3.5" +version = "0.4.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, { name = "transformers" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/b0/89/651bb5ec35c594212abfaa827fb92d817632ec05440219597942ac754020/llama_index_llms_openai_like-0.3.5.tar.gz", hash = "sha256:d5321dff66d81d8b9008c76f44e95c499153643532acc859a29e7146272facd9", size = 4909, upload-time = "2025-05-19T22:03:59.689Z" } +sdist = { url = "https://files.pythonhosted.org/packages/df/df/807ac6bb9470295769f950562f5f7252cb491166693ee877ff77d9022fbc/llama_index_llms_openai_like-0.4.0.tar.gz", hash = 
"sha256:15ae1c16b01ba0bfa822d53900f03e35c19ffe47b528958234bf1942a91f587c", size = 4898, upload-time = "2025-05-30T17:47:11.689Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/ee/ff/a6d314871e0c90194073bf06ad8a9b837392491231e2af67a6a795a2ac08/llama_index_llms_openai_like-0.3.5-py3-none-any.whl", hash = "sha256:b977c7e345a648574918c5f699450f4ec652e44a71a4b27a10810f9036eb154b", size = 4598, upload-time = "2025-05-19T22:03:58.838Z" }, + { url = "https://files.pythonhosted.org/packages/b8/41/e080871437ec507377126165318f2da6713a4d6dc2767f2444a8bd818791/llama_index_llms_openai_like-0.4.0-py3-none-any.whl", hash = "sha256:52a3cb5ce78049fde5c9926898b90e02bc04e3d23adbc991842e9ff574df9ea1", size = 4593, upload-time = "2025-05-30T17:47:10.456Z" }, ] [[package]] name = "llama-index-multi-modal-llms-openai" -version = "0.4.3" +version = "0.5.3" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/1a/9a/e3ab972880fc08d39475a0c7969b1a16ece58fe7f41ab8645f8342d57634/llama_index_multi_modal_llms_openai-0.4.3.tar.gz", hash = "sha256:5e6ca54069d3d18c2f5f7ca34f3720fba1d1b9126482ad38feb0c858f4feb63b", size = 5094, upload-time = "2025-01-31T20:28:04.607Z" } +sdist = { url = "https://files.pythonhosted.org/packages/6e/5d/8a7ff14f5ac6844722152ba35ee4e1298b4665a3cf70eaeae6d6df938e37/llama_index_multi_modal_llms_openai-0.5.3.tar.gz", hash = "sha256:b755a8b47d8d2f34b5a3d249af81d9bfb69d3d2cf9ab539d3a42f7bfa3e2391a", size = 3760, upload-time = "2025-07-07T16:22:44.922Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/75/90/7a5a44959192b739718618d6fbfb5be8d21909dbd81865b9d4bb45a8bc89/llama_index_multi_modal_llms_openai-0.4.3-py3-none-any.whl", hash = "sha256:1ceb42716472ac8bd5130afa29b793869d367946aedd02e48a3b03184e443ad1", size = 5870, upload-time = "2025-01-31T20:28:03.048Z" }, + { url = "https://files.pythonhosted.org/packages/18/e5/bc4ec1f373cd2195e625af483eddc5b05a55f6c6db020746f6bb2c0fadde/llama_index_multi_modal_llms_openai-0.5.3-py3-none-any.whl", hash = "sha256:be6237df8f9caaa257f9beda5317287bbd2ec19473d777a30a34e41a7c5bddf8", size = 3434, upload-time = "2025-07-07T16:22:43.898Z" }, ] [[package]] name = "llama-index-program-openai" -version = "0.3.1" +version = "0.3.2" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-agent-openai" }, { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/7a/b8/24f1103106bfeed04f0e33b587863345c2d7fad001828bb02844a5427fbc/llama_index_program_openai-0.3.1.tar.gz", hash = "sha256:6039a6cdbff62c6388c07e82a157fe2edd3bbef0c5adf292ad8546bf4ec75b82", size = 4818, upload-time = "2024-11-25T18:39:39.812Z" } +sdist = { url = "https://files.pythonhosted.org/packages/83/81/9caa34e80adce1adb715ae083a54ad45c8fc0d9aef0f2d80d61c1b805ab6/llama_index_program_openai-0.3.2.tar.gz", hash = "sha256:04c959a2e616489894bd2eeebb99500d6f1c17d588c3da0ddc75ebd3eb7451ee", size = 6301, upload-time = "2025-05-30T23:00:27.872Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/00/59/3f31171c30a08c8ba21155d5241ba174630e57cf43b03d97fd77bf565b51/llama_index_program_openai-0.3.1-py3-none-any.whl", hash = "sha256:93646937395dc5318fd095153d2f91bd632b25215d013d14a87c088887d205f9", size = 5318, upload-time = "2024-11-25T18:39:38.396Z" }, + { url = 
"https://files.pythonhosted.org/packages/05/80/d6ac8afafdd38115d61214891c36876e64f429809abff873660fe30862fe/llama_index_program_openai-0.3.2-py3-none-any.whl", hash = "sha256:451829ae53e074e7b47dcc60a9dd155fcf9d1dcbc1754074bdadd6aab4ceb9aa", size = 6129, upload-time = "2025-05-30T23:00:26.64Z" }, ] [[package]] name = "llama-index-question-gen-openai" -version = "0.3.0" +version = "0.3.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "llama-index-core" }, { name = "llama-index-llms-openai" }, { name = "llama-index-program-openai" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/4e/47/c57392e2fb00c0f596f912e7977e3c639ac3314f2aed5d4ac733baa367f1/llama_index_question_gen_openai-0.3.0.tar.gz", hash = "sha256:efd3b468232808e9d3474670aaeab00e41b90f75f52d0c9bfbf11207e0963d62", size = 2608, upload-time = "2024-11-18T02:18:52.449Z" } +sdist = { url = "https://files.pythonhosted.org/packages/52/6e/19c5051c81ef5fca597d13c6d41b863535521565b1414ab5ab0e5e8c1297/llama_index_question_gen_openai-0.3.1.tar.gz", hash = "sha256:5e9311b433cc2581ff8a531fa19fb3aa21815baff75aaacdef11760ac9522aa9", size = 4107, upload-time = "2025-05-30T23:00:31.016Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/7c/2c/765b0dfc2c988bbea267e236c836d7a96c60a20df76d842e43e17401f800/llama_index_question_gen_openai-0.3.0-py3-none-any.whl", hash = "sha256:9b60ec114273a63b50349948666e5744a8f58acb645824e07c979041e8fec598", size = 2899, upload-time = "2024-11-18T02:18:50.945Z" }, + { url = "https://files.pythonhosted.org/packages/15/2a/652593d0bd24f901776db0d1778a42363ea2656530da18215f413ce4f981/llama_index_question_gen_openai-0.3.1-py3-none-any.whl", hash = "sha256:1ce266f6c8373fc8d884ff83a44dfbacecde2301785db7144872db51b8b99429", size = 3733, upload-time = "2025-05-30T23:00:29.965Z" }, ] [[package]] @@ -6099,16 +6099,16 @@ dependencies = [ [package.metadata] requires-dist = [ - { name = "llama-index", specifier = "~=0.12.21" }, - { name = "llama-index-core", specifier = "~=0.12.21" }, + { name = "llama-index", specifier = "~=0.12.40" }, + { name = "llama-index-core", specifier = "~=0.12.40" }, { name = "llama-index-embeddings-azure-openai", specifier = "~=0.3.9" }, { name = "llama-index-embeddings-nvidia", specifier = "~=0.3.1" }, { name = "llama-index-embeddings-openai", specifier = "~=0.3.1" }, { name = "llama-index-llms-azure-openai", specifier = "~=0.3.2" }, { name = "llama-index-llms-bedrock", specifier = "~=0.3.8" }, { name = "llama-index-llms-litellm", specifier = "~=0.5.1" }, - { name = "llama-index-llms-nvidia", specifier = "~=0.3.1" }, - { name = "llama-index-llms-openai", specifier = "~=0.3.42" }, + { name = "llama-index-llms-nvidia", specifier = "~=0.3.4" }, + { name = "llama-index-llms-openai", specifier = ">=0.4.2,<1.0.0" }, { name = "llama-index-readers-file", specifier = "~=0.4.4" }, { name = "nvidia-nat", editable = "." }, ]