1. Introduction
This RFC proposes implementing AG-UI (Agent-User Interaction Protocol) support in OpenSearch ml-commons. AG-UI is an open, lightweight, event-based protocol that standardizes how AI agents connect to user-facing applications. By implementing AG-UI compatibility, OpenSearch ml-commons will enable seamless real-time communication between conversational AI frontends and OpenSearch's agent framework, bringing agents directly into frontend applications with standardized streaming interactions and sophisticated tool execution capabilities.
About AG-UI Protocol:
AG-UI is positioned as a complementary protocol in the agentic ecosystem:
- MCP (Model Context Protocol): Gives agents backend tools
- A2A (Agent-to-Agent Protocol): Allows agents to communicate with other agents
- AG-UI: Brings agents into user-facing applications and lets them use frontend interaction tools
Our implementation enables OpenSearch ml-commons agents to work seamlessly with any AG-UI compatible frontend application.
2. Motivation
Key Use Case: Website Chatbot Experiences
A primary use case for AI agents is providing interactive chatbot experiences on websites and web applications. With AG-UI protocol support, OpenSearch ml-commons agents will be able to power these chatbot interactions directly, enabling developers to build sophisticated conversational interfaces that leverage OpenSearch's search capabilities, knowledge retrieval, and ML inference while maintaining real-time, streaming communication patterns expected in modern web applications.
Current OpenSearch ml-commons agent interactions are primarily designed for programmatic API access, lacking compatibility with the emerging AG-UI protocol standard for agent-user interactions. This creates several challenges:
Protocol Compatibility Gaps:
- No standardized event system following AG-UI specifications
- Tool execution restricted to backend-only operations, missing AG-UI's frontend tool capabilities
- Frontend developers cannot use existing AG-UI compatible applications with OpenSearch
Developer Experience Barriers:
- High barrier to entry for building AG-UI compatible applications with OpenSearch
- Manual conversion required between OpenSearch agent formats and AG-UI protocol requirements
- Complex integration work required for each frontend application
Implementing AG-UI protocol support addresses these limitations by providing:
- Protocol Compliance: Full compatibility with AG-UI's event-based protocol specification
- Hybrid Tool Architecture: Support for both backend tools and AG-UI frontend tool delegation
- Ecosystem Integration: Seamless compatibility with existing AG-UI frameworks and applications
- Standardized Formats: AG-UI compatible input/output formats with established patterns
3. Proposed Solution
We propose implementing AG-UI protocol compatibility in OpenSearch ml-commons through three core enhancements that align with AG-UI specifications:
3.1. AG-UI Protocol Input Support
Automatic detection and conversion of AG-UI protocol compliant requests to ml-commons agent format:
// AG-UI Input Format
{
  "threadId": "thread_abc123",
  "runId": "run_def456",
  "messages": [
    {"role": "user", "content": "Search for documents"},
    {"role": "assistant", "toolCalls": [...]},
    {"role": "tool", "content": "...", "toolCallId": "call_123"}
  ],
  "tools": [...],
  "context": [...]
}
3.2. Frontend Integration via AG-UI Client SDK
Server-Sent Events (SSE) implementation following AG-UI's event types, accessible through the AG-UI client SDK:
// Frontend Integration Example using HttpAgent
import { HttpAgent } from '@ag-ui/client';

const agent = new HttpAgent({
  url: 'https://your-opensearch-cluster/_plugins/_ml/agents/agent_123/_execute/stream'
});
3.3. AG-UI Tool Integration Model
Implementation of AG-UI's hybrid tool execution model, integrating with the ReAct (Reasoning and Acting) loop:
graph TD
A[Initial AG-UI Request] --> B[ReAct Loop 1]
B --> C{Tool Selection}
C -->|Backend Tool| D[Execute in OpenSearch]
C -->|Frontend Tool| E[End ReAct Loop 1]
D --> F[Tool Results]
F --> G[Continue ReAct Loop 1]
G --> H[Next Iteration]
H --> I[Final Answer]
I --> J[Stream Events]
E --> K[Return Tool Call]
K --> L[Browser Executes Tool]
L --> M[New Request with Results]
M --> N[ReAct Loop 2]
N --> O[Continue Reasoning]
O --> P[Final Answer]
P --> Q[Stream Events]
4. Technical Design
4.1. Core Components Architecture
Input Processing Layer
├── AGUIInputConverter - Format detection and conversion
├── AGUIConstants - Centralized constants and field definitions
└── Validation - Input structure and content validation
Agent Execution Layer
├── MLAGUIAgentRunner - AGUI-specific agent processing
├── MLAgentExecutor - Routing and agent selection
└── Context Processing - Chat history and context extraction
Streaming Event System
├── BaseEvent - Abstract event foundation
├── Event Types - 13+ specific event implementations
├── AGUIStreamingEventManager - Lifecycle and state management
└── REST Integration - SSE streaming endpoint handling
Tool Integration System
├── AGUIFrontendTool - Frontend tool representation
├── Tool Coordination - Backend/frontend tool routing
├── Function Calling - Multi-LLM interface support
└── Result Aggregation - Tool result consolidation
4.2. AG-UI Input Processing Implementation
AGUIInputConverter detects AG-UI format requests and converts them to ml-commons AgentMLInput format:
- Format Detection: Validates required fields (`threadId`, `runId`, `messages`, `tools`)
- Parameter Mapping: Maps AG-UI fields to internal parameters (`threadId` → `agui_thread_id`, etc.)
- Tool Result Extraction: Processes user messages with `toolCallId` as frontend tool results
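As an illustrative sketch only (not the actual AGUIInputConverter code, and the class name `AGUISketch` is hypothetical), detection and parameter mapping could look like this, assuming the request body has already been parsed into a Map:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of AG-UI request detection and field mapping; the real
// AGUIInputConverter in ml-commons operates on the raw request JSON.
public class AGUISketch {
    private static final Set<String> REQUIRED = Set.of("threadId", "runId", "messages", "tools");

    // A request is treated as AG-UI input only when every required field is present.
    public static boolean isAGUIInput(Map<String, Object> request) {
        return REQUIRED.stream().allMatch(request::containsKey);
    }

    // Maps AG-UI identifiers to the internal agui_* parameter names described above.
    public static Map<String, String> mapParameters(Map<String, Object> request) {
        return Map.of(
            "agui_thread_id", String.valueOf(request.get("threadId")),
            "agui_run_id", String.valueOf(request.get("runId")));
    }
}
```

A request missing any of the four fields falls through to the existing ml-commons input path, so AG-UI support stays opt-in by format.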
4.3. Event System Implementation
AG-UI Events follow the protocol's standard event types:
- Run Events: `RUN_STARTED`, `RUN_FINISHED`, `RUN_ERROR`
- Text Events: `TEXT_MESSAGE_START`, `TEXT_MESSAGE_CONTENT`, `TEXT_MESSAGE_END`
- Tool Events: `TOOL_CALL_START`, `TOOL_CALL_ARGS`, `TOOL_CALL_END`, `TOOL_CALL_RESULT`
- State Events: `MESSAGES_SNAPSHOT`
AGUIStreamingEventManager handles thread-safe state management with automatic cleanup after conversation completion.
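To make the event flow concrete, here is a minimal, hypothetical sketch of how a streamed text answer maps onto SSE data frames. The class and method names are illustrative, not part of ml-commons, and the payloads are trimmed to the event type plus minimal identifiers:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of AG-UI event framing over Server-Sent Events.
public class AGUIEventFramingSketch {
    // One AG-UI event becomes one SSE data frame terminated by a blank line.
    static String frame(String type, String extraJson) {
        String payload = extraJson.isEmpty() ? "" : "," + extraJson;
        return "data: {\"type\":\"" + type + "\"" + payload + "}\n\n";
    }

    // Happy-path run: RUN_STARTED, one streamed text message, then RUN_FINISHED.
    public static List<String> frameTextRun(String messageId, List<String> deltas) {
        List<String> frames = new ArrayList<>();
        frames.add(frame("RUN_STARTED", ""));
        frames.add(frame("TEXT_MESSAGE_START", "\"messageId\":\"" + messageId + "\""));
        for (String delta : deltas) {
            frames.add(frame("TEXT_MESSAGE_CONTENT",
                "\"messageId\":\"" + messageId + "\",\"delta\":\"" + delta + "\""));
        }
        frames.add(frame("TEXT_MESSAGE_END", "\"messageId\":\"" + messageId + "\""));
        frames.add(frame("RUN_FINISHED", ""));
        return frames;
    }
}
```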
4.4. Agent Execution Flow
MLAgentExecutor Integration (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java)
// Agent type routing based on input format
if (AGUIInputConverter.isAGUIInput(inputJson)) {
    return new MLAGUIAgentRunner(client, settings, clusterService, ...);
} else {
    return new MLChatAgentRunner(client, settings, clusterService, ...);
}

MLAGUIAgentRunner Processing (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAGUIAgentRunner.java)
@Override
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. Process AG-UI messages into chat history
    processAGUIMessages(mlAgent, params, llmInterface);
    // 2. Process AG-UI context into ml-commons format
    processAGUIContext(mlAgent, params);
    // 3. Delegate to MLChatAgentRunner for actual execution
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
}

Message Processing Logic:
- Tool Call Detection: Identifies assistant messages with `toolCalls` field
- Tool Result Processing: Handles `tool` role messages with `toolCallId`
- Chat History Generation: Filters out intermediate tool messages, creates clean history
- Recent Tool Results: Only processes most recent tool execution cycle
- LLM Format Conversion: Uses `FunctionCalling.formatAGUIToolCalls()` for provider-specific formatting
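The chat-history filtering step above can be sketched as follows. This is a simplified illustration operating on parsed message maps, not the actual MLAGUIAgentRunner code, and `AGUIChatHistorySketch` is a hypothetical name:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified sketch of chat history generation: intermediate tool-execution
// messages are dropped so only conversational user/assistant turns remain.
public class AGUIChatHistorySketch {
    public static List<Map<String, Object>> cleanHistory(List<Map<String, Object>> messages) {
        return messages.stream()
            // Drop tool-result messages (role "tool").
            .filter(m -> !"tool".equals(m.get("role")))
            // Drop assistant messages that carry only toolCalls and no text content.
            .filter(m -> !("assistant".equals(m.get("role"))
                && m.containsKey("toolCalls")
                && !m.containsKey("content")))
            .collect(Collectors.toList());
    }
}
```

Filtering keeps the history the LLM sees focused on the dialogue itself; tool calls and results from earlier cycles are reintroduced separately in provider-specific format when needed.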
4.5. Frontend Tool Execution and ReAct Loop Integration
The AG-UI implementation integrates frontend tool execution with the existing ReAct (Reasoning and Acting) loop in MLChatAgentRunner through a sophisticated pause-resume mechanism:
1. ReAct Loop Tool Selection Phase:
// Inside runReAct method - when LLM selects a tool to execute
if (tools.containsKey(action)) {
    // Determine if tool is backend or frontend
    boolean isBackendTool = backendTools != null && backendTools.containsKey(action);
    boolean isFrontendTool = !isBackendTool;
    if (isFrontendTool) {
        // PAUSE REACT LOOP: Create tool delegation response for frontend
        ModelTensorOutput frontendToolResponse = createFrontendToolCallResponse(toolCallId, action, actionInput);
        listener.onResponse(frontendToolResponse); // Return control to frontend
        return; // Exit ReAct loop - frontend will execute tool and send results back
    } else {
        // CONTINUE REACT LOOP: Execute backend tool normally
        runTool(tools, toolSpecMap, tmpParameters, nextStepListener, action, actionInput, toolParams, interactions, toolCallId, functionCalling);
    }
}

2. Tool Result Processing and ReAct Resumption:
// processAGUIToolResults() - when frontend tool results return
private void processAGUIToolResults(..., String aguiToolCallResults) {
    // 1. Parse frontend tool execution results
    List<Map<String, String>> toolResults = gson.fromJson(aguiToolCallResults, listType);
    // 2. Convert to LLM message format using FunctionCalling
    List<LLMMessage> llmMessages = functionCalling.supply(formattedResults);
    // 3. Reconstruct conversation context
    List<String> interactions = new ArrayList<>();
    // Add original assistant message with tool_calls
    interactions.addAll(assistantMessages);
    // Add tool result messages
    for (LLMMessage llmMessage : llmMessages) {
        interactions.add(llmMessage.getResponse());
    }
    // 4. RESUME REACT LOOP: Continue with tool results integrated
    processUnifiedTools(mlAgent, updatedParams, listener, memory, sessionId, functionCalling, frontendTools);
}

Tool Visibility Strategy:
// Unified tool approach - both frontend and backend tools visible to LLM
Map<String, Tool> unifiedToolsMap = new HashMap<>(backendToolsMap);
unifiedToolsMap.putAll(wrapFrontendToolsAsToolObjects(frontendTools));
// LLM sees all tools but execution is differentiated at runtime
runReAct(llm, unifiedToolsMap, toolSpecMap, params, memory, sessionId, tenantId, listener, functionCalling, backendToolsMap);

This design enables true hybrid tool execution where the LLM can seamlessly reason about and coordinate both backend OpenSearch operations and frontend user interactions within the same conversational flow.
4.6. AG-UI Agent Type: Internal Architecture and Comparison
How AG-UI Agent Type Works:
The AG-UI agent type (MLAGUIAgentRunner) operates as a specialized preprocessing layer that transforms AG-UI protocol requests into ml-commons compatible format while preserving AG-UI semantics. Here's how it works internally:
// AG-UI Agent Execution Flow
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. AG-UI Protocol Processing
    processAGUIMessages(mlAgent, params, llmInterface); // Convert AG-UI messages to chat history
    processAGUIContext(mlAgent, params); // Extract and format contextual information
    // 2. Delegate to Standard Conversational Runner
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
    // 3. Streaming events are generated in RestMLExecuteStreamAction
}

Key Processing Steps:
- Message Array Processing: Converts AG-UI message arrays (with roles: user, assistant, tool) into ml-commons chat history format
- Tool Call Extraction: Identifies assistant messages with `toolCalls` and processes them for LLM-specific formatting
- Tool Result Integration: Handles `tool` role messages containing frontend tool execution results
- Context Transformation: Converts AG-UI context arrays into ml-commons context parameters
- Chat History Generation: Creates appropriate chat history while filtering out intermediate tool execution messages
Comparison with Existing Conversational Agent:
| Aspect | Conversational Agent (MLChatAgentRunner) | AG-UI Agent (MLAGUIAgentRunner) |
|---|---|---|
| Input Format | ml-commons native format with `question` parameter | AG-UI protocol format with message arrays, `threadId`, `runId` |
| Message Handling | Single question + optional chat history parameter | Message array processing with role-based conversation flow |
| Tool Execution | Backend tools only | Hybrid: backend tools + frontend tool delegation |
| Tool Result Processing | Direct tool execution results | Frontend tool results via AG-UI message format |
| Streaming Output | ml-commons response format | AG-UI event stream (`RUN_STARTED`, `TEXT_MESSAGE_CONTENT`, etc.) |
| LLM Integration | Direct LLM interface calls | AG-UI tool call formatting + standard LLM integration |