[RFC] AG-UI Support in Agent Framework #4409

@jiapingzeng


1. Introduction

This RFC proposes implementing AG-UI (Agent-User Interaction Protocol) support in OpenSearch ml-commons. AG-UI is an open, lightweight, event-based protocol that standardizes how AI agents connect to user-facing applications. Implementing AG-UI compatibility enables real-time communication between conversational AI frontends and OpenSearch's agent framework, bringing agents directly into frontend applications with standardized streaming interactions and hybrid backend/frontend tool execution.

About AG-UI Protocol:
AG-UI is positioned as a complementary protocol in the agentic ecosystem:

  • MCP (Model Context Protocol): Gives agents backend tools
  • A2A (Agent-to-Agent Protocol): Allows agents to communicate with other agents
  • AG-UI: Brings agents into user-facing applications and lets them use frontend interaction tools

Our implementation enables OpenSearch ml-commons agents to work seamlessly with any AG-UI compatible frontend application.

2. Motivation

Key Use Case: Website Chatbot Experiences
A primary use case for AI agents is providing interactive chatbot experiences on websites and web applications. With AG-UI protocol support, OpenSearch ml-commons agents will be able to power these chatbot interactions directly, enabling developers to build sophisticated conversational interfaces that leverage OpenSearch's search capabilities, knowledge retrieval, and ML inference while maintaining real-time, streaming communication patterns expected in modern web applications.

Current OpenSearch ml-commons agent interactions are primarily designed for programmatic API access, lacking compatibility with the emerging AG-UI protocol standard for agent-user interactions. This creates several challenges:

Protocol Compatibility Gaps:

  • No standardized event system following AG-UI specifications
  • Tool execution restricted to backend-only operations, missing AG-UI's frontend tool capabilities
  • Frontend developers cannot use existing AG-UI compatible applications with OpenSearch

Developer Experience Barriers:

  • High barrier to entry for building AG-UI compatible applications with OpenSearch
  • Manual conversion required between OpenSearch agent formats and AG-UI protocol requirements
  • Complex integration work required for each frontend application

Implementing AG-UI protocol support addresses these limitations by providing:

  • Protocol Compliance: Full compatibility with AG-UI's event-based protocol specification
  • Hybrid Tool Architecture: Support for both backend tools and AG-UI frontend tool delegation
  • Ecosystem Integration: Seamless compatibility with existing AG-UI frameworks and applications
  • Standardized Formats: AG-UI compatible input/output formats with established patterns

3. Proposed Solution

We propose implementing AG-UI protocol compatibility in OpenSearch ml-commons through three core enhancements that align with AG-UI specifications:

3.1. AG-UI Protocol Input Support

Automatic detection of AG-UI protocol-compliant requests and conversion to the ml-commons agent format:

// AG-UI Input Format
{
  "threadId": "thread_abc123",
  "runId": "run_def456",
  "messages": [
    {"role": "user", "content": "Search for documents"},
    {"role": "assistant", "toolCalls": [...]},
    {"role": "tool", "content": "...", "toolCallId": "call_123"}
  ],
  "tools": [...],
  "context": [...]
}
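
As a minimal sketch of the detection step (the class and method names here are hypothetical, not the actual AGUIInputConverter API), a request can be treated as AG-UI if it carries the protocol's required top-level fields:

```java
import java.util.Set;

// Hypothetical sketch: treat a request as AG-UI when the protocol's
// required top-level fields are all present. A real converter would parse
// the JSON properly; substring checks are used here only for brevity.
public class AguiDetectSketch {
    private static final Set<String> REQUIRED_FIELDS = Set.of("threadId", "runId", "messages");

    public static boolean looksLikeAguiRequest(String json) {
        for (String field : REQUIRED_FIELDS) {
            if (!json.contains("\"" + field + "\"")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String agui = "{\"threadId\":\"thread_abc123\",\"runId\":\"run_def456\",\"messages\":[]}";
        String plain = "{\"question\":\"Search for documents\"}";
        System.out.println(looksLikeAguiRequest(agui));  // true
        System.out.println(looksLikeAguiRequest(plain)); // false
    }
}
```

A native ml-commons request (e.g. one carrying only a question parameter) fails the check and falls through to the existing execution path.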

3.2. Frontend Integration via AG-UI Client SDK

Server-Sent Events (SSE) implementation following AG-UI's event types, accessible through the AG-UI client SDK:

// Frontend Integration Example using HttpAgent
import { HttpAgent } from '@ag-ui/client';

const agent = new HttpAgent({
  url: 'https://your-opensearch-cluster/_plugins/_ml/agents/agent_123/_execute/stream'
});

3.3. AG-UI Tool Integration Model

Implementation of AG-UI's hybrid tool execution model, integrated with the ReAct (Reasoning and Acting) loop:

graph TD
    A[Initial AG-UI Request] --> B[ReAct Loop 1]
    B --> C{Tool Selection}
    C -->|Backend Tool| D[Execute in OpenSearch]
    C -->|Frontend Tool| E[End ReAct Loop 1]
    
    D --> F[Tool Results]
    F --> G[Continue ReAct Loop 1]
    G --> H[Next Iteration]
    H --> I[Final Answer]
    I --> J[Stream Events]
    
    E --> K[Return Tool Call]
    K --> L[Browser Executes Tool]
    L --> M[New Request with Results]
    M --> N[ReAct Loop 2]
    N --> O[Continue Reasoning]
    O --> P[Final Answer]
    P --> Q[Stream Events]

4. Technical Design

4.1. Core Components Architecture

Input Processing Layer
├── AGUIInputConverter - Format detection and conversion
├── AGUIConstants - Centralized constants and field definitions
└── Validation - Input structure and content validation

Agent Execution Layer  
├── MLAGUIAgentRunner - AGUI-specific agent processing
├── MLAgentExecutor - Routing and agent selection
└── Context Processing - Chat history and context extraction

Streaming Event System
├── BaseEvent - Abstract event foundation
├── Event Types - 13+ specific event implementations  
├── AGUIStreamingEventManager - Lifecycle and state management
└── REST Integration - SSE streaming endpoint handling

Tool Integration System
├── AGUIFrontendTool - Frontend tool representation
├── Tool Coordination - Backend/frontend tool routing
├── Function Calling - Multi-LLM interface support
└── Result Aggregation - Tool result consolidation

4.2. AG-UI Input Processing Implementation

AGUIInputConverter detects AG-UI format requests and converts them to ml-commons AgentMLInput format:

  • Format Detection: Validates required fields (threadId, runId, messages, tools)
  • Parameter Mapping: Maps AG-UI fields to internal parameters (threadId → agui_thread_id, etc.)
  • Tool Result Extraction: Processes user messages with toolCallId as frontend tool results
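
A hedged sketch of the parameter-mapping step (agui_thread_id appears in this RFC; agui_run_id is an assumed analogue, and the class name is hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the AG-UI -> ml-commons parameter mapping.
// "agui_thread_id" comes from the RFC text; "agui_run_id" is an assumed
// analogue. The real AGUIInputConverter operates on parsed JSON, not a
// flat string map.
public class AguiParamMappingSketch {
    public static Map<String, String> toInternalParams(Map<String, String> aguiFields) {
        Map<String, String> params = new HashMap<>();
        // Prefix protocol identifiers so they cannot collide with existing agent parameters.
        params.put("agui_thread_id", aguiFields.get("threadId"));
        params.put("agui_run_id", aguiFields.get("runId"));
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> in = Map.of("threadId", "thread_abc123", "runId", "run_def456");
        System.out.println(toInternalParams(in).get("agui_thread_id")); // thread_abc123
    }
}
```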

4.3. Event System Implementation

AG-UI Events follow the protocol's standard event types:

  • Run Events: RUN_STARTED, RUN_FINISHED, RUN_ERROR
  • Text Events: TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END
  • Tool Events: TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END, TOOL_CALL_RESULT
  • State Events: MESSAGES_SNAPSHOT

AGUIStreamingEventManager handles thread-safe state management with automatic cleanup after conversation completion.
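
To illustrate the wire format (a sketch only: the class is hypothetical and the payload shape is illustrative, while the exact event schema is defined by the AG-UI specification), each event is serialized as JSON and framed as an SSE data line:

```java
// Hypothetical sketch of framing AG-UI events as Server-Sent Events.
// Event type names follow the list above; the JSON payload shape is
// illustrative, not the exact AG-UI wire schema.
public class AguiSseSketch {
    enum EventType { RUN_STARTED, TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT, TEXT_MESSAGE_END, RUN_FINISHED }

    // Wrap a JSON object payload into an SSE frame: "data: <json>\n\n".
    // The event type is embedded in the JSON body, as in the AG-UI examples.
    static String toSseFrame(EventType type, String jsonPayload) {
        String extra = jsonPayload.equals("{}")
            ? ""
            : "," + jsonPayload.substring(1, jsonPayload.length() - 1);
        return "data: {\"type\":\"" + type + "\"" + extra + "}\n\n";
    }

    public static void main(String[] args) {
        System.out.print(toSseFrame(EventType.TEXT_MESSAGE_CONTENT, "{\"delta\":\"Hello\"}"));
        // data: {"type":"TEXT_MESSAGE_CONTENT","delta":"Hello"}
    }
}
```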

4.4. Agent Execution Flow

MLAgentExecutor Integration (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java)

// Agent type routing based on input format
if (AGUIInputConverter.isAGUIInput(inputJson)) {
    return new MLAGUIAgentRunner(client, settings, clusterService, ...);
} else {
    return new MLChatAgentRunner(client, settings, clusterService, ...);
}

MLAGUIAgentRunner Processing (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAGUIAgentRunner.java)

@Override
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. Process AG-UI messages into chat history
    processAGUIMessages(mlAgent, params, llmInterface);
    
    // 2. Process AG-UI context into ml-commons format  
    processAGUIContext(mlAgent, params);
    
    // 3. Delegate to MLChatAgentRunner for actual execution
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
}

Message Processing Logic:

  • Tool Call Detection: Identifies assistant messages with toolCalls field
  • Tool Result Processing: Handles tool role messages with toolCallId
  • Chat History Generation: Filters out intermediate tool messages, creates clean history
  • Recent Tool Results: Only processes most recent tool execution cycle
  • LLM Format Conversion: Uses FunctionCalling.formatAGUIToolCalls() for provider-specific formatting
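
The chat-history filtering step above can be sketched as follows (message shape simplified to flat role/content maps, and the class name is hypothetical; the real runner consumes parsed AG-UI messages):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of chat-history generation: intermediate tool-call
// and tool-result messages are dropped so the reconstructed history keeps
// only user turns and final assistant answers.
public class AguiHistorySketch {
    static List<Map<String, String>> toChatHistory(List<Map<String, String>> messages) {
        List<Map<String, String>> history = new ArrayList<>();
        for (Map<String, String> msg : messages) {
            String role = msg.get("role");
            boolean isToolResult = "tool".equals(role);
            boolean isToolCall = "assistant".equals(role) && msg.containsKey("toolCalls");
            if (!isToolResult && !isToolCall) {
                history.add(msg);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        List<Map<String, String>> messages = List.of(
            Map.of("role", "user", "content", "Search for documents"),
            Map.of("role", "assistant", "toolCalls", "[...]"),
            Map.of("role", "tool", "content", "3 hits", "toolCallId", "call_123"),
            Map.of("role", "assistant", "content", "I found 3 documents.")
        );
        System.out.println(toChatHistory(messages).size()); // prints 2
    }
}
```

The two intermediate tool-execution messages are filtered out, leaving a clean user/assistant exchange for the LLM's context window.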

4.5. Frontend Tool Execution and ReAct Loop Integration

The AG-UI implementation integrates frontend tool execution with the existing ReAct (Reasoning and Acting) loop in MLChatAgentRunner through a sophisticated pause-resume mechanism:

1. ReAct Loop Tool Selection Phase:

// Inside runReAct method - when LLM selects a tool to execute
if (tools.containsKey(action)) {
    // Determine if tool is backend or frontend
    boolean isBackendTool = backendTools != null && backendTools.containsKey(action);
    boolean isFrontendTool = !isBackendTool;
    
    if (isFrontendTool) {
        // PAUSE REACT LOOP: Create tool delegation response for frontend
        ModelTensorOutput frontendToolResponse = createFrontendToolCallResponse(toolCallId, action, actionInput);
        listener.onResponse(frontendToolResponse); // Return control to frontend
        return; // Exit ReAct loop - frontend will execute tool and send results back
    } else {
        // CONTINUE REACT LOOP: Execute backend tool normally
        runTool(tools, toolSpecMap, tmpParameters, nextStepListener, action, actionInput, toolParams, interactions, toolCallId, functionCalling);
    }
}

2. Tool Result Processing and ReAct Resumption:

// processAGUIToolResults() - when frontend tool results return
private void processAGUIToolResults(..., String aguiToolCallResults) {
    // 1. Parse frontend tool execution results
    List<Map<String, String>> toolResults = gson.fromJson(aguiToolCallResults, listType);
    
    // 2. Convert to LLM message format using FunctionCalling
    List<LLMMessage> llmMessages = functionCalling.supply(formattedResults);
    
    // 3. Reconstruct conversation context
    List<String> interactions = new ArrayList<>();
    // Add original assistant message with tool_calls
    interactions.addAll(assistantMessages);
    // Add tool result messages  
    for (LLMMessage llmMessage : llmMessages) {
        interactions.add(llmMessage.getResponse());
    }
    
    // 4. RESUME REACT LOOP: Continue with tool results integrated
    processUnifiedTools(mlAgent, updatedParams, listener, memory, sessionId, functionCalling, frontendTools);
}

Tool Visibility Strategy:

// Unified tool approach - both frontend and backend tools visible to LLM
Map<String, Tool> unifiedToolsMap = new HashMap<>(backendToolsMap);
unifiedToolsMap.putAll(wrapFrontendToolsAsToolObjects(frontendTools));

// LLM sees all tools but execution is differentiated at runtime
runReAct(llm, unifiedToolsMap, toolSpecMap, params, memory, sessionId, tenantId, listener, functionCalling, backendToolsMap);

This design enables true hybrid tool execution where the LLM can seamlessly reason about and coordinate both backend OpenSearch operations and frontend user interactions within the same conversational flow.

4.6. AG-UI Agent Type: Internal Architecture and Comparison

How AG-UI Agent Type Works:

The AG-UI agent type (MLAGUIAgentRunner) operates as a specialized preprocessing layer that transforms AG-UI protocol requests into ml-commons compatible format while preserving AG-UI semantics. Here's how it works internally:

// AG-UI Agent Execution Flow
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. AG-UI Protocol Processing
    processAGUIMessages(mlAgent, params, llmInterface);    // Convert AG-UI messages to chat history
    processAGUIContext(mlAgent, params);                   // Extract and format contextual information
    
    // 2. Delegate to Standard Conversational Runner
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
    
    // 3. Streaming events are generated in RestMLExecuteStreamAction
}

Key Processing Steps:

  1. Message Array Processing: Converts AG-UI message arrays (with roles: user, assistant, tool) into ml-commons chat history format
  2. Tool Call Extraction: Identifies assistant messages with toolCalls and processes them for LLM-specific formatting
  3. Tool Result Integration: Handles tool role messages containing frontend tool execution results
  4. Context Transformation: Converts AG-UI context arrays into ml-commons context parameters
  5. Chat History Generation: Creates appropriate chat history while filtering out intermediate tool execution messages

Comparison with Existing Conversational Agent:

| Aspect | Conversational Agent (MLChatAgentRunner) | AG-UI Agent (MLAGUIAgentRunner) |
|---|---|---|
| Input Format | ml-commons native format with question parameter | AG-UI protocol format with message arrays, threadId, runId |
| Message Handling | Single question + optional chat history parameter | Message array processing with role-based conversation flow |
| Tool Execution | Backend tools only | Hybrid: backend tools + frontend tool delegation |
| Tool Result Processing | Direct tool execution results | Frontend tool results via AG-UI message format |
| Streaming Output | ml-commons response format | AG-UI event stream (RUN_STARTED, TEXT_MESSAGE_CONTENT, etc.) |
| LLM Integration | Direct LLM interface calls | AG-UI tool call formatting + standard LLM integration |
