1. Introduction
This RFC proposes implementing AG-UI (Agent-User Interaction Protocol) support in OpenSearch ml-commons. AG-UI is an open, lightweight, event-based protocol that standardizes how AI agents connect to user-facing applications. By implementing AG-UI compatibility, OpenSearch ml-commons will enable seamless real-time communication between conversational AI frontends and OpenSearch's agent framework, bringing agents directly into frontend applications with standardized streaming interactions and sophisticated tool execution capabilities.
About AG-UI Protocol:
AG-UI is positioned as a complementary protocol in the agentic ecosystem:
- MCP (Model Context Protocol): Gives agents backend tools
- A2A (Agent-to-Agent Protocol): Allows agents to communicate with other agents
- AG-UI: Brings agents into user-facing applications and lets them use frontend interaction tools
Our implementation enables OpenSearch ml-commons agents to work seamlessly with any AG-UI compatible frontend application.
2. Motivation
Key Use Case: Website Chatbot Experiences
A primary use case for AI agents is providing interactive chatbot experiences on websites and web applications. With AG-UI protocol support, OpenSearch ml-commons agents will be able to power these chatbot interactions directly, enabling developers to build sophisticated conversational interfaces that leverage OpenSearch's search capabilities, knowledge retrieval, and ML inference while maintaining real-time, streaming communication patterns expected in modern web applications.
Current OpenSearch ml-commons agent interactions are primarily designed for programmatic API access, lacking compatibility with the emerging AG-UI protocol standard for agent-user interactions. This creates several challenges:
Protocol Compatibility Gaps:
- No standardized event system following AG-UI specifications
- Tool execution restricted to backend-only operations, missing AG-UI's frontend tool capabilities
- Frontend developers cannot use existing AG-UI compatible applications with OpenSearch
Developer Experience Barriers:
- High barrier to entry for building AG-UI compatible applications with OpenSearch
- Manual conversion required between OpenSearch agent formats and AG-UI protocol requirements
- Complex integration work required for each frontend application
Implementing AG-UI protocol support addresses these limitations by providing:
- Protocol Compliance: Full compatibility with AG-UI's event-based protocol specification
- Hybrid Tool Architecture: Support for both backend tools and AG-UI frontend tool delegation
- Ecosystem Integration: Seamless compatibility with existing AG-UI frameworks and applications
- Standardized Formats: AG-UI compatible input/output formats with established patterns
3. Proposed Solution
We propose implementing AG-UI protocol compatibility in OpenSearch ml-commons through three core enhancements that align with AG-UI specifications:
3.1. AG-UI Protocol Input Support
Automatic detection and conversion of AG-UI protocol compliant requests to ml-commons agent format:
// AG-UI Input Format
{
  "threadId": "thread_abc123",
  "runId": "run_def456",
  "messages": [
    {"role": "user", "content": "Search for documents"},
    {"role": "assistant", "toolCalls": [...]},
    {"role": "tool", "content": "...", "toolCallId": "call_123"}
  ],
  "tools": [...],
  "context": [...]
}
3.2. Frontend Integration via AG-UI Client SDK
Server-Sent Events (SSE) implementation following AG-UI's event types, accessible through the AG-UI client SDK:
// Frontend Integration Example using HttpAgent
import { HttpAgent } from '@ag-ui/client';

const agent = new HttpAgent({
  url: 'https://your-opensearch-cluster/_plugins/_ml/agents/agent_123/_execute/stream'
});
3.3. AG-UI Tool Integration Model
Implementation of AG-UI's hybrid tool execution model, integrating with the ReAct (Reasoning and Acting) loop:
graph TD
A[Initial AG-UI Request] --> B[ReAct Loop 1]
B --> C{Tool Selection}
C -->|Backend Tool| D[Execute in OpenSearch]
C -->|Frontend Tool| E[End ReAct Loop 1]
D --> F[Tool Results]
F --> G[Continue ReAct Loop 1]
G --> H[Next Iteration]
H --> I[Final Answer]
I --> J[Stream Events]
E --> K[Return Tool Call]
K --> L[Browser Executes Tool]
L --> M[New Request with Results]
M --> N[ReAct Loop 2]
N --> O[Continue Reasoning]
O --> P[Final Answer]
P --> Q[Stream Events]
4. Technical Design
4.1. Core Components Architecture
Input Processing Layer
├── AGUIInputConverter - Format detection and conversion
├── AGUIConstants - Centralized constants and field definitions
└── Validation - Input structure and content validation
Agent Execution Layer
├── MLAGUIAgentRunner - AGUI-specific agent processing
├── MLAgentExecutor - Routing and agent selection
└── Context Processing - Chat history and context extraction
Streaming Event System
├── BaseEvent - Abstract event foundation
├── Event Types - 13+ specific event implementations
├── AGUIStreamingEventManager - Lifecycle and state management
└── REST Integration - SSE streaming endpoint handling
Tool Integration System
├── AGUIFrontendTool - Frontend tool representation
├── Tool Coordination - Backend/frontend tool routing
├── Function Calling - Multi-LLM interface support
└── Result Aggregation - Tool result consolidation
4.2. AG-UI Input Processing Implementation
AGUIInputConverter detects AG-UI format requests and converts them to ml-commons AgentMLInput format:
- Format Detection: Validates required fields (`threadId`, `runId`, `messages`, `tools`)
- Parameter Mapping: Maps AG-UI fields to internal parameters (`threadId` → `agui_thread_id`, etc.)
- Tool Result Extraction: Processes user messages with `toolCallId` as frontend tool results
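As an illustrative sketch only (not the actual AGUIInputConverter code, and the class name `AGUISketch` is hypothetical), detection and parameter mapping could look like this, assuming the request body has already been parsed into a Map:

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of AG-UI request detection and field mapping; the real
// AGUIInputConverter in ml-commons operates on the raw request JSON.
public class AGUISketch {
    private static final Set<String> REQUIRED = Set.of("threadId", "runId", "messages", "tools");

    // A request is treated as AG-UI input only when every required field is present.
    public static boolean isAGUIInput(Map<String, Object> request) {
        return REQUIRED.stream().allMatch(request::containsKey);
    }

    // Maps AG-UI identifiers to the internal agui_* parameter names described above.
    public static Map<String, String> mapParameters(Map<String, Object> request) {
        return Map.of(
            "agui_thread_id", String.valueOf(request.get("threadId")),
            "agui_run_id", String.valueOf(request.get("runId")));
    }
}
```

A request missing any of the four fields falls through to the existing ml-commons input path, so AG-UI support stays opt-in by format.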
4.3. Event System Implementation
AG-UI Events follow the protocol's standard event types:
- Run Events: `RUN_STARTED`, `RUN_FINISHED`, `RUN_ERROR`
- Text Events: `TEXT_MESSAGE_START`, `TEXT_MESSAGE_CONTENT`, `TEXT_MESSAGE_END`
- Tool Events: `TOOL_CALL_START`, `TOOL_CALL_ARGS`, `TOOL_CALL_END`, `TOOL_CALL_RESULT`
- State Events: `MESSAGES_SNAPSHOT`
AGUIStreamingEventManager handles thread-safe state management with automatic cleanup after conversation completion.
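To make the event flow concrete, here is a minimal, hypothetical sketch of how a streamed text answer maps onto SSE data frames. The class and method names are illustrative, not part of ml-commons, and the payloads are trimmed to the event type plus minimal identifiers:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of AG-UI event framing over Server-Sent Events.
public class AGUIEventFramingSketch {
    // One AG-UI event becomes one SSE data frame terminated by a blank line.
    static String frame(String type, String extraJson) {
        String payload = extraJson.isEmpty() ? "" : "," + extraJson;
        return "data: {\"type\":\"" + type + "\"" + payload + "}\n\n";
    }

    // Happy-path run: RUN_STARTED, one streamed text message, then RUN_FINISHED.
    public static List<String> frameTextRun(String messageId, List<String> deltas) {
        List<String> frames = new ArrayList<>();
        frames.add(frame("RUN_STARTED", ""));
        frames.add(frame("TEXT_MESSAGE_START", "\"messageId\":\"" + messageId + "\""));
        for (String delta : deltas) {
            frames.add(frame("TEXT_MESSAGE_CONTENT",
                "\"messageId\":\"" + messageId + "\",\"delta\":\"" + delta + "\""));
        }
        frames.add(frame("TEXT_MESSAGE_END", "\"messageId\":\"" + messageId + "\""));
        frames.add(frame("RUN_FINISHED", ""));
        return frames;
    }
}
```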
4.4. Agent Execution Flow
MLAgentExecutor Integration (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAgentExecutor.java)
// Agent type routing based on input format
if (AGUIInputConverter.isAGUIInput(inputJson)) {
    return new MLAGUIAgentRunner(client, settings, clusterService, ...);
} else {
    return new MLChatAgentRunner(client, settings, clusterService, ...);
}

MLAGUIAgentRunner Processing (ml-algorithms/src/main/java/org/opensearch/ml/engine/algorithms/agent/MLAGUIAgentRunner.java)
@Override
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. Process AG-UI messages into chat history
    processAGUIMessages(mlAgent, params, llmInterface);
    // 2. Process AG-UI context into ml-commons format
    processAGUIContext(mlAgent, params);
    // 3. Delegate to MLChatAgentRunner for actual execution
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
}

Message Processing Logic:
- Tool Call Detection: Identifies assistant messages with `toolCalls` field
- Tool Result Processing: Handles `tool` role messages with `toolCallId`
- Chat History Generation: Filters out intermediate tool messages, creates clean history
- Recent Tool Results: Only processes most recent tool execution cycle
- LLM Format Conversion: Uses `FunctionCalling.formatAGUIToolCalls()` for provider-specific formatting
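The chat-history filtering step above can be sketched as follows. This is a simplified illustration operating on parsed message maps, not the actual MLAGUIAgentRunner code, and `AGUIChatHistorySketch` is a hypothetical name:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Simplified sketch of chat history generation: intermediate tool-execution
// messages are dropped so only conversational user/assistant turns remain.
public class AGUIChatHistorySketch {
    public static List<Map<String, Object>> cleanHistory(List<Map<String, Object>> messages) {
        return messages.stream()
            // Drop tool-result messages (role "tool").
            .filter(m -> !"tool".equals(m.get("role")))
            // Drop assistant messages that carry only toolCalls and no text content.
            .filter(m -> !("assistant".equals(m.get("role"))
                && m.containsKey("toolCalls")
                && !m.containsKey("content")))
            .collect(Collectors.toList());
    }
}
```

Filtering keeps the history the LLM sees focused on the dialogue itself; tool calls and results from earlier cycles are reintroduced separately in provider-specific format when needed.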
4.5. Frontend Tool Execution and ReAct Loop Integration
The AG-UI implementation integrates frontend tool execution with the existing ReAct (Reasoning and Acting) loop in MLChatAgentRunner through a sophisticated pause-resume mechanism:
1. ReAct Loop Tool Selection Phase:
// Inside runReAct method - when LLM selects a tool to execute
if (tools.containsKey(action)) {
    // Determine if tool is backend or frontend
    boolean isBackendTool = backendTools != null && backendTools.containsKey(action);
    boolean isFrontendTool = !isBackendTool;
    if (isFrontendTool) {
        // PAUSE REACT LOOP: Create tool delegation response for frontend
        ModelTensorOutput frontendToolResponse = createFrontendToolCallResponse(toolCallId, action, actionInput);
        listener.onResponse(frontendToolResponse); // Return control to frontend
        return; // Exit ReAct loop - frontend will execute tool and send results back
    } else {
        // CONTINUE REACT LOOP: Execute backend tool normally
        runTool(tools, toolSpecMap, tmpParameters, nextStepListener, action, actionInput, toolParams, interactions, toolCallId, functionCalling);
    }
}

2. Tool Result Processing and ReAct Resumption:
// processAGUIToolResults() - when frontend tool results return
private void processAGUIToolResults(..., String aguiToolCallResults) {
    // 1. Parse frontend tool execution results
    List<Map<String, String>> toolResults = gson.fromJson(aguiToolCallResults, listType);
    // 2. Convert to LLM message format using FunctionCalling
    List<LLMMessage> llmMessages = functionCalling.supply(formattedResults);
    // 3. Reconstruct conversation context
    List<String> interactions = new ArrayList<>();
    // Add original assistant message with tool_calls
    interactions.addAll(assistantMessages);
    // Add tool result messages
    for (LLMMessage llmMessage : llmMessages) {
        interactions.add(llmMessage.getResponse());
    }
    // 4. RESUME REACT LOOP: Continue with tool results integrated
    processUnifiedTools(mlAgent, updatedParams, listener, memory, sessionId, functionCalling, frontendTools);
}

Tool Visibility Strategy:
// Unified tool approach - both frontend and backend tools visible to LLM
Map<String, Tool> unifiedToolsMap = new HashMap<>(backendToolsMap);
unifiedToolsMap.putAll(wrapFrontendToolsAsToolObjects(frontendTools));
// LLM sees all tools but execution is differentiated at runtime
runReAct(llm, unifiedToolsMap, toolSpecMap, params, memory, sessionId, tenantId, listener, functionCalling, backendToolsMap);

This design enables true hybrid tool execution where the LLM can seamlessly reason about and coordinate both backend OpenSearch operations and frontend user interactions within the same conversational flow.
4.6. AG-UI Agent Type: Internal Architecture and Comparison
How AG-UI Agent Type Works:
The AG-UI agent type (MLAGUIAgentRunner) operates as a specialized preprocessing layer that transforms AG-UI protocol requests into ml-commons compatible format while preserving AG-UI semantics. Here's how it works internally:
// AG-UI Agent Execution Flow
public void run(MLAgent mlAgent, Map<String, String> params, ActionListener<Object> listener, TransportChannel channel) {
    // 1. AG-UI Protocol Processing
    processAGUIMessages(mlAgent, params, llmInterface); // Convert AG-UI messages to chat history
    processAGUIContext(mlAgent, params); // Extract and format contextual information
    // 2. Delegate to Standard Conversational Runner
    MLAgentRunner conversationalRunner = new MLChatAgentRunner(...);
    conversationalRunner.run(mlAgent, params, listener, channel);
    // 3. Streaming events are generated in RestMLExecuteStreamAction
}

Key Processing Steps:
- Message Array Processing: Converts AG-UI message arrays (with roles: user, assistant, tool) into ml-commons chat history format
- Tool Call Extraction: Identifies assistant messages with `toolCalls` and processes them for LLM-specific formatting
- Tool Result Integration: Handles `tool` role messages containing frontend tool execution results
- Context Transformation: Converts AG-UI context arrays into ml-commons context parameters
- Chat History Generation: Creates appropriate chat history while filtering out intermediate tool execution messages
Comparison with Existing Conversational Agent:
| Aspect | Conversational Agent (MLChatAgentRunner) | AG-UI Agent (MLAGUIAgentRunner) |
|---|---|---|
| Input Format | ml-commons native format with `question` parameter | AG-UI protocol format with message arrays, `threadId`, `runId` |
| Message Handling | Single question + optional chat history parameter | Message array processing with role-based conversation flow |
| Tool Execution | Backend tools only | Hybrid: backend tools + frontend tool delegation |
| Tool Result Processing | Direct tool execution results | Frontend tool results via AG-UI message format |
| Streaming Output | ml-commons response format | AG-UI event stream (`RUN_STARTED`, `TEXT_MESSAGE_CONTENT`, etc.) |
| LLM Integration | Direct LLM interface calls | AG-UI tool call formatting + standard LLM integration |