Design Discussion: Interactive Auto-Debug System for AI Agents #16

wu-changxing started this conversation in Ideas

wu-changxing
Oct 4, 2025
Maintainer

Overview We want to build an interactive debugging system for ConnectOnion agents that combines the power of time-travel debugging, step-through execution, and live modification - all through the existing `@xray` decorator. ## The Core Idea Use `@xray` as natural breakpoints - no configuration needed. When debugging is enabled, tools marked with `@xray` become interactive pause points where you can inspect, modify, and control execution. ## Proposed User Experience ### Basic Usage `python from connectonion import Agent, auto_debug @xray def search_emails(query: str): return [...] def send_email(to: str, body: str): return "sent" agent = Agent(name="assistant", tools=[search_emails, send_email]) # Enable interactive debugging auto_debug(agent) agent.input("Send email to John")` ### What Happens INPUT: Send email to John → Tool: search_emails({"query": "John"}) ← Result (123ms): Found 1 email: [email protected] ┌─ @xray: search_emails ─────────────────────────┐ │ Arguments: │ │ query: "John" │ │ Result: │ │ Found 1 email: [email protected] │ │ Context: │ │ User prompt: "Send email to John" │ │ Iteration: 1 │ └─────────────────────────────────────────────────┘ 🔍 Debug Actions: [enter] Continue [i] Inspect state [m] Modify result [r] Retry with different args [p] Update prompt [q] Quit debugging > ## Two Debug Modes ### Mode 1: @xray Breakpoints (Default) Only stops at tools marked with `@xray`. Perfect for debugging specific tools while others run silently. `python auto_debug(agent) # Only pauses at @xray tools` ### Mode 2: Step Mode Stops after every tool execution, like GDB step-through. Perfect for deep debugging. `python auto_debug(agent, step=True) # Pauses at ALL tools` ## Interactive Commands At each breakpoint: - `[enter]` / `s` - Step to next tool - `c` - Continue to end without stopping - `i` - Inspect full agent state (messages, trace, tools) - `m` - Modify tool result before continuing - `r` - Retry tool with different arguments - `p` - Update system prompt mid-execution - `x` - Toggle step mode on/off - `q` - Quit debugging ## Why This Design Works ✅ Zero configuration - Just `auto_debug(agent)`, that's it ✅ Uses existing @xray - No new concepts to learn ✅ Developer controls breakpoints - Add `@xray` to tools you want to inspect ✅ Non-intrusive - Tools without `@xray` run normally ✅ Progressive disclosure - Simple continue, or dive deep with inspect/modify ✅ Time-travel capable - Modify results and see alternative outcomes ## Technical Foundation This is possible because agent execution is deterministic given: 1. User input 2. System prompt 3. Tool results 4. LLM responses We already record everything in `agent.current_session['trace']`, so we can: - Pause at any point - Inspect complete state - Modify data mid-execution - See alternative outcomes ## Inspiration from Great Debuggers This design borrows patterns from: - Smalltalk's live programming - Change code while running - Lisp's condition system - Pause at errors, fix, continue - GDB step mode - Walk through execution one step at a time - Bret Victor's demos - Immediate visual feedback - React DevTools - Visual timeline of state changes ## Implementation Phases ### Phase 1: Core Debug Infrastructure - Debugger class with pause/inspect/modify - Integration with `@xray` in `tool_executor.py` - Basic REPL menu at breakpoints ### Phase 2: Step Mode - Step-through all tools (not just `@xray`) - Continue mode (`c` command) - Toggle step mode on/off (`x` command) ### Phase 3: Advanced Modifications - Modify tool results - Retry with different arguments - Update system prompt mid-execution - Inspect full agent state ### Phase 4: Trace Replay (Future) - Save execution traces to file - Replay from saved traces - Time-travel (rewind/replay) ## Questions for Discussion 1. Is the `@xray` breakpoint concept intuitive? Or should we have explicit breakpoint configuration? 2. What other debugging commands would be valuable? (e.g., save checkpoint, restore checkpoint?) 3. Should we support batch testing mode? Run multiple test cases from JSONL files? 4. Non-interactive mode? Just show xray tables without pausing (for CI/logging)? 5. Visual debugger UI? Terminal-based TUI or web dashboard in the future? ## Related Work - Existing `@xray` decorator for visibility - Current trace recording in `agent.current_session` - Tool execution in `tool_executor.py` - Console output system --- This is part of the AI Auto-Coding milestone - building tools that help developers create and debug AI agents more effectively. What do you think? What features are most important? What's missing?

Replies: 0 comments

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment