This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This project uses uv as the package manager (faster than pip) and Make for common tasks.
```bash
# Install dependencies
make install          # or: uv sync

# Development server (with hot reload)
make dev              # or: uv run uvicorn omni_agent.main:app --reload --host 0.0.0.0 --port 8000

# Run tests
make test             # or: uv run pytest -v
make test-cov         # Run tests with coverage

# Code quality
make lint             # Check code with ruff
make lint-fix         # Auto-fix linting issues
make format           # Format code with ruff
make check            # Run all checks (lint, format, type)

# Single test file
uv run pytest tests/core/test_agent.py -v

# Specific test function
uv run pytest tests/core/test_agent.py::test_function_name -v
```

- Always use `uv run` instead of direct `python` when running scripts
- The project uses Python 3.11+ (required)
- Source code is in `src/omni_agent/`, not at the root level
1. Agent Execution Loop (`src/omni_agent/core/agent.py`)
   - Manages the complete AI agent lifecycle
   - Integrates TokenManager for context management (prevents overflow at 120k tokens)
   - Integrates AgentLogger for structured JSON logging
   - Executes multi-step tasks with tool calls until completion or max_steps reached

2. Token Management (`src/omni_agent/core/token_manager.py`)
   - Uses tiktoken (cl100k_base) for precise token counting
   - Automatically summarizes message history when exceeding `token_limit`
   - Summarization strategy: keeps all user messages, compresses agent execution rounds
   - Can reduce token usage by 50-70% while preserving context
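The check-then-summarize flow can be sketched as follows. This is an illustration only: `count_tokens` here is a crude character-based stand-in for the real tiktoken counter, and the summary placement is hypothetical (the actual strategy lives in `token_manager.py`):

```python
# Sketch of the summarization trigger described above.
# count_tokens is a rough stand-in; the project uses tiktoken (cl100k_base).
def count_tokens(messages: list[dict]) -> int:
    # Heuristic: ~4 characters per token.
    return sum(len(m.get("content", "")) for m in messages) // 4

def maybe_summarize(messages: list[dict], token_limit: int = 120_000) -> list[dict]:
    """Keep all user messages; compress the rest when over the limit."""
    if count_tokens(messages) <= token_limit:
        return messages
    kept = [m for m in messages if m["role"] == "user"]
    summary = {"role": "assistant",
               "content": f"[summarized {len(messages) - len(kept)} agent messages]"}
    return kept[:1] + [summary] + kept[1:]  # placement is illustrative

msgs = [{"role": "user", "content": "hi"},
        {"role": "assistant", "content": "x" * 1000}]
print(len(maybe_summarize(msgs, token_limit=100)))  # prints 2
```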
3. Structured Logging (`src/omni_agent/core/agent_logger.py`)
   - Creates timestamped log files: `~/.omni-agent/log/agent_run_YYYYMMDD_HHMMSS.log`
   - Logs: STEP (token usage), REQUEST, RESPONSE, TOOL_EXECUTION (with timing), COMPLETION
   - Critical for debugging agent behavior and performance analysis

4. MCP Integration (`src/omni_agent/services/mcp_manager.py`, `src/omni_agent/tools/mcp_loader.py`)
   - Loads external tools via Model Context Protocol at startup
   - Configuration in `mcp.json` (supports stdio, SSE, HTTP transports)
   - Tools stored globally in `api/deps.py` and injected into the agent
   - Important: MCP tools must be loaded during FastAPI lifespan startup, not per-request
```
HTTP Request → FastAPI Router (api/v1/agent.py)
        ↓
Dependency Injection (api/deps.py)
  - get_llm_client()
  - get_tools()  [includes MCP + base + skills]
        ↓
Agent.run() (core/agent.py)
  - TokenManager checks/summarizes
  - AgentLogger records each step
  - LLM generates response
  - Tools execute (with timing)
  - Loop until complete or max_steps
        ↓
Return response with execution_logs
```
Tool Loading Priority (in `api/deps.py:get_tools()`):
- Base tools: ReadTool, WriteTool, EditTool, BashTool
- MCP tools: Loaded at startup via `initialize_mcp_tools()`
- Skills: Dynamically loaded via SkillTool

Adding New Tools:
- Create a tool class inheriting from the `Tool` base class (`tools/base.py`)
- Implement: `name`, `description`, `parameters`, and the `execute()` method
- Register in the `api/deps.py:get_tools()` function
- Tools are automatically exposed in the OpenAPI schema
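A hypothetical new tool following those steps might look like the sketch below. The actual `Tool` base-class contract is defined in `tools/base.py` and may differ; `WordCountTool` and its schema are invented for illustration:

```python
# Hypothetical tool sketch; in the real code this would subclass Tool
# from tools/base.py and be registered in api/deps.py:get_tools().
class WordCountTool:
    name = "word_count"
    description = "Count words in a text file"
    parameters = {
        "type": "object",
        "properties": {"path": {"type": "string", "description": "File to count"}},
        "required": ["path"],
    }

    def execute(self, path: str) -> str:
        # Read the file and report a simple whitespace-based word count.
        with open(path) as f:
            return f"{len(f.read().split())} words"
```

Registration then amounts to appending `WordCountTool()` to the list returned by `get_tools()`.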
Uses pydantic-settings with `.env` file support (`core/config.py`).

Critical Settings:
- `ENABLE_MCP=true`: Must be true for MCP tools to load
- `ENABLE_SKILLS=true`: Must be true for the Skills system
- `ENABLE_RAG=true`: Must be true for the RAG knowledge base
- `ENABLE_SPAWN_AGENT=true`: Must be true for the spawn_agent tool
- `MCP_CONFIG_PATH=mcp.json`: Path to MCP configuration
- `LLM_API_KEY`: Required for LLM calls
- `LLM_MODEL`: Model name in `provider/model` format (e.g., `anthropic/claude-3-5-sonnet-20241022`)
- `AGENT_MAX_STEPS=50`: Prevents infinite loops
- Token management is always enabled (120k default limit)

SpawnAgent Settings:
- `SPAWN_AGENT_MAX_DEPTH=3`: Maximum nesting depth for spawned agents
- `SPAWN_AGENT_DEFAULT_MAX_STEPS=15`: Default max steps for child agents
- `SPAWN_AGENT_TOKEN_LIMIT=50000`: Token limit for child agents

ACP Settings (Agent Client Protocol):
- `ENABLE_ACP=true`: Enable ACP protocol endpoints for code editor integration

Sandbox Settings (when `ENABLE_SANDBOX=true`):
- `SANDBOX_URL=http://localhost:8080`: agent-sandbox server URL
- `SANDBOX_AUTO_START=false`: Auto-start Docker container if not running
- `SANDBOX_DOCKER_IMAGE=ghcr.io/agent-infra/sandbox:latest`: Docker image
- `SANDBOX_TTL_SECONDS=3600`: Sandbox instance TTL (1 hour default)
- `SANDBOX_MAX_INSTANCES=100`: Maximum concurrent sandbox instances

Session Settings:
- `SESSION_BACKEND=file`: Storage backend (file, redis, postgres)
- `SESSION_MAX_AGE_DAYS=7`: Session expiration
- `SESSION_HISTORY_RUNS=3`: Number of previous runs to include in context

RAG Settings:
- `DASHSCOPE_API_KEY`: Required for embedding generation
- `EMBEDDING_MODEL=text-embedding-v4`: Embedding model
- `CHUNK_SIZE=500`: Document chunk size
- `RAG_TOP_K=5`: Number of search results

Environment Variables vs Config:
- `.env` is the primary configuration source
- Settings are validated and processed in the `Settings` class
- Access via the global `settings` instance
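Pulling the settings above together, a minimal `.env` might look like this (values are illustrative placeholders, not project defaults; see `core/config.py` for the authoritative list):

```bash
# .env - illustrative values only
LLM_API_KEY=sk-...
LLM_MODEL=anthropic/claude-3-5-sonnet-20241022
AGENT_MAX_STEPS=50

ENABLE_MCP=true
MCP_CONFIG_PATH=mcp.json
ENABLE_SKILLS=true
ENABLE_RAG=false
ENABLE_SPAWN_AGENT=true
```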
Location: `src/omni_agent/skills/` (internal) and `./skills/` (external)

Architecture:
- Each skill is a directory with a `SKILL.md` file
- `SkillTool` (`skills/skill_tool.py`) loads skills on-demand via the `get_skill` tool
- Skills provide expert guidance (not executable code)
- System prompt automatically includes skill metadata when `ENABLE_SKILLS=true`

Skill Structure:
```
skills/my-skill/
├── SKILL.md        # Main skill content (loaded by get_skill tool)
└── reference/      # Optional supporting docs
```
File: `mcp.json` (JSON schema: https://modelcontextprotocol.io/schema/mcp.json)

Structure:
```json
{
  "mcpServers": {
    "server-name": {
      "command": "npx",               // or python, node, etc.
      "args": ["-y", "package-name"],
      "env": {"API_KEY": "value"},    // Environment variables for server
      "disabled": false               // Set true to disable without removing
    }
  }
}
```

Common Issue: If MCP tools don't load:
- Check `ENABLE_MCP=true` in `.env`
- Verify `mcp.json` exists and is valid JSON
- Check startup logs for MCP initialization messages
- Debug logs are written to `/tmp/mcp_init_debug.log`
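The first two checks can be scripted. This is a standalone sanity-check sketch (not part of the codebase), and it assumes your `mcp.json` is strict JSON without the `//` comments shown in the example above:

```python
import json
import pathlib

def check_mcp_config(path: str = "mcp.json") -> str:
    """Pre-startup sanity check: file exists, parses, and lists servers."""
    p = pathlib.Path(path)
    if not p.exists():
        return f"{path} not found"
    try:
        cfg = json.loads(p.read_text())
    except json.JSONDecodeError as e:
        return f"invalid JSON: {e}"
    servers = cfg.get("mcpServers", {})
    # Servers with "disabled": true are skipped at startup.
    enabled = [name for name, s in servers.items() if not s.get("disabled")]
    return f"{len(enabled)} enabled server(s): {', '.join(enabled) or 'none'}"

print(check_mcp_config())
```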
Location: `src/omni_agent/acp/`

Implements the Zed Agent Client Protocol for code editor integration.

Architecture:
- `schemas.py`: JSON-RPC 2.0, Session, ToolCall, ContentBlock data models
- `adapter.py`: Converts between ACP and internal message formats
- `api/v1/endpoints/acp.py`: HTTP endpoints for the ACP protocol

Endpoints (when `ENABLE_ACP=true`):
- `POST /api/v1/acp/agent/initialize`: Initialize connection, negotiate capabilities
- `POST /api/v1/acp/session/new`: Create a new session
- `POST /api/v1/acp/session/prompt`: Process user prompt (sync)
- `POST /api/v1/acp/session/prompt/stream`: Process user prompt (streaming SSE)
- `POST /api/v1/acp/session/cancel`: Cancel session operation

Session Updates (streaming events):
- `agent_thought_chunk`: LLM thinking process
- `agent_message_chunk`: Agent response content
- `tool_call`: Tool invocation start
- `tool_call_update`: Tool execution status/result
- `plan`: Execution plan (TODO list)
The system supports 100+ LLM providers via LiteLLM with automatic parameter adaptation.

Supported Providers:
- Anthropic: `anthropic/claude-3-5-sonnet-20241022` (max_tokens: 8192)
- OpenAI: `openai/gpt-4o` (max_tokens: 16384)
- xAI: `xai/grok-4-fast-reasoning` (max_tokens: 16384)
- DeepSeek: `deepseek/deepseek-chat` (max_tokens: 8192)
- Qwen: `qwen/qwen-max` (max_tokens: 8192)

Auto-Standardization: Model names are automatically converted to `provider/model` format:
```
claude-3-5-sonnet-20241022 → anthropic/claude-3-5-sonnet-20241022
gpt-4o → openai/gpt-4o
```

Auto-Adjustment: `max_tokens` is automatically capped to provider limits:
```
Request 16384 → DeepSeek limit 8192 → Auto-adjusted to 8192
```
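The capping step amounts to a per-provider lookup. A minimal sketch, with limits taken from the provider table above (the real adjustment logic lives in the LLM client, not this code):

```python
# Per-provider max_tokens ceilings, from the Supported Providers list.
PROVIDER_LIMITS = {
    "anthropic": 8192,
    "openai": 16384,
    "xai": 16384,
    "deepseek": 8192,
    "qwen": 8192,
}

def cap_max_tokens(model: str, requested: int) -> int:
    """Clamp requested max_tokens to the provider's ceiling.

    `model` is expected in provider/model format, e.g. "deepseek/deepseek-chat".
    Unknown providers pass through unchanged.
    """
    provider = model.split("/", 1)[0]
    limit = PROVIDER_LIMITS.get(provider)
    return min(requested, limit) if limit else requested

print(cap_max_tokens("deepseek/deepseek-chat", 16384))  # prints 8192
```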
See docs/MODEL_STANDARDIZATION.md for detailed configuration.
`spawn_agent` allows a parent agent to dynamically create child agents for delegated tasks.

Parameters:
- `task`: Task description for the child agent
- `role`: Role description (e.g., "security auditor", "test writer")
- `context`: Additional context for the child (optional)
- `tools`: List of tool names the child can use (optional)
- `max_steps`: Maximum steps for child execution (optional)

Use Cases:
- Tasks requiring specialized roles
- Breaking down complex tasks into independent subtasks
- Avoiding main agent context pollution

Configuration:
- `SPAWN_AGENT_MAX_DEPTH=3`: Prevents infinite nesting
- Child agents inherit token management and logging
- Execution tracked in TraceLogger for debugging
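A spawn_agent call using the parameters above might carry arguments like the following. The values are invented for illustration, and treating only `task` and `role` as required is an assumption (the optional markers above cover the other three):

```python
# Hypothetical arguments an agent might pass to the spawn_agent tool.
spawn_args = {
    "task": "Audit auth.py for injection and session-handling issues",
    "role": "security auditor",
    "context": "FastAPI app; focus on /api/v1/agent endpoints",   # optional
    "tools": ["read_file", "bash"],                               # optional: restrict child tools
    "max_steps": 10,                                              # optional: cap child execution
}

# Assumed-required keys per the parameter list above.
required = {"task", "role"}
print(required <= spawn_args.keys())  # prints True
```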
Team uses a Leader-Member pattern for collaborative task execution (`src/omni_agent/core/team.py`).

Architecture:
- Leader analyzes the task
- Leader uses the `delegate_task_to_member` tool to delegate to members
- Members execute tasks and return results
- Leader synthesizes the final answer

Predefined Roles (in `core/team.py`):
- `researcher`: Information search and organization
- `writer`: Content writing and organization
- `coder`: Programming and technical tasks
- `reviewer`: Quality review and feedback
- `analyst`: Data analysis and insights

Workflow Tracking: All team runs are logged in TraceLogger with delegation chains.
MsgHub provides event-driven message broadcasting between agents (`src/omni_agent/core/msghub.py`).

Core Mechanism:
- Each agent registers a COMPLETION event handler via EventEmitter
- When an agent finishes speaking, the handler auto-broadcasts to all other participants
- Other agents receive messages via `observe()` injection into their message history

Components:
- `MsgHub`: Orchestrates multi-agent discussions with event-driven broadcasting
- `MsgHubConfig`: Configuration (max_rounds, max_steps_per_turn, announcement)
- `Orchestrator`: Callable that selects the next speaker (defaults to round-robin)

Agent Methods:
- `Agent.observe(msg)`: Inject an external message into the agent's conversation history
- `Agent.execute_turn(max_steps)`: Execute one discussion turn, preserving external event handlers

Usage:
```python
from omni_agent.core import Agent, MsgHub, MsgHubConfig

designer = Agent(llm_client=llm, name="designer", system_prompt="UI designer")
developer = Agent(llm_client=llm, name="developer", system_prompt="Developer")

config = MsgHubConfig(max_rounds=6, max_steps_per_turn=5, announcement="Design review")
async with MsgHub([designer, developer], config=config) as hub:
    result = await hub.run("Design a REST API for user auth")
    hub.add(new_agent)        # Dynamic participant management
    hub.delete("designer")
```

Completion: An agent responds with a `<hub_complete>` tag to signal discussion end.
Ralph Loop is an iterative development methodology (`src/omni_agent/core/ralph.py`).

Core Concept: The same prompt is executed repeatedly; the AI sees its previous work in files and iteratively improves until completion.

Components:
- `RalphConfig`: Configuration for iterations, completion conditions, context strategy
- `ToolResultCache`: Caches tool results with summaries, supports on-demand full retrieval
- `WorkingMemory`: Structured memory persisted to `.ralph/memory.json`
- `ContextManager`: Coordinates summarization and iteration history
- `CompletionDetector`: Multi-condition completion detection

Usage:
```python
from omni_agent.core import Agent, RalphConfig

# Simple: ralph=True uses the default config
agent = Agent(llm_client=llm_client, tools=tools, ralph=True)

# Custom: ralph=RalphConfig(...) for custom settings
agent = Agent(
    llm_client=llm_client,
    tools=tools,
    ralph=RalphConfig(max_iterations=20, idle_threshold=3),
)

# Unified entry - run() auto-detects Ralph mode
result, logs = await agent.run(task="Refactor utils module")
```

Completion Conditions:
- `PROMISE_TAG`: Detects the `<promise>TASK COMPLETE</promise>` tag
- `MAX_ITERATIONS`: Reached max iterations
- `IDLE_THRESHOLD`: No file changes for N consecutive iterations
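The `PROMISE_TAG` condition reduces to a tag search over the agent's final message. A hedged sketch (the real detection lives in `CompletionDetector` and may handle whitespace and casing differently):

```python
import re

# Illustrative detector for the <promise>TASK COMPLETE</promise> tag.
PROMISE_RE = re.compile(r"<promise>\s*TASK COMPLETE\s*</promise>", re.IGNORECASE)

def has_promise_tag(message: str) -> bool:
    """Return True if the message signals Ralph completion."""
    return bool(PROMISE_RE.search(message))

print(has_promise_tag("Done. <promise>TASK COMPLETE</promise>"))  # True
print(has_promise_tag("Still iterating on tests"))                # False
```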
Ralph Tools (auto-injected):
- `get_cached_result`: Retrieve full content of previous tool results
- `get_working_memory`: View memory summary
- `update_working_memory`: Update progress, findings, todos
- `signal_completion`: Signal task completion with the promise tag
Runtime cancellation mechanism for stopping long-running agent tasks (`src/omni_agent/core/run_manager.py`).

Architecture:
- `RunManager`: Global singleton managing active agent runs
- `asyncio.Event`: Used for cooperative cancellation signaling
- Cleanup: Incomplete messages are removed to maintain consistency

Components:
- `RunManager`: Registers/unregisters runs, handles cancellation
- `RunStatus`: Enum (RUNNING, COMPLETED, CANCELLED, ERROR)
- `RunInfo`: Dataclass with run metadata and cancel_event

API Endpoints:
- `POST /api/v1/agent/cancel`: Cancel by run_id or session_id
- `GET /api/v1/agent/runs/active`: List active runs

Usage:
```python
from omni_agent.core import run_manager, RunStatus

# Register a run
cancel_event = await run_manager.register(run_id, session_id, user_id)

# Pass cancel_event to Agent
agent = Agent(llm_client=llm, tools=tools, cancel_event=cancel_event)

# Cancel from another coroutine
await run_manager.cancel(run_id)

# Cleanup after completion
await run_manager.unregister(run_id, RunStatus.COMPLETED)
```

Frontend Integration:
```javascript
// Start streaming
const response = await fetch('/api/v1/agent/run/stream', {...});
const reader = response.body.getReader();

// First event contains run_id
const firstEvent = await reader.read();
const { run_id } = JSON.parse(firstEvent.data);

// Cancel button handler
async function cancelRun() {
  await fetch('/api/v1/agent/cancel', {
    method: 'POST',
    body: JSON.stringify({ run_id })
  });
}
```

PostgreSQL + pgvector based hybrid search system (`src/omni_agent/rag/`).
Architecture:
- Document processing: PDF support via pypdf, chunking with overlap
- Embedding: DashScope text-embedding-v4 (1024 dimensions)
- Storage: PostgreSQL with the pgvector extension
- Search modes: `hybrid` (semantic + keyword), `semantic`, `keyword`

Key Components:
- `rag/database.py`: PostgreSQL + pgvector setup
- `rag/embedding_service.py`: Vector embedding generation
- `rag/document_processor.py`: Document chunking and processing
- `rag/rag_service.py`: Search orchestration

Tool Integration: The `search_knowledge` tool is available to agents when `ENABLE_RAG=true`.
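Chunking with overlap, as mentioned above, can be sketched as follows. The overlap value is illustrative (only `CHUNK_SIZE=500` appears in the settings), and the real logic in `rag/document_processor.py` may split on sentence or token boundaries rather than raw characters:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share
    `overlap` characters, so content cut at a boundary keeps its context."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
print(len(chunks))  # prints 3 (spans 0-500, 450-950, 900-1200)
```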
Test Structure:
- `tests/api/` - API endpoint tests
- `tests/core/` - Core component tests (agent, llm_client, token_manager)
- `tests/tools/` - Tool execution tests
- `tests/services/` - Service layer tests

Running Tests:
```bash
# All tests
make test

# With coverage
make test-cov

# Specific module
uv run pytest tests/core/test_agent.py -v

# With output
uv run pytest tests/core/ -v -s
```

IMPORTANT PATHS:
- Source code: `src/omni_agent/` (NOT `omni_agent/`)
- Tests: `tests/`
- External skills: `./skills/`
- Workspace: `./workspace/` (agent file operations default here)
- Logs: `~/.omni-agent/log/` (agent execution logs)
Python Import Paths:
- Always import as: `from omni_agent.core import Agent`
- Never: `from src.omni_agent.core import Agent`
- The `src/` directory is on the Python path via `pyproject.toml` configuration

Critical Implementation Details:
- MCP Loading: Must happen in FastAPI lifespan startup, not per-request
- Token Management: Automatic, but can be configured via the Agent constructor
- Logging: Automatic for all agent runs when `enable_logging=True` (default)
- Tools: Base tools + MCP tools + Skills are all merged in `get_tools()`
- PYTHONPATH: When running from `src/`, rename any old root-level `omni_agent/` to avoid conflicts
- Session Storage: Three backends available - File (dev), Redis (production), PostgreSQL (persistent)
- Model Names: Always use `provider/model` format in the `LLM_MODEL` setting
Base URL: http://localhost:8000

Key Endpoints:
- `POST /api/v1/agent/run` - Execute agent with a task
- `GET /api/v1/tools/` - List all available tools
- `GET /health` - Health check
- `GET /docs` - OpenAPI/Swagger documentation

Agent Request Format:
```json
{
  "message": "Task description",
  "workspace_dir": "./workspace",  // optional
  "max_steps": 50                  // optional
}
```

Response includes:
- `success`: boolean
- `message`: final agent response
- `steps`: number of steps taken
- `logs`: array of execution logs with token usage and timing
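A stdlib-only client sketch for the run endpoint, using only the request and response fields documented above (it assumes the dev server from `make dev` is listening on port 8000, so the network call is left commented out):

```python
import json
import urllib.request

API = "http://localhost:8000/api/v1/agent/run"

def build_request(message: str, workspace_dir: str = "./workspace",
                  max_steps: int = 50) -> urllib.request.Request:
    """Build a POST request in the Agent Request Format documented above."""
    payload = {"message": message, "workspace_dir": workspace_dir, "max_steps": max_steps}
    return urllib.request.Request(
        API,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("List files in the workspace")

# With the dev server running (make dev), uncomment to execute:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
#     print(result["success"], result["steps"])
```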
Located at `~/.omni-agent/log/agent_run_YYYYMMDD_HHMMSS.log`

Log Events:
- `STEP`: Step number, token usage statistics, percentage
- `REQUEST`: User message or tool call request
- `RESPONSE`: LLM response with thinking process
- `TOOL_EXECUTION`: Tool name, parameters, result, execution time (ms)
- `COMPLETION`: Final agent message, total steps, summary

Viewing Logs:
```bash
# List recent logs
ls -lht ~/.omni-agent/log/ | head -5

# View specific run
cat ~/.omni-agent/log/agent_run_20251113_223233.log
```

Traces are located at `~/.omni-agent/traces/trace_*.jsonl`
Use the trace_viewer tool for analysis:
```bash
# List all traces
uv run python -m omni_agent.utils.trace_viewer list

# View detailed trace
uv run python -m omni_agent.utils.trace_viewer view trace_team_20251205_abc123.jsonl

# Visualize workflow dependencies
uv run python -m omni_agent.utils.trace_viewer flow trace_dependency_workflow_20251205_xyz789.jsonl
```

Trace Information:
- Workflow lifecycle events (start, end)
- Agent spawn/complete events with nesting depth
- Leader → Member delegation chains
- Task dependency relationships
- Token usage aggregation
- Execution hierarchy visualization

See docs/TRACING_GUIDE.md for detailed usage.
Located at `src/omni_agent/eval/`, with test cases in `evals/`:
```bash
uv run python -m omni_agent.eval --tags quick            # Run quick evals
uv run python -m omni_agent.eval --dataset evals/safety  # Run specific category
```

Eval Categories: `evals/tool_usage/`, `evals/multi_step/`, `evals/code_generation/`, `evals/reasoning/`, `evals/safety/`, `evals/efficiency/` (61 total cases)

Key Components:
- `eval/runner.py`: EvalRunner with IsolatedWorkspace per case
- `eval/grader.py`: OutcomeGrader with checks: `result_contains`, `result_matches`, `file_exists`, `file_contains`, `file_not_contains`, `file_matches`
- `eval/isolation.py`: Temp workspace with file/dir setup (supports both text and binary files)
- `eval/report.py`: EvalReport with terminal output and JSON export
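Two of the grader checks named above can be sketched like this. These are illustrative stand-ins, not the actual `OutcomeGrader` implementations in `eval/grader.py`, whose signatures may differ:

```python
import re
from pathlib import Path

def result_contains(result: str, expected: str) -> bool:
    """Pass if the agent's final message contains the expected substring."""
    return expected in result

def file_matches(workspace: Path, rel_path: str, pattern: str) -> bool:
    """Pass if a workspace file exists and its text matches the regex."""
    f = workspace / rel_path
    return f.exists() and re.search(pattern, f.read_text()) is not None

print(result_contains("All 12 tests passed", "tests passed"))  # True
```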
Located at `src/omni_agent/eval/benchmarks/`:
```bash
uv run python -m omni_agent.eval.benchmarks bfcl --categories simple --max-cases 20
uv run python -m omni_agent.eval.benchmarks gaia --levels 1 --max-cases 10
uv run python -m omni_agent.eval.benchmarks all --output eval_results
```

BFCL (`benchmarks/bfcl.py`): Downloads from HuggingFace, direct LLM call, AST-based function name matching

GAIA (`benchmarks/gaia.py`): Loads from HuggingFace datasets, full agent loop with MCP tools auto-loaded, official scorer (exact match with normalization)

LLMClient Thinking Mode: `LLMClient(thinking=True, thinking_budget=8000)` enables extended thinking. CLI flag: `--thinking`
- Old Directory Conflicts: If `omni_agent/` exists at the root, rename it (it should only be `src/omni_agent/`)
- MCP Not Loading: Check `ENABLE_MCP=true` and verify `initialize_mcp_tools()` is called in lifespan
- Import Errors: Use the `uv run` prefix; ensure `PYTHONPATH` includes `src/` if running directly
- Token Overflow: Already handled automatically by TokenManager, but configurable via the `token_limit` parameter
- Missing Logs: Check the `~/.omni-agent/log/` directory; ensure `enable_logging=True` in the Agent constructor
- RAG Search Fails: Verify PostgreSQL + pgvector is installed and `DASHSCOPE_API_KEY` is set
- Model Not Found: Ensure `LLM_MODEL` uses `provider/model` format (e.g., `anthropic/claude-3-5-sonnet-20241022`)
- SpawnAgent Depth Error: Child agents hitting the `SPAWN_AGENT_MAX_DEPTH` limit - increase it or redesign the task delegation

- `README.md` - Complete user guide with examples
- `IMPLEMENTATION_SUMMARY.md` - Detailed implementation notes for token management, AgentLogger, MCP
- `QUICKSTART.md` - Quick setup guide
- `docs/STREAMING.md` - Streaming output feature documentation
- `docs/TRACING_GUIDE.md` - TraceLogger and trace_viewer usage guide
- `docs/MODEL_STANDARDIZATION.md` - Multi-provider LLM configuration
- `docs/OPENROUTER.md` - OpenRouter integration guide
- `docs/CURL_EXAMPLES.md` - API request examples
- API docs available at `/docs` when the server is running
Development specifications are in the `harness/` directory; read them before making changes:
- `harness/ARCHITECTURE.md` - System layering, component relationships, data flow
- `harness/RULES.md` - Code style, naming, imports, Git workflow
- `harness/TESTING.md` - Test pyramid, fixtures, mock strategies, coverage requirements
- `harness/CI.md` - CI/CD pipeline stages and configuration
- `harness/STRUCTURE.md` - Directory tree, module placement rules
- `harness/VALIDATION.md` - Pre-commit, CI, deployment checklists
- `harness/AGENT_BEHAVIOR.md` - Execution loop, tool constraints, multi-agent rules
- `harness/TOOL_DEVELOPMENT.md` - Tool base class contract, parameter schema design, error handling, testing
- `harness/PROMPT_ENGINEERING.md` - System prompt structure, XML tags, role/instruction writing, skills progressive disclosure
- `harness/MULTI_AGENT_PATTERNS.md` - Team/MsgHub/Ralph/SpawnAgent selection guide and anti-patterns
- `harness/OBSERVABILITY.md` - AgentLogger, TraceLogger, EventEmitter, debugging methods
- `harness/EVALUATION.md` - Eval case format, grading system, metrics, custom graders
- `harness/SECURITY.md` - Sandbox isolation, prompt injection defense, file access control, key management
Base Tools (always available):
- `read_file`: Read file contents with optional offset/limit
- `write_file`: Write content to a file
- `edit_file`: Edit a file via string replacement
- `bash`: Execute shell commands with a timeout

Session Tools (when `ENABLE_SESSION=true`):
- `session_note`: Store session memory
- `recall_note`: Query session memory

Skill Tools (when `ENABLE_SKILLS=true`):
- `get_skill`: Load expert skill guidance on-demand

SpawnAgent Tool (when `ENABLE_SPAWN_AGENT=true`):
- `spawn_agent`: Create a child agent for delegated tasks

RAG Tools (when `ENABLE_RAG=true`):
- `search_knowledge`: Search the knowledge base with hybrid/semantic/keyword modes

Team Tools (only in Team endpoints):
- `delegate_task_to_member`: Leader delegates a task to a specific member
- `broadcast_task`: Leader broadcasts a task to all members

MCP Tools (when `ENABLE_MCP=true`):
- Dynamically loaded from the `mcp.json` configuration
- Examples: `web_search_exa`, filesystem tools, database tools, etc.