# Enterprise-Grade Serverless AI Agent Platform
Build production-ready AI agents with complete flexibility in LLM providers and tools, backed by a comprehensive management UI for enterprise operations.
The Step Functions AI Agent Framework consists of two integrated components:
The first is a serverless, highly flexible agent execution platform that provides:
- Any LLM Provider: Anthropic Claude, OpenAI GPT, Google Gemini, Amazon Bedrock, xAI Grok, DeepSeek
- Any Programming Language: Build tools in Python, TypeScript, Rust, Go, Java, or any language
- Serverless Scale: Automatic scaling with AWS Step Functions orchestration
- Complete Observability: Full tracing, metrics, and cost tracking built-in
The second is a comprehensive admin interface for enterprise operations:
- Agent Management: Configure agents, assign tools, update LLM models
- Tool Registry: Manage and test tools across all agents
- Execution Monitoring: Real-time execution history with filtering and search
- Cost Analytics: Track usage and costs by agent, model, and time period
- Enterprise Security: IAM-integrated access, secret management, audit logging
Agent framework:
- ✅ Multi-Provider LLM Support - Switch providers without code changes
- ✅ Unified Rust LLM Service - High-performance, provider-agnostic interface
- ✅ Language-Agnostic Tools - Build tools in any language
- ✅ Human-in-the-Loop - Built-in approval workflows
- ✅ Modular Architecture - Shared infrastructure, reusable tools
- ✅ Long Content Support - Handle extensive documents and conversations

Management UI:
- 📊 Execution Dashboard - Fast, indexed execution history with date/agent filtering
- 🔧 Agent Configuration - Dynamic system prompts, model selection, tool assignment
- 🧪 Integrated Testing - Test agents and tools directly from the UI
- 📈 Metrics & Analytics - CloudWatch integration, token usage, cost tracking
- 🔐 Enterprise Security - Cognito authentication, IAM permissions, Secrets Manager
- 🚀 Real-time Updates - EventBridge-powered execution tracking
The overall architecture:

```mermaid
graph TB
    subgraph UI["Management UI (Amplify)"]
        Console[Admin Console]
        ExecutionHistory[Execution History]
        Analytics[Analytics Dashboard]
    end

    subgraph Registry["Registries (DynamoDB)"]
        AgentReg[Agent Registry]
        ToolReg[Tool Registry]
        ModelReg[Model Registry]
    end

    subgraph Runtime["Agent Runtime"]
        StepFunctions[Step Functions]
        LLMService[LLM Service]
        Tools[Tool Lambdas]
    end

    Console --> AgentReg
    Console --> ToolReg
    StepFunctions --> LLMService
    StepFunctions --> Tools
    StepFunctions --> AgentReg
    StepFunctions --> ToolReg
    ExecutionHistory --> StepFunctions
```
The agent execution flow:

```mermaid
stateDiagram-v2
    [*] --> LoadConfig: Start Execution
    LoadConfig --> LoadTools: Load from Registry
    LoadTools --> CallLLM: Get Tool Definitions
    CallLLM --> UpdateMetrics: LLM Response
    UpdateMetrics --> CheckTools: Record Usage
    CheckTools --> ExecuteTools: Tools Requested
    CheckTools --> Success: No Tools Needed
    ExecuteTools --> CallLLM: Return Results
    Success --> [*]: Complete
```
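In Python terms, the loop the state machine implements looks roughly like this. It is a conceptual sketch only; the function names and message shapes are illustrative, not the framework's actual API:

```python
# Conceptual sketch of the execution loop above -- not the framework's API.
def run_agent(call_llm, execute_tool, system_prompt, user_message, max_turns=10):
    """Drive the CallLLM -> CheckTools -> ExecuteTools cycle to completion."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        response = call_llm(system_prompt, messages)          # CallLLM
        messages.append({"role": "assistant", "content": response["content"]})
        tool_uses = [b for b in response["content"] if b.get("type") == "tool_use"]
        if not tool_uses:                                     # CheckTools -> Success
            return messages
        tool_results = [execute_tool(t) for t in tool_uses]   # ExecuteTools
        messages.append({"role": "user", "content": tool_results})  # back to CallLLM
    raise RuntimeError("agent exceeded max_turns")
```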
- AWS Account with appropriate permissions
- Python 3.12+
- Node.js 18+ (for CDK and Amplify UI)
- AWS CDK CLI: `npm install -g aws-cdk`
- UV for Python: `pip install uv`
```bash
# Clone the repository
git clone https://github.com/your-org/step-functions-agent.git
cd step-functions-agent

# Install Python dependencies
uv pip install -r requirements.txt

# Bootstrap CDK (first time only)
cdk bootstrap

# Set environment
export ENVIRONMENT=prod
```
```bash
# 1. Deploy shared infrastructure (once per environment)
cdk deploy SharedInfrastructureStack-prod
cdk deploy AgentRegistryStack-prod

# 2. Deploy LLM service (choose one)
cdk deploy SharedUnifiedRustLLMStack-prod  # Recommended: high-performance unified service

# 3. Configure API keys in AWS Secrets Manager
aws secretsmanager create-secret \
  --name /ai-agent/llm-secrets/prod \
  --secret-string '{
    "ANTHROPIC_API_KEY": "sk-ant-...",
    "OPENAI_API_KEY": "sk-...",
    "GEMINI_API_KEY": "..."
  }'
```
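If you need to verify the secret or read it from your own code, a minimal boto3 sketch (the secret name matches the one created above):

```python
import json
import boto3

# Read the provider keys configured above at runtime.
secrets = boto3.client("secretsmanager")
resp = secrets.get_secret_value(SecretId="/ai-agent/llm-secrets/prod")
llm_keys = json.loads(resp["SecretString"])
anthropic_key = llm_keys["ANTHROPIC_API_KEY"]
```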
Next, deploy the Management UI:

```bash
cd ui_amplify

# Install dependencies
npm install

# Deploy to Amplify (creates hosted UI)
npx ampx sandbox                          # For development
# OR
npx ampx pipeline-deploy --branch main    # For production
```
The UI will be available at your Amplify app URL (e.g., https://main.xxxx.amplifyapp.com).
Create a new file `stacks/agents/my_agent_stack.py`:
```python
from aws_cdk import Fn
from stacks.agents.modular_base_agent_unified_llm_stack import ModularBaseAgentUnifiedLLMStack


class MyAgentStack(ModularBaseAgentUnifiedLLMStack):
    def __init__(self, scope, construct_id, env_name="prod", **kwargs):
        # Import required tools from the registry
        db_tool_arn = Fn.import_value(f"DBInterfaceToolLambdaArn-{env_name}")

        # Configure tools for this agent
        tool_configs = [
            {
                "tool_name": "query_database",
                "lambda_arn": db_tool_arn,
                "requires_activity": False
            }
        ]

        # Define agent behavior
        system_prompt = """You are a data analyst assistant.
Help users query and analyze database information.
Always explain your findings clearly."""

        # Initialize the agent with the Unified LLM service
        super().__init__(
            scope, construct_id,
            agent_name="data-analyst",
            unified_llm_arn=Fn.import_value(f"SharedUnifiedRustLLMLambdaArn-{env_name}"),
            tool_configs=tool_configs,
            env_name=env_name,
            system_prompt=system_prompt,
            **kwargs
        )
```
Add it to `app.py`:
```python
from stacks.agents.my_agent_stack import MyAgentStack

# Deploy your agent
MyAgentStack(app, "DataAnalystAgentStack-prod", env_name="prod")
```

Then deploy:

```bash
cdk deploy DataAnalystAgentStack-prod
```
The agent will automatically register in the Agent Registry and appear in the Management UI!
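Once deployed, you can invoke the agent by starting a Step Functions execution. A hedged sketch with boto3; the state machine ARN is a placeholder, and the input shape is an assumption — copy the real ARN and expected payload from the Step Functions console or the Management UI:

```python
import json
import boto3

# The ARN below is a placeholder -- substitute your deployed state machine's ARN.
sfn = boto3.client("stepfunctions")
execution = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:data-analyst-prod",
    input=json.dumps({
        "messages": [
            {"role": "user", "content": "Which products sold best last week?"}
        ]
    }),
)
print(execution["executionArn"])
```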
To build a custom tool, create the following directory structure:

```
lambda/tools/my-tool/
├── index.py               # Lambda handler
├── requirements.txt       # Dependencies
└── tool_definition.json   # Tool schema for the LLM
```
Implement the handler in `index.py`:

```python
def lambda_handler(event, context):
    """
    Standard tool interface compatible with all LLM providers.

    Args:
        event: {
            "name": "tool_name",
            "id": "unique_tool_use_id",
            "input": {
                # Tool-specific parameters
            }
        }

    Returns:
        {
            "type": "tool_result",
            "tool_use_id": event["id"],
            "name": event["name"],
            "content": "Result as string or JSON"
        }
    """
    tool_input = event["input"]

    # Implement your tool logic here (perform_action is a placeholder)
    result = perform_action(tool_input)

    return {
        "type": "tool_result",
        "tool_use_id": event["id"],
        "name": event["name"],
        "content": result
    }
```
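For a quick local smoke test before deploying, you can append a small driver to `index.py` (the event values are illustrative):

```python
# Quick local smoke test for the handler above.
if __name__ == "__main__":
    sample_event = {
        "name": "my_tool",
        "id": "toolu_example_001",
        "input": {"parameter1": "hello"},
    }
    print(lambda_handler(sample_event, None))
```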
Create `tool_definition.json`:
```json
{
  "name": "my_tool",
  "description": "Clear description of what the tool does for the LLM",
  "input_schema": {
    "type": "object",
    "properties": {
      "parameter1": {
        "type": "string",
        "description": "Description of parameter1"
      },
      "parameter2": {
        "type": "number",
        "description": "Description of parameter2"
      }
    },
    "required": ["parameter1"]
  }
}
```
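As an optional sanity check (this assumes the third-party `jsonschema` package, which is not part of the framework), you can verify that `input_schema` is itself a valid JSON Schema before deploying:

```python
import json
from jsonschema import Draft202012Validator

# Raises SchemaError if input_schema is not a valid JSON Schema.
with open("lambda/tools/my-tool/tool_definition.json") as f:
    definition = json.load(f)

Draft202012Validator.check_schema(definition["input_schema"])
print(f"{definition['name']}: schema OK")
```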
Create a CDK stack that deploys the Lambda and registers it in the Tool Registry:

```python
from aws_cdk import aws_lambda as lambda_, Duration
from constructs import Construct
from .base_tool_stack import BaseToolStack


class MyToolStack(BaseToolStack):
    def __init__(self, scope: Construct, construct_id: str, env_name: str = "prod", **kwargs):
        super().__init__(scope, construct_id, env_name=env_name, **kwargs)

        # Create the Lambda function
        tool_lambda = lambda_.Function(
            self, "MyToolFunction",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="index.lambda_handler",
            code=lambda_.Code.from_asset("lambda/tools/my-tool"),
            timeout=Duration.seconds(30),
            environment={
                "LOG_LEVEL": "INFO"
            }
        )

        # Register in the Tool Registry
        self.register_tool(
            tool_name="my_tool",
            tool_lambda=tool_lambda,
            tool_definition_path="lambda/tools/my-tool/tool_definition.json"
        )
```
```bash
cdk deploy MyToolStack-prod
```
The tool is now available for any agent to use!
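You can smoke-test the deployed tool directly with boto3; the function name below is an assumption — use the name or ARN that `MyToolStack` actually deployed (visible in the Tool Registry or the CDK output):

```python
import json
import boto3

# Invoke the deployed tool Lambda with a sample tool-use event.
lambda_client = boto3.client("lambda")
resp = lambda_client.invoke(
    FunctionName="MyToolFunction",  # placeholder: use your deployed name/ARN
    Payload=json.dumps({
        "name": "my_tool",
        "id": "test-1",
        "input": {"parameter1": "hello"},
    }),
)
print(json.loads(resp["Payload"].read()))
```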
The framework includes production-ready tools you can use immediately:
- SQL Database Tool (`DBInterfaceToolStack`) - Query databases, execute SQL, analyze data
- GraphQL Tool (`GraphQLToolStack`) - Query GraphQL APIs with type safety
- Web Research Tool (`WebResearchToolStack`) - Web scraping and research
- Microsoft Graph Tool (`MicrosoftGraphToolStack`) - Office 365, Teams, SharePoint integration
- Google Maps Tool (`GoogleMapsToolStack`) - Location services, geocoding, directions
- Firecrawl Tool - Advanced web scraping with AI
- Code Execution Tool (`E2BToolStack`) - Safe Python/JavaScript code execution
- Batch Processor Tool - Process large datasets in parallel
- Local Agent Tool - Execute commands on remote machines securely
- CloudWatch Tool (`CloudWatchToolStack`) - AWS metrics, logs, and alarms
- SageMaker Tool - ML model deployment and inference
Deploy any tool:
```bash
cdk deploy DBInterfaceToolStack-prod
cdk deploy GoogleMapsToolStack-prod
```
Execution history:
- Fast Indexed Search - DynamoDB-backed execution index for instant queries (see the sketch after this list)
- Advanced Filtering - Filter by agent, status, date range (UTC-aware)
- Real-time Updates - EventBridge integration for live execution tracking
- Detailed Views - Full execution trace, token usage, cost breakdown
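A hedged sketch of what a filtered query against the execution index might look like; the table name and attribute names are assumptions — check the deployed index table for the actual schema:

```python
import boto3
from boto3.dynamodb.conditions import Key

# Query executions for one agent within a date range (names are assumptions).
table = boto3.resource("dynamodb").Table("ExecutionIndex-prod")
resp = table.query(
    KeyConditionExpression=Key("agent_name").eq("data-analyst")
    & Key("start_time").between("2025-01-01T00:00:00Z", "2025-01-31T23:59:59Z")
)
for item in resp["Items"]:
    print(item["execution_arn"], item.get("status"))
```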
Agent configuration:
- Dynamic Configuration - Update system prompts without redeployment
- Model Selection - Switch LLM providers and models on the fly
- Tool Assignment - Add/remove tools from agents via the UI
- Version Control - Track configuration changes over time

Testing:
- Agent Testing - Execute test prompts with custom inputs
- Tool Testing - Validate tool functionality independently
- Execution Replay - Re-run failed executions with the same inputs
- Health Checks - Automated validation of agent configurations

Metrics and analytics:
- Cost Tracking - Real-time cost estimates per execution
- Token Usage - Input/output token metrics by model
- Performance Metrics - Execution duration, error rates, trends
- CloudWatch Integration - Deep-dive into logs and traces
Security:
- ✅ IAM Integration - Fine-grained access control with AWS IAM
- ✅ Cognito Authentication - Secure user authentication for the UI
- ✅ Secrets Manager - Encrypted storage for API keys and credentials
- ✅ VPC Support - Deploy in private subnets with VPC endpoints
- ✅ Audit Logging - Complete audit trail via CloudWatch and CloudTrail
- ✅ Resource Tags - Automatic tagging for compliance and cost allocation

Observability:
- ✅ X-Ray Tracing - End-to-end distributed tracing
- ✅ CloudWatch Metrics - Custom metrics for all operations
- ✅ Structured Logging - JSON logs with correlation IDs
- ✅ Execution Index - Fast, searchable execution history
- ✅ Cost Attribution - Track costs by agent, model, and execution

Reliability:
- ✅ Automatic Retries - Built-in retry logic with exponential backoff
- ✅ Error Handling - Graceful degradation and error recovery
- ✅ Circuit Breakers - Protect downstream services
- ✅ Rate Limiting - Prevent API quota exhaustion
- ✅ Health Checks - Automated monitoring and alerting

Cost management:
- ✅ Token Tracking - Real-time token usage monitoring
- ✅ Cost Estimation - Predict execution costs before running (see the sketch after this list)
- ✅ Budget Alerts - CloudWatch alarms for cost thresholds
- ✅ Model Optimization - Automatic model selection for cost/quality trade-offs
- ✅ Execution Limits - Configurable limits per agent
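As a rough illustration of how a per-execution cost estimate follows from token counts (the per-1K-token prices below are placeholders; substitute current rates for your chosen model):

```python
# Prices are illustrative (USD per 1K input/output tokens); substitute real rates.
PRICES = {
    "claude-sonnet-4": (0.003, 0.015),
    "gpt-4o-mini": (0.00015, 0.0006),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate execution cost from token counts."""
    input_price, output_price = PRICES[model]
    return input_tokens / 1000 * input_price + output_tokens / 1000 * output_price

print(f"${estimate_cost('claude-sonnet-4', 12_000, 1_500):.4f}")  # $0.0585
```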
| Provider | Models | Best For | Pricing |
|---|---|---|---|
| Anthropic Claude | Sonnet 4, Opus 3.5 | Complex reasoning, long context | $$$ |
| OpenAI | GPT-4o, GPT-4o-mini | Versatile, code generation | $$$ |
| Google Gemini | 1.5 Pro, Flash | Multimodal, fast responses | $$ |
| Amazon Bedrock | Nova Pro, Nova Lite | AWS native, cost-effective | $$ |
| xAI | Grok 2, Grok 2 mini | Latest capabilities | $$ |
| DeepSeek | DeepSeek V3 | Specialized tasks | $ |
All providers are configured through the Unified Rust LLM Service or individual provider Lambdas. API keys are stored in AWS Secrets Manager.
Update API keys:
```bash
aws secretsmanager update-secret \
  --secret-id /ai-agent/llm-secrets/prod \
  --secret-string '{
    "ANTHROPIC_API_KEY": "sk-ant-new-key",
    "OPENAI_API_KEY": "sk-new-key"
  }'
```
Change models via the Management UI or the agent configuration:

```python
# In the agent stack
self.llm_provider = "anthropic"
self.llm_model = "claude-sonnet-4-20250514"

# Or via UI: Agent Management > Select Agent > Update Model
```
```bash
# Development environment
export ENVIRONMENT=dev
cdk deploy SharedInfrastructureStack-dev
cdk deploy MyAgentStack-dev

# Production environment
export ENVIRONMENT=prod
cdk deploy SharedInfrastructureStack-prod
cdk deploy MyAgentStack-prod
```
Deploy the stacks in this order:

1. Core Infrastructure (once per environment)

   ```bash
   cdk deploy SharedInfrastructureStack-prod
   cdk deploy AgentRegistryStack-prod
   ```

2. LLM Service (choose based on needs)

   ```bash
   # High-performance unified service (recommended)
   cdk deploy SharedUnifiedRustLLMStack-prod
   # OR traditional multi-provider
   cdk deploy SharedLLMStack-prod
   ```

3. Tools (deploy only what you need)

   ```bash
   cdk deploy DBInterfaceToolStack-prod
   cdk deploy GoogleMapsToolStack-prod
   cdk deploy WebResearchToolStack-prod
   ```

4. Agents (your custom agents)

   ```bash
   cdk deploy MyAgentStack-prod
   ```

5. Management UI (Amplify)

   ```bash
   cd ui_amplify
   npx ampx pipeline-deploy --branch main
   ```
Access pre-built dashboards:
- Execution Overview - All agent executions, success rates, duration
- Cost Analysis - Token usage and estimated costs by model
- Error Tracking - Failed executions, error patterns, retry metrics
Run custom queries in CloudWatch Logs Insights, for example:

```
# Cost analysis by agent
fields @timestamp, agent_name, model, input_tokens, output_tokens
| stats sum(input_tokens * 0.003 / 1000) as input_cost,
        sum(output_tokens * 0.015 / 1000) as output_cost
  by agent_name, model
```

```
# Execution performance
fields @timestamp, agent_name, duration
| stats avg(duration) as avg_duration,
        max(duration) as max_duration,
        count(*) as total_executions
  by agent_name
```
Configure CloudWatch Alarms:
- High error rate (>5% failures)
- Slow executions (>30s duration)
- High costs (>$100/day)
- Token limit warnings
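For example, a failed-execution alarm can be wired up in CDK. This is a sketch: the threshold is illustrative and the framework may already provision equivalents; `metric_failed` is the standard CDK metric on a Step Functions state machine:

```python
from aws_cdk import Duration, aws_cloudwatch as cloudwatch
from aws_cdk import aws_stepfunctions as sfn
from constructs import Construct

def add_failure_alarm(scope: Construct, state_machine: sfn.StateMachine) -> None:
    """Alarm when more than 5 executions fail within 5 minutes (example threshold)."""
    failed = state_machine.metric_failed(period=Duration.minutes(5))
    cloudwatch.Alarm(
        scope, "AgentFailureAlarm",
        metric=failed,
        threshold=5,
        evaluation_periods=1,
        alarm_description="More than 5 failed agent executions in 5 minutes",
    )
```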
- Deployment Guide - Complete deployment walkthrough
- Quick Start Tutorial - Build your first agent in 10 minutes
- Agent Development - Creating custom agents
- Tool Development - Building new tools
- Testing Guide - Testing strategies
- Modular Architecture - System design patterns
- Long Content Support - Handling large documents
- Human Approval Workflows - Adding approval steps
- Activity Testing - Testing remote activities
- Monitoring Guide - Observability setup
- Security Best Practices - Security hardening
- Cost Optimization - Reducing operational costs
- Troubleshooting - Common issues and solutions
- UI User Guide - Using the admin interface
- Execution Index - Fast execution queries
- Analytics Dashboard - Using metrics and analytics
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
```bash
# Create virtual environment
uv venv
source .venv/bin/activate

# Install dev dependencies
uv pip install -r requirements-dev.txt

# Run tests
pytest

# Format code
black .
ruff check .
```
```bash
cd ui_amplify

# Install dependencies
npm install

# Run local development server
npm run dev

# Run tests
npm test
```
```
step-functions-agent/
├── app.py                    # CDK app entry point
├── stacks/
│   ├── agents/               # Agent stack definitions
│   ├── tools/                # Tool stack definitions
│   ├── shared_llm/           # LLM service stacks
│   └── infrastructure/       # Core infrastructure
├── lambda/
│   ├── tools/                # Tool Lambda functions
│   │   ├── db-interface/
│   │   ├── google-maps/
│   │   └── web-research/
│   └── unified_llm/          # Unified LLM service (Rust)
├── ui_amplify/               # Management UI (Amplify Gen 2)
│   ├── amplify/              # Backend configuration
│   ├── src/                  # React frontend
│   └── scripts/              # Utility scripts
└── docs/                     # Documentation
```
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs.your-project.com
This project is licensed under the MIT License - see LICENSE for details.
- AWS Step Functions team for serverless orchestration
- Anthropic, OpenAI, Google, Amazon, xAI, and DeepSeek for LLM APIs
- AWS Amplify team for the Gen 2 framework
- Open-source community for tools and libraries
Built with ❤️ using AWS CDK, Step Functions, and Amplify