Deploy OpenHands on AWS with production-grade infrastructure in minutes. ECS Fargate • Bedrock LLM • Per-conversation isolation • Self-healing architecture.
Getting Started · Architecture · Cost Estimate · Blog Post
Running OpenHands locally is great for trying it out. Running it for a team or in production is a different story:
| Challenge | How This Project Solves It |
|---|---|
| "I don't want to manage servers" | Fully serverless: ECS Fargate, Aurora Serverless, no EC2 instances |
| "Idle cost is too high" | Sandboxes scale to zero when not in use; pay only for active conversations |
| "Multi-user access control" | Cognito authentication with 30-day sessions, per-user conversation isolation |
| "My conversations disappear on restart" | Self-healing: Aurora + S3 + EFS persist everything across Fargate task replacements |
| "I need AWS access from the AI agent" | Optional scoped IAM credentials for sandbox containers (least-privilege) |
| "Setting up infra is painful" | One `cdk deploy --all` command: 10 stacks deployed in the right order automatically |
- Fully Serverless: ECS Fargate (ARM64) for compute, Aurora Serverless v2 for database, no instances to patch
- Zero Idle Cost: Sandbox containers spin up per-conversation and stop automatically after idle timeout
- Per-Conversation Isolation: Each sandbox gets a dedicated EFS access point; no cross-conversation access
- Self-Healing Architecture: Conversations resume seamlessly after Fargate task replacement (Aurora + S3 + EFS)
- AWS Bedrock: LLM inference via IAM Role, no API keys to manage
- Multi-Domain Support: Share one backend across multiple CloudFront distributions and domains
- Enterprise Security: Cognito auth, WAF, VPC Endpoints, private subnets, KMS encryption, Secrets Manager
- Runtime Subdomain: Agent-built apps accessible via `{port}-{convId}.runtime.{subdomain}.{domain}`
- Observability: CloudWatch Logs, Alarms, Container Insights, AWS Backup (14-day retention)
- Warm Pool: Pre-warmed sandbox tasks for instant conversation starts
User → CloudFront (WAF+Lambda@Edge Auth) → ALB (origin verified) → ECS Fargate (App + OpenResty)
       │                                                           │
       └── Cognito (OAuth2, Managed Login v2)                      Cloud Map → Sandbox Fargate Tasks
                                                                   │
                                                                   VPC Endpoints → Bedrock / CloudWatch Logs
                                                                   │
                                                                   RDS Proxy → Aurora Serverless v2 PostgreSQL
Sandbox Orchestration:
App → Orchestrator Lambda → DynamoDB Registry → Sandbox Fargate Tasks (per-conversation EFS isolation)
Runtime Apps:
{port}-{convId}.runtime.{subdomain}.{domain} → CloudFront → Lambda@Edge → OpenResty → Sandbox Fargate Task
For a detailed architecture deep dive (10-stack breakdown, data flows, sandbox lifecycle), see docs/ARCHITECTURE.md.
- AWS CLI configured with appropriate credentials
- Node.js 22+ and npm
- Existing VPC with private subnets and NAT Gateway
- Existing Route 53 Hosted Zone
git clone https://github.com/zxkane/openhands-infra.git
cd openhands-infra
npm install
npx cdk bootstrap --region <your-main-region>
npx cdk bootstrap --region us-east-1 # Required for Lambda@Edge and CloudFront
aws secretsmanager create-secret \
--name openhands/sandbox-secret-key \
--secret-string "$(openssl rand -base64 32)" \
--region <your-main-region> \
--description "OpenHands sandbox secret key for session encryption"
Note: This secret must exist in each region where you deploy.
npx cdk deploy --all \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-id> \
--context domainName=<domain-name> \
--context subDomain=<subdomain> \
--context region=<region> \
--require-approval never
That's it! Access OpenHands at https://<subdomain>.<domain-name>
All Context Parameters
| Parameter | Description | Example |
|---|---|---|
| `vpcId` | Existing VPC ID | `vpc-0123456789abcdef0` |
| `hostedZoneId` | Route 53 Hosted Zone ID | `Z0123456789ABCDEFGHIJ` |
| `domainName` | Domain name | `example.com` |
| `subDomain` | Subdomain for OpenHands | `openhands` |
| `region` | AWS region (optional, defaults to us-east-1) | `us-west-2` |
| `siteName` | Cognito managed login site name (optional) | `Openhands on AWS` |
| `authCallbackDomains` | Extra OAuth callback domains for shared Cognito client (optional; JSON array or comma-separated) | `["openhands.example.com","openhands.test.example.com"]` |
| `authDomainPrefixSuffix` | Suffix for Cognito domain prefix (optional; avoids collisions) | `shared` |
| `edgeStackSuffix` | Suffix for Edge stack name in us-east-1 (optional; enables multiple Edge stacks) | `my-project` |
| `sandboxAwsAccess` | Enable sandbox AWS access (optional, defaults to false) | `true` |
| `sandboxAwsPolicyFile` | Path to custom IAM policy JSON for sandbox (optional) | `config/sandbox-aws-policy.json` |
| `skipS3Endpoint` | Skip S3 Gateway endpoint if VPC already has one (optional) | `true` |
| `warmPoolSize` | Number of pre-warmed sandbox Fargate tasks (optional, default: 2) | `3` |
| `idleTimeoutMinutes` | Minutes before idle sandbox is stopped (optional, default: 30, staging: 10) | `15` |
| `sandboxSociImageUri` | SOCI v2 image URI for Fargate lazy loading (optional, see AGENTS.md) | `<ecr-uri>:tag-soci` |
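The table says `authCallbackDomains` accepts either a JSON array or a comma-separated string. A minimal sketch of how such a value could be normalized is shown below; the helper name `parseCallbackDomains` and its trimming and de-duplication behaviour are assumptions for illustration, not the project's actual implementation.

```typescript
// Hypothetical helper: normalize the `authCallbackDomains` context value.
// Accepts a JSON array string or a comma-separated string, as the table describes.
function parseCallbackDomains(raw: string | undefined): string[] {
  if (raw === undefined || raw.trim() === "") {
    return [];
  }
  const text = raw.trim();
  let items: string[];
  if (text.charAt(0) === "[") {
    // JSON array form, e.g. '["a.example.com","b.example.com"]'
    const parsed: unknown = JSON.parse(text);
    if (!Array.isArray(parsed) || !parsed.every((d) => typeof d === "string")) {
      throw new Error("authCallbackDomains must be a JSON array of strings");
    }
    items = parsed as string[];
  } else {
    // Comma-separated form, e.g. 'a.example.com,b.example.com'
    items = text.split(",");
  }
  const cleaned = items.map((d) => d.trim()).filter((d) => d.length > 0);
  // Drop duplicates while keeping the first occurrence of each domain.
  return cleaned.filter((d, i) => cleaned.indexOf(d) === i);
}
```

Both `'["openhands.example.com","openhands.test.example.com"]'` and `'openhands.example.com,openhands.test.example.com'` would yield the same two-element list under this sketch.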
The project deploys 10 stacks with automatic dependency resolution:
| Stack | Region | Description |
|---|---|---|
| `OpenHands-Auth` | us-east-1 | Cognito User Pool + Managed Login v2 branding |
| `OpenHands-Network` | Main | VPC import, VPC Endpoints |
| `OpenHands-Monitoring` | Main | CloudWatch Logs, Alarms, S3 Data Bucket, Backup |
| `OpenHands-Security` | Main | IAM Roles, Security Groups, KMS key |
| `OpenHands-Database` | Main | Aurora Serverless v2 PostgreSQL with RDS Proxy |
| `OpenHands-UserConfig` | Main | User Configuration API Lambda (MCP, Secrets, Integrations) |
| `OpenHands-Cluster` | Main | Shared ECS Cluster + Cloud Map namespace |
| `OpenHands-Sandbox` | Main | Sandbox Fargate tasks, DynamoDB registry, Orchestrator Lambda |
| `OpenHands-Compute` | Main | Fargate services (App + OpenResty), ALB, EFS |
| `OpenHands-Edge-*` | us-east-1 | Lambda@Edge, CloudFront, WAF, Route 53 (per domain/environment) |
Deployment Order (handled automatically by CDK): 0. Auth → 1. Network → 2. Monitoring → 3. Security → 4. Database → 5. UserConfig → 6. Cluster → 7. Sandbox → 8. Compute → 9. Edge
| Component | Monthly Cost (USD) | Notes |
|---|---|---|
| Fargate App Service (1 vCPU / 2 GB ARM64) | ~$30 | Auto-scales 1-3 |
| Fargate OpenResty Service (0.25 vCPU / 512 MB) | ~$8 | Auto-scales 1-3 |
| Fargate Sandbox Tasks | ~$0-50 | On-demand, per-conversation |
| Aurora Serverless v2 | ~$43-80 | 0.5-4 ACU |
| RDS Proxy | ~$18 | |
| CloudFront | ~$85 | 1TB data transfer |
| VPC Endpoints (10) | ~$60 | |
| ALB | ~$25 | |
| Other (EFS, S3, NAT, CW, R53, DDB) | ~$10-50 | Usage-dependent |
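The table lists per-component estimates but no total. As a rough illustration, the sketch below sums the low and high ends of each row; it assumes the single-value rows (for example the Fargate services at their minimum task count) stay fixed, so treat the result as an approximation derived from the table, not a quote.

```typescript
// Monthly cost ranges copied from the table above (USD, approximate).
interface CostLine {
  component: string;
  low: number;
  high: number;
}

const monthlyCosts: CostLine[] = [
  { component: "Fargate App Service", low: 30, high: 30 },
  { component: "Fargate OpenResty Service", low: 8, high: 8 },
  { component: "Fargate Sandbox Tasks", low: 0, high: 50 },
  { component: "Aurora Serverless v2", low: 43, high: 80 },
  { component: "RDS Proxy", low: 18, high: 18 },
  { component: "CloudFront", low: 85, high: 85 },
  { component: "VPC Endpoints (10)", low: 60, high: 60 },
  { component: "ALB", low: 25, high: 25 },
  { component: "Other", low: 10, high: 50 },
];

// Add up the low and high bounds across all rows.
function totalRange(lines: CostLine[]): { low: number; high: number } {
  return lines.reduce(
    (acc, line) => ({ low: acc.low + line.low, high: acc.high + line.high }),
    { low: 0, high: 0 }
  );
}

const total = totalRange(monthlyCosts);
console.log(`Infrastructure: ~$${total.low}-${total.high} per month`);
// prints "Infrastructure: ~$279-406 per month"
```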
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| Claude Opus 4.5 | $5 | $25 |
| Claude Sonnet 4.5 | $3 | $15 |
| Claude Haiku 4.5 | $1 | $5 |
Example: 10M input + 2M output tokens/month with Claude Sonnet 4.5 ≈ $60/month
Multi-Domain Deployment
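The example figure follows directly from the per-token prices: 10 × $3 for input plus 2 × $15 for output. A small sketch of that arithmetic, using the prices from the table above (the helper name `estimateMonthlyLlmCost` is hypothetical):

```typescript
// On-demand prices per 1M tokens, copied from the table above (USD).
const pricePerMillionTokens: Record<string, { input: number; output: number }> = {
  "Claude Opus 4.5": { input: 5, output: 25 },
  "Claude Sonnet 4.5": { input: 3, output: 15 },
  "Claude Haiku 4.5": { input: 1, output: 5 },
};

// Hypothetical helper, not part of the project: tokens in, USD out.
function estimateMonthlyLlmCost(model: string, inputTokens: number, outputTokens: number): number {
  const price = pricePerMillionTokens[model];
  if (price === undefined) {
    throw new Error(`No price listed for model: ${model}`);
  }
  return (inputTokens / 1e6) * price.input + (outputTokens / 1e6) * price.output;
}

console.log(estimateMonthlyLlmCost("Claude Sonnet 4.5", 10e6, 2e6)); // prints 60
```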
You can deploy multiple OpenHands instances on different domains, all sharing the same backend infrastructure.
                      ┌─────────────────────────────────┐
                      │      AuthStack (us-east-1)      │
                      │    Shared Cognito User Pool     │
                      │    - Multi-domain callbacks     │
                      └─────────────────────────────────┘
                                       │
            ┌──────────────────────────┼──────────────────────────┐
            │                          │                          │
            ▼                          ▼                          ▼
┌───────────────────────┐  ┌───────────────────────┐  ┌───────────────────────┐
│ EdgeStack-Domain1     │  │ EdgeStack-Domain2     │  │ EdgeStack-DomainN     │
│ (us-east-1)           │  │ (us-east-1)           │  │ (us-east-1)           │
│ - CloudFront          │  │ - CloudFront          │  │ - CloudFront          │
│ - Lambda@Edge         │  │ - Lambda@Edge         │  │ - Lambda@Edge         │
│ - WAF                 │  │ - WAF                 │  │ - WAF                 │
│ - Route 53 records    │  │ - Route 53 records    │  │ - Route 53 records    │
│ - ACM Certificate     │  │ - ACM Certificate     │  │ - ACM Certificate     │
└───────────────────────┘  └───────────────────────┘  └───────────────────────┘
            │                          │                          │
            └──────────────────────────┼──────────────────────────┘
                                       │
                                       ▼
                    ┌─────────────────────────────────────┐
                    │ ComputeStack (main region)          │
                    │ - ALB with origin verification      │
                    │ - Fargate services (App+OpenResty)  │
                    │ - SSM parameters in us-east-1       │
                    └─────────────────────────────────────┘
                                       │
                         ┌─────────────┴─────────────┐
                         ▼                           ▼
             ┌───────────────────────┐   ┌───────────────────────┐
             │ DatabaseStack         │   │ MonitoringStack       │
             │ Aurora PostgreSQL     │   │ S3, CloudWatch        │
             └───────────────────────┘   └───────────────────────┘
npx cdk deploy OpenHands-Auth \
--context vpcId=<vpc-id> \
--context hostedZoneId=<primary-hosted-zone-id> \
--context domainName=<primary-domain> \
--context subDomain=openhands \
--context region=<main-region> \
--context authCallbackDomains='["openhands.domain1.com","openhands.domain2.com"]' \
--require-approval never
npx cdk deploy OpenHands-Network OpenHands-Monitoring OpenHands-Security \
OpenHands-Database OpenHands-UserConfig OpenHands-Cluster \
OpenHands-Sandbox OpenHands-Compute \
--context vpcId=<vpc-id> \
--context hostedZoneId=<primary-hosted-zone-id> \
--context domainName=<primary-domain> \
--context subDomain=openhands \
--context region=<main-region> \
--require-approval never
# Domain 1
npx cdk deploy OpenHands-Edge-Test \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-for-test-example-com> \
--context domainName=test.example.com \
--context subDomain=openhands \
--context region=<main-region> \
--context edgeStackSuffix=Test \
--exclusively \
--require-approval never
# Domain 2
npx cdk deploy OpenHands-Edge-Prod \
--context vpcId=<vpc-id> \
--context hostedZoneId=<hosted-zone-for-prod-example-com> \
--context domainName=prod.example.com \
--context subDomain=openhands \
--context region=<main-region> \
--context edgeStackSuffix=Prod \
--exclusively \
--require-approval never
Important: Use the `--exclusively` flag when deploying individual Edge stacks, so that CDK does not redeploy the backend stacks with a different domain context.
Adding a new domain:
- Update Auth stack with the new callback domain
- Deploy a new Edge stack with `--context edgeStackSuffix=<Name> --exclusively`
Removing a domain:
- Delete the Edge stack: aws cloudformation delete-stack --stack-name OpenHands-Edge-<Suffix> --region us-east-1
- Optionally update Auth stack to remove the callback domain
Conversation Resume (Self-Healing)
When sandbox Fargate tasks stop (idle timeout, crash, or deployment), conversations become ARCHIVED. All data is preserved:
| Data | Storage | Survives Task Stop |
|---|---|---|
| Conversation metadata | Aurora PostgreSQL | Yes |
| Conversation events/history | S3 | Yes |
| Workspace files | EFS (per-conversation access point) | Yes |
Auto-Resume Flow:
User clicks archived conversation
↓
Frontend detects ARCHIVED status
↓
Calls POST /api/v1/app-conversations/{id}/resume
↓
App → Orchestrator Lambda:
- Creates new EFS access point for conversation
- Registers new task definition with access point
- Launches Fargate sandbox task
- Updates DynamoDB registry
↓
Page reloads → conversation is usable again
Workspace files on EFS are preserved via the access point, so code and files from the previous session remain available after resume.
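The frontend half of the flow above can be sketched as follows. Only the `ARCHIVED` status and the `POST /api/v1/app-conversations/{id}/resume` path come from this document; the `Conversation` shape, the injected `post` function, and the function name are assumptions made so the example stays self-contained.

```typescript
// Hypothetical shapes for illustration; the real frontend types may differ.
interface Conversation {
  id: string;
  status: string; // "ARCHIVED" is the only value named in this README
}

// The HTTP call is injected so the sketch does not depend on a specific client.
type PostFn = (path: string) => Promise<{ ok: boolean }>;

// Returns true if a resume request was sent, false if none was needed.
async function resumeIfArchived(conv: Conversation, post: PostFn): Promise<boolean> {
  if (conv.status !== "ARCHIVED") {
    return false;
  }
  const path = `/api/v1/app-conversations/${encodeURIComponent(conv.id)}/resume`;
  const res = await post(path);
  if (!res.ok) {
    throw new Error(`Resume request failed for conversation ${conv.id}`);
  }
  return true;
}
```

In a browser, `post` could wrap `fetch(path, { method: "POST" })`; after a successful call the page would reload, as in the last step of the flow.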
Sandbox AWS Access
Enable AI agents in sandbox containers to access AWS services with scoped IAM credentials:
npx cdk deploy --all \
--context sandboxAwsAccess=true \
--context sandboxAwsPolicyFile=config/sandbox-aws-policy.json \
...
The default config/sandbox-aws-policy.json grants broad permissions. Customize this for your use case!
Example: Purpose-built policy for S3 and DynamoDB only:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowS3Access",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
      "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"]
    },
    {
      "Sid": "AllowDynamoDB",
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:*:*:table/my-table"
    }
  ]
}
These actions are always denied regardless of your policy:
| Category | Denied Actions |
|---|---|
| IAM Users | iam:CreateUser, iam:DeleteUser, iam:CreateAccessKey |
| IAM Policies | iam:AttachUserPolicy, iam:PutUserPolicy, iam:PutRolePolicy |
| IAM Roles | iam:CreateRole, iam:DeleteRole, iam:AttachRolePolicy |
| Account | organizations:*, account:*, billing:* |
| Role Assumption | sts:AssumeRole (prevents lateral movement) |
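Because the deny list overrides whatever the custom policy allows, it can be useful to check a policy file for allowed actions that will never take effect. The sketch below is a hypothetical pre-deploy check, not something the project ships; it only understands a trailing `*` wildcard and ignores `NotAction`, conditions, and resources.

```typescript
// Deny list copied from the table above.
const ALWAYS_DENIED: string[] = [
  "iam:CreateUser", "iam:DeleteUser", "iam:CreateAccessKey",
  "iam:AttachUserPolicy", "iam:PutUserPolicy", "iam:PutRolePolicy",
  "iam:CreateRole", "iam:DeleteRole", "iam:AttachRolePolicy",
  "organizations:*", "account:*", "billing:*",
  "sts:AssumeRole",
];

interface PolicyStatement {
  Effect: string;
  Action: string | string[];
}

interface PolicyDocument {
  Statement: PolicyStatement[];
}

// True if `action` is covered by `pattern` (exact match or trailing "*").
// IAM action names are case-insensitive, so compare in lower case.
function matchesPattern(action: string, pattern: string): boolean {
  const a = action.toLowerCase();
  const p = pattern.toLowerCase();
  if (p.charAt(p.length - 1) === "*") {
    return a.indexOf(p.slice(0, -1)) === 0;
  }
  return a === p;
}

// Lists allowed actions that overlap the deny list in either direction,
// so "iam:*" is flagged as well as "iam:CreateRole".
function findDeniedActions(policy: PolicyDocument): string[] {
  const flagged: string[] = [];
  for (const stmt of policy.Statement) {
    if (stmt.Effect !== "Allow") continue;
    const actions = typeof stmt.Action === "string" ? [stmt.Action] : stmt.Action;
    for (const action of actions) {
      const overlaps = ALWAYS_DENIED.some(
        (denied) => matchesPattern(action, denied) || matchesPattern(denied, action)
      );
      if (overlaps && flagged.indexOf(action) === -1) flagged.push(action);
    }
  }
  return flagged;
}
```

Run against the S3 and DynamoDB example policy above, this check would report nothing, since none of its actions appear in the deny list.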
Runtime Subdomain Routing
When AI agents run applications (e.g., Flask, Node.js) inside the sandbox, they are accessible via dedicated runtime subdomains:
https://{port}-{convId}.runtime.{subdomain}.{domain}/
Example: https://5000-abc123def456.runtime.openhands.example.com/
| Feature | Benefit |
|---|---|
| Domain Root | Apps run at `/`, so internal routes work correctly |
| Cookie Isolation | Each runtime has isolated cookies |
| Security Headers | X-Frame-Options, CSP, X-XSS-Protection applied automatically |
| No Authentication | Runtime subdomains bypass Cognito (public within conversation) |
User Browser
↓
https://5000-{convId}.runtime.openhands.example.com/
↓
CloudFront (matches *.runtime.* wildcard certificate)
↓
Lambda@Edge (viewer-request: parse subdomain, rewrite URI)
↓
ALB → OpenResty → Sandbox Discovery (DynamoDB) → User App
Data Persistence
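The "parse subdomain" step in the flow above amounts to splitting the first host label into a port and a conversation ID. A minimal sketch follows, assuming conversation IDs are lowercase alphanumeric as in the example URL; the function name and return shape are hypothetical, and the project's Lambda@Edge handler may differ.

```typescript
interface RuntimeTarget {
  port: number;
  convId: string;
}

// Parses "{port}-{convId}.runtime.{subdomain}.{domain}".
// `baseDomain` is "{subdomain}.{domain}", e.g. "openhands.example.com".
// Returns null for any host that does not match the runtime pattern.
function parseRuntimeHost(host: string, baseDomain: string): RuntimeTarget | null {
  const h = host.toLowerCase();
  const suffix = ".runtime." + baseDomain.toLowerCase();
  if (h.length <= suffix.length || h.slice(-suffix.length) !== suffix) {
    return null;
  }
  const label = h.slice(0, h.length - suffix.length);
  // Assumption: convId is lowercase alphanumeric, as in "abc123def456".
  const match = /^(\d{1,5})-([a-z0-9]+)$/.exec(label);
  if (match === null) {
    return null;
  }
  const port = parseInt(match[1], 10);
  if (port < 1 || port > 65535) {
    return null;
  }
  return { port: port, convId: match[2] };
}

const target = parseRuntimeHost(
  "5000-abc123def456.runtime.openhands.example.com",
  "openhands.example.com"
);
console.log(target); // { port: 5000, convId: "abc123def456" }
```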
| Data Type | Storage | Persistence |
|---|---|---|
| Conversation Metadata | Aurora PostgreSQL | Permanent (via RDS Proxy) |
| Conversation Events | S3 | Permanent (survives task replacement) |
| User Settings / Secrets | S3 | Permanent (KMS envelope encryption) |
| Workspace Files | EFS | Persistent (per-conversation access points) |
| SDK Conversation Cache | EFS | Persistent (enables LLM context restoration) |
| Sandbox Registry | DynamoDB | Permanent (task state, user ownership) |
Aurora Serverless v2: PostgreSQL 15.8, RDS Proxy connection pooling, 0.5-4 ACU auto-scaling, 35-day backups.
S3 Bucket: SSE-S3 encryption, versioning (30-day retention), RETAIN removal policy.
Security
- Fargate tasks in private subnets only
- Per-conversation EFS isolation via access points
- All AWS service access via VPC Endpoints
- IAM Roles with least privilege per service
- Database credentials in Secrets Manager
- RDS Proxy with TLS-encrypted connections
- User secrets protected by KMS envelope encryption
- Cognito authentication (30-day sessions)
- Lambda@Edge header spoofing prevention
- WAF protection with rate limiting
- S3 and Aurora storage encryption
Session Management:
| Token Type | Validity | Description |
|---|---|---|
| Access Token | 1 hour | API access token |
| ID Token | 1 day | Identity token (stored in cookie) |
| Refresh Token | 30 days | Used to obtain new tokens |
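The three validity periods combine into a simple rule: use the access token while it is fresh, refresh it while the refresh token is still valid, and otherwise send the user back to login. The sketch below only illustrates that arithmetic with the durations from the table; real clients would normally read the `exp` claim of each token instead, and all names here are hypothetical.

```typescript
// Validity periods from the table above, in seconds.
const TOKEN_VALIDITY_SECONDS = {
  access: 60 * 60, // 1 hour
  id: 24 * 60 * 60, // 1 day
  refresh: 30 * 24 * 60 * 60, // 30 days
};

type TokenKind = keyof typeof TOKEN_VALIDITY_SECONDS;

// Times are Unix epoch seconds.
function isTokenValid(kind: TokenKind, issuedAt: number, now: number): boolean {
  return now >= issuedAt && now - issuedAt < TOKEN_VALIDITY_SECONDS[kind];
}

type SessionAction = "use-access-token" | "refresh" | "re-login";

function nextSessionAction(accessIssuedAt: number, refreshIssuedAt: number, now: number): SessionAction {
  if (isTokenValid("access", accessIssuedAt, now)) return "use-access-token";
  if (isTokenValid("refresh", refreshIssuedAt, now)) return "refresh";
  return "re-login";
}
```

For a login at time 0, this gives `use-access-token` after 30 minutes, `refresh` after 2 hours, and `re-login` after 31 days.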
Your existing VPC must have:
- At least 2 private subnets in different AZs
- NAT Gateway for outbound internet access
- DNS hostnames enabled
| Workflow | Trigger | Description |
|---|---|---|
| CI | Push/PR to main, develop | Build TypeScript, run all tests (Jest + pytest) |
| Security Scan | Push/PR to main, daily | npm audit, Checkov, git-secrets, Semgrep SAST, cfn-lint |
npm run test # Run all tests
npm run test:ts # TypeScript tests only
npm run test:py # Python tests only
npm run test:ts -- -u # Update snapshots
npm run build # Build TypeScript
npm run watch # Watch for changes
npx cdk diff --all # Show diff before deploy
npx cdk synth --all # Synthesize CloudFormation
npx cdk destroy --all # Destroy all stacks
Common issues
VPC Lookup Fails: Ensure the VPC exists and your AWS credentials have the `ec2:DescribeVpcs` permission.
Certificate Validation Pending: ACM certificates use DNS validation. Ensure the Hosted Zone is correctly configured.
Fargate Task Not Starting: Check CloudWatch Logs at /openhands/application for container startup errors. Check ECS service events for Fargate capacity issues.
Contributions are welcome! Please feel free to submit issues and pull requests.
This project uses autonomous-dev-team skills for AI-assisted development with Claude Code, Kiro CLI, and Codex. Install after cloning:
npx skills add zxkane/autonomous-dev-team -s '*' -a claude-code -a kiro-cli -a codex -y
Or restore from the lock file:
npx skills experimental_install
These skills enforce TDD, git worktree isolation, PR workflows, and E2E testing. See CLAUDE.md for the full workflow.
This project is licensed under the Apache License 2.0; see the LICENSE file for details.
This infrastructure project deploys OpenHands. See the OpenHands License for the main application.