Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
16 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/design/INTERACTIVE_AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -745,7 +745,7 @@ Opt-in per task: 4 KB previews + full trajectory to S3 with TTL.

### Deferred

- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants.
- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants. Tracked on the product roadmap as **LLM-synthesized status summary (optional)** under **Smart progress updates** ([ROADMAP.md](../guides/ROADMAP.md)).
- **Cedar `effect: "advise"` tier** — non-blocking FYI policy tier for post-v1. Design sketch in [`CEDAR_HITL_GATES.md`](./CEDAR_HITL_GATES.md).
- **Outbound WebSocket from agent** — only if a concrete sub-200 ms latency requirement surfaces. Agent-initiated egress avoids dual-auth problems and works on any compute.
- **Multi-user watch** — multiple users attached to the same task's live event stream (teams).
Expand Down
33 changes: 22 additions & 11 deletions docs/guides/ROADMAP.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,9 +7,9 @@ What's shipped and what's coming next.
### Core platform

- [x] **Autonomous agent execution** - Isolated MicroVM (AgentCore Runtime) per task with shell, filesystem, and git access
- [x] **CLI and REST API** - Submit, list, get, cancel, nudge, watch, trace, webhook management; view audit events; Cognito auth with token caching
- [x] **CLI and REST API** - Submit, list, get, cancel, nudge, watch, trace, status, webhook management; Cedar HITL (`approve`, `deny`, `pending`, `policies`); Slack and Linear workspace setup; view audit events; Cognito auth with token caching
- [x] **Durable orchestrator** - Lambda Durable Functions with checkpoint/resume; survives transient failures up to 9 hours
- [x] **Task state machine** - SUBMITTED → HYDRATING → RUNNING → COMPLETED / FAILED / CANCELLED / TIMED_OUT
- [x] **Task state machine** - SUBMITTED → HYDRATING → RUNNING → AWAITING_APPROVAL (Cedar HITL) → FINALIZING → COMPLETED / FAILED / CANCELLED / TIMED_OUT
- [x] **Concurrency control** - Per-user limits (default 3) with atomic admission and automated drift reconciliation
- [x] **Stranded task reconciler** - Scheduled Lambda detects tasks stuck in non-terminal states and drives them to failure with proper cleanup
- [x] **Idempotency** - `Idempotency-Key` header on POST requests (24-hour TTL)
Expand All @@ -33,7 +33,7 @@ What's shipped and what's coming next.
- [x] **Input guardrails** - Bedrock Guardrails screen task descriptions and PR/issue content (fail-closed)
- [x] **Output screening** - Regex-based secret/PII scanner with PostToolUse hook redaction
- [x] **Content sanitization** - HTML stripping, injection pattern neutralization, control character removal
- [x] **Cedar policy engine** - Tool-call governance with fail-closed default and per-repo custom policies
- [x] **Cedar policy engine and HITL gates** - Tool-call governance (allow / hard-deny / soft-deny requiring approval) with fail-closed default, per-repo Cedar policies, `AWAITING_APPROVAL` state, `bgagent approve` / `deny` / `pending` / `policies`, and REST approval APIs. See [CEDAR_HITL_GATES.md](../design/CEDAR_HITL_GATES.md)
- [x] **WAF** - Managed rule groups + rate-based rule (1,000 req/5 min/IP)
- [x] **Pre-flight checks** - GitHub API reachability, repo access, token permissions (fail-closed)
- [x] **Model invocation logging** - Full prompt/response audit trail (90-day retention)
Expand All @@ -48,6 +48,7 @@ What's shipped and what's coming next.

- [x] **Rich prompt assembly** - Task description + GitHub issue/PR content + memory context (~100K token budget)
- [x] **Token budget management** - Oldest comments trimmed first; title/body always preserved
- [x] **Task attachments (multimodal)** - `attachments` on create-task: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Guardrail screening, magic-bytes validation, and re-encoding. CLI `--attachment`, Slack file uploads, and Linear image extraction share the same schema. See [ATTACHMENTS.md](../design/ATTACHMENTS.md)

### Webhooks

Expand All @@ -65,17 +66,22 @@ What's shipped and what's coming next.
- [x] **Real-time watch** - `bgagent watch` streams progress events with adaptive polling (500 ms active; 1/2/5 s idle backoff), cold-start retry, clean exit on terminal state
- [x] **Mid-run steering (nudge)** - `bgagent nudge` sends guidance to a running agent; combined-turn acknowledgement (agent emits `nudge_acknowledged` before incorporating)
- [x] **Execution tracing** - `--trace` on submit raises preview cap to 4 KB and uploads full gzipped NDJSON trajectory to S3; `bgagent trace download` retrieves it
- [x] **Deterministic status snapshot** - `bgagent status` derives all fields from task record + recent events with no LLM in the loop
- [x] **Deterministic status snapshot** - `bgagent status` shows operational fields (turn, last milestone, current tool/turn, cost) from the task record + recent events with no LLM in the loop—suited to ops/debug, not a narrative manager-style report (see **Smart progress updates** under What's next)
- [x] **Debug output** - `--verbose` flag emits full HTTP request/response on stderr for any CLI command

### Notification plane

- [x] **DDB Stream fanout** - FanOut Consumer Lambda on TaskEventsTable streams (ParallelizationFactor: 1 for per-task ordering) routes events to channel dispatchers
- [x] **GitHub edit-in-place** - Single status comment per task on the target PR, edited in place as progress events fire (phase, milestone, cost, link)
- [x] **Routable agent milestones** - Named checkpoints (`pr_created`, `nudge_acknowledged`) unwrapped against allowlist for channel filter matching
- [ ] **Slack dispatcher** - Log-only stub; pending full Slack Block Kit integration
- [x] **Slack notification dispatcher** - FanOut Block Kit messages for Slack-origin tasks (lifecycle events, threaded replies, terminal dedup, in-thread cancel). Generic fallback text for unmapped event types (e.g. some milestones); richer milestone and approval-gate rendering is follow-up work
- [ ] **Email dispatcher** - Log-only stub; pending SES integration

### Channels

- [x] **Slack integration** - @mention task submission, `bgagent slack link` / `setup`, file attachments on submit, threaded progress notifications. See [SLACK_SETUP_GUIDE.md](./SLACK_SETUP_GUIDE.md)
- [x] **Linear integration** - Label-triggered tasks, `bgagent linear setup` / `link`, progress comments on issues. See [LINEAR_SETUP_GUIDE.md](./LINEAR_SETUP_GUIDE.md)

### Observability

- [x] **OpenTelemetry** - Custom spans for pipeline phases with CloudWatch querying
Expand Down Expand Up @@ -159,15 +165,21 @@ Planned capabilities, grouped by theme. Items are independent and may ship in an
| **MCP supply-chain controls** | For extended-tier repos: pin or self-host MCP servers; keep `.mcp.json` in version control; verify tool descriptors before enablement; no dynamic tool discovery in production blueprints. Mitigates tool poisoning and rug-pull risks (`SECURITY.md`, `COMPUTE.md`). |
| **Untrusted hydration content boundaries** | Delimit external content in assembled prompts (issue/PR bodies, fetched URLs, review comments) so the model treats it as untrusted context (spotlighting-style framing). Complements Bedrock Guardrails at hydration time (`context-hydration.ts`). |

### Interactive task UX

| Capability | Description |
|------------|-------------|
| **Smart progress updates (manager-style status)** | Extend check-in beyond the shipped deterministic snapshot: human-readable progress that answers what the agent completed, what it plans next, and which decisions or blockers matter—surfaced via `bgagent status`, notification channels (GitHub/Slack/email), and the future control panel. Prefer structured agent-emitted progress events in `TaskEventsTable` (e.g. done / next / decisions / blockers) so all readers stay consistent and auditable; complement with Phase 2 **`bgagent ask`** for on-demand Q&A and an optional read-path **LLM-synthesized summary** over events (no agent turn) where cost/latency trade-offs are acceptable. Distinct from raw `watch`/`events` streams and from post-mortem **LLM-assisted trace analysis**. Design context: [INTERACTIVE_AGENTS.md](../design/INTERACTIVE_AGENTS.md). |
| **`bgagent ask` (Phase 2)** | Mid-run questions to the agent (`POST /tasks/{id}/asks`); answers durable as `status_response` events with CLI block-and-poll. Enables interactive summaries (e.g. "what changed so far?") without a separate status API. Ships as part of the interactive check-in layer in [INTERACTIVE_AGENTS.md](../design/INTERACTIVE_AGENTS.md) Phase 2. |
| **LLM-synthesized status summary (optional)** | Optional `bgagent status` mode where a Lambda narrates recent `TaskEvents` without waking the agent—deferred in design due to cost and hallucination risk; pursue behind a flag only if agent-authored progress reports are insufficient. Complements, does not replace, **Smart progress updates**. |

### Channels and integrations

| Capability | Description |
|------------|-------------|
| ~~**Task attachments (multimodal)**~~ | **Implemented.** End-to-end support for the `attachments` array: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG, GIF, WebP) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Bedrock Guardrail screening, magic bytes validation, EXIF stripping, and re-encoding. CLI `--attachment` flag, Slack file uploads, and Linear image extraction all feed the same schema. See [ATTACHMENTS.md](../design/ATTACHMENTS.md). |
| **Additional git providers** | GitLab (and optionally Bitbucket). Same workflow, provider-specific API adapters. |
| **Slack integration** | Submit tasks, check status, receive notifications from Slack. Block Kit rendering. |
| **Control panel** | Web UI: task list, task detail with logs/traces, cancel, metrics dashboards, cost attribution. |
| **Slack notification dispatcher** | Full Slack Block Kit rendering for the existing DDB-Stream fanout pipeline. Stub exists today (logs only). |
| **Slack notification polish** | Rich Block Kit for `agent_milestone` and `approval_requested` (today many map to generic fallback text); in-thread approve/deny buttons wired to HITL APIs. Should render **Smart progress updates** when that ships. |
| **Control panel** | Web UI: task list, task detail with logs/traces, cancel, metrics dashboards, cost attribution. Task detail should show manager-style progress alongside raw events/traces. |
| **Email notification dispatcher** | SES-based email notifications via the existing fanout pipeline. Stub exists today (logs only). |
| **Per-user notification preferences** | DynamoDB (or equivalent) store for preferred channels, per-channel config, and event filters (`INPUT_GATEWAY.md`). |
| **Browser extension channel** | Lightweight extension to open tasks from GitHub issue/PR pages using existing webhook or OAuth-issued JWT; same internal message contract as other channels. |
Expand Down Expand Up @@ -219,7 +231,6 @@ Planned capabilities, grouped by theme. Items are independent and may ship in an
|------------|-------------|
| **Multi-user and teams** | Team visibility, shared approval queues, team concurrency/cost budgets, memory isolation. |
| **Agent swarm** | Planner-worker architecture for complex multi-file tasks. DAG of subtasks, merge orchestrator, one consolidated PR. Workers receive a strict subset of planner credentials; orchestrator-issued subtask intent; per-worker OpenTelemetry spans under a shared trace root (prevents confused-deputy / unscoped privilege inheritance). |
| **Cedar-driven HITL approval gates** | Three-outcome model (allow/hard-deny/soft-deny) for tool-call governance with Cedar policy engine. |
| **Multi-user nudge** | Extend `bgagent nudge` to support multiple users injecting context into the same running task. Per-nudge commit attribution. (Single-user nudge shipped.) |
| **Scheduled triggers** | Cron-based task creation via EventBridge (dependency updates, nightly flaky test checks). |

Expand Down Expand Up @@ -252,4 +263,4 @@ Planned capabilities, grouped by theme. Items are independent and may ship in an

---

Design docs to keep in sync: [ARCHITECTURE.md](../design/ARCHITECTURE.md), [ORCHESTRATOR.md](../design/ORCHESTRATOR.md), [API_CONTRACT.md](../design/API_CONTRACT.md), [INPUT_GATEWAY.md](../design/INPUT_GATEWAY.md), [REPO_ONBOARDING.md](../design/REPO_ONBOARDING.md), [MEMORY.md](../design/MEMORY.md), [OBSERVABILITY.md](../design/OBSERVABILITY.md), [COMPUTE.md](../design/COMPUTE.md), [SECURITY.md](../design/SECURITY.md), [EVALUATION.md](../design/EVALUATION.md).
Design docs to keep in sync: [ARCHITECTURE.md](../design/ARCHITECTURE.md), [ORCHESTRATOR.md](../design/ORCHESTRATOR.md), [API_CONTRACT.md](../design/API_CONTRACT.md), [ATTACHMENTS.md](../design/ATTACHMENTS.md), [CEDAR_HITL_GATES.md](../design/CEDAR_HITL_GATES.md), [INPUT_GATEWAY.md](../design/INPUT_GATEWAY.md), [INTERACTIVE_AGENTS.md](../design/INTERACTIVE_AGENTS.md), [REPO_ONBOARDING.md](../design/REPO_ONBOARDING.md), [MEMORY.md](../design/MEMORY.md), [OBSERVABILITY.md](../design/OBSERVABILITY.md), [COMPUTE.md](../design/COMPUTE.md), [SECURITY.md](../design/SECURITY.md), [EVALUATION.md](../design/EVALUATION.md).
2 changes: 1 addition & 1 deletion docs/src/content/docs/architecture/Interactive-agents.md
Original file line number Diff line number Diff line change
Expand Up @@ -749,7 +749,7 @@ Opt-in per task: 4 KB previews + full trajectory to S3 with TTL.

### Deferred

- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants.
- **LLM-synthesized status summary** — `bgagent ask` without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants. Tracked on the product roadmap as **LLM-synthesized status summary (optional)** under **Smart progress updates** ([ROADMAP.md](/roadmap/roadmap)).
- **Cedar `effect: "advise"` tier** — non-blocking FYI policy tier for post-v1. Design sketch in [`CEDAR_HITL_GATES.md`](/architecture/cedar-hitl-gates).
- **Outbound WebSocket from agent** — only if a concrete sub-200 ms latency requirement surfaces. Agent-initiated egress avoids dual-auth problems and works on any compute.
- **Multi-user watch** — multiple users attached to the same task's live event stream (teams).
Expand Down
Loading
Loading