feat: Cedar HITL approval gates for agent tool use #88
…ial findings

Design doc was accidentally removed in 0742ebe; restored from b34d7cd and
substantially revised under a new filename. "Phase 3" framing dropped — this
is the Cedar HITL approval gates feature.
- Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating
  removed (Phase 3a/3b → v1 / future work §17).
- Integrated 16 findings from the 2026-05-06 adversarial review with
  realistic scenarios.

Major structural changes:
- Decision aws-samples#23 (new): cross-engine parity contract between
  cedarpy (agent, Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS).
- §11.2: SlackUserMappingTable with OAuth user-initiated mapping;
  severity-gated Slack approvals; admin has no write path.
- §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for
  atomicity.
- §10.1: user_id-status-index GSI on TaskApprovalsTable; v1, not v-later.
- §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas.
- Gate-cap revision (2026-05-07): decision aws-samples#13 — default 50,
  blueprint-configurable via security.approvalGateCap (bounded 1–500),
  persisted on TaskTable. Cache memory bound decoupled: 50-entry LRU
  regardless of cap. IMPL-22 adds telemetry-driven re-evaluation criteria.
- Timeout adversarial+advocate pass (2026-05-07):
  - §6.5 VM-throttle race fix: re-read the row on a failed TIMED_OUT
    ConditionCheckFailed; honor APPROVED if the user beat the timer.
    IMPL-24.
  - Sub-120s @approval_timeout_s emits a blueprint-load WARN. IMPL-25.
  - User-visible timeout-cap milestones (approval_timeout_capped_at_submit,
    approval_ceiling_shrinking). IMPL-26.
  - Runtime JWT: no refresh logic in agent/src/ (container uses an IAM
    role); ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27.
  - Three new CloudWatch metrics for timeout tuning. IMPL-28.
- §14.8 (new): off-hours trade-off section (fail-closed is the invariant).
- §13.13 (new): notification-delivery failure does NOT pause the timer
  (bypass prevention).
- Added six mermaid diagrams: three-outcome decision flow, end-to-end
  round-trip, TaskApprovalsTable state machine, Slack user-mapping,
  fail-closed decision flow, cross-engine parity check.
- Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md.
- Starlight mirror regenerated via docs/scripts/sync-starlight.mjs.

No code changes in this commit — design work only. Implementation lands in
a follow-up PR per §15.2 task list.
…ract

Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md). Lays
the foundation before engine rewrites in Chunk 2+: both Cedar engines pinned
exactly per decision aws-samples#23, annotation surface validated by Day-1
spikes per decision aws-samples#22, and the golden-file parity fixtures
seeded so every subsequent chunk can rely on the contract.
- Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk)
  exactly (no ^/~); document both in the mise.toml header.
- Add agent/tests/test_cedarpy_annotations_contract.py (10 tests)
  validating all 5 annotations round-trip verbatim via
  policies_to_json_str() under staticPolicies.<id>.annotations.
- Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating
  policySetTextToParts + policyToJson extract the same annotations verbatim
  and isAuthorized returns the documented {type, response} wrapper shape.
- Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match,
  multi-match, hard-deny, soft-deny write, no-match default-allow) + README
  documenting the contract. Every fixture policy carries a @rule_id -
  including the base permit as @rule_id("base_permit") - so the parity
  tests raise if either engine returns an unannotated match instead of
  silently dropping it.
- Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and
  cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm side)
  loading the shared fixtures and asserting (decision, sorted rule_ids)
  match expected. Both tests hard-import cedarpy/cedar-wasm so a dependency
  regression fails loud rather than silently skipping.
- Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6 prose
  and the parity mermaid diagram to point at contracts/cedar-parity/ (the
  precedent set by contracts/memory-hash-vectors.json) instead of a new
  tests/fixtures/ dir. Regenerate the Starlight mirror.
- Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm
  diagnostics.reason naming asymmetry surfaced by the spikes; engine code
  normalizes at the boundary.
- Fix rev-4 -> rev-5 cosmetic footer drift.

Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190
unchanged. No production code changes in this chunk; engine rewrite lands
in Chunk 2.

Follow-up: separate chore issue to move contracts/memory-hash-vectors.json
into a self-named subdir for consistency with contracts/cedar-parity/.
Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into the
three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md section 6.
The REQUIRE_APPROVAL outcome is the human-in-the-loop surface the next
chunks (PreToolUse hook extension, REST API, CLI) plug into. This chunk
ships the engine and its load-time validation; no hook or wire-format
changes yet.

Engine:
- Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision
  with an .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom
  __init__ accepts both outcome= and legacy allowed= kwargs so existing
  tests keep working verbatim.
- Three-outcome pipeline per section 6.2 (sketched at the end of this
  message): hard-deny eval (absolute) -> allowlist fast-path
  (tool_type/tool_group/bash_pattern/write_path/all_session) ->
  recent-decision cache (60s TTL on DENIED/TIMED_OUT) -> soft-deny eval
  (with post-eval rule-scope allowlist check and blueprint_disable
  filtering) -> default ALLOW.
- ApprovalAllowlist (section 6.4): parses and matches every scope type.
  Strips whitespace and rejects empty-after-strip values so
  "tool_type: Read " normalizes instead of silently mismatching (review
  finding 6).
- RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of
  approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped
  (documented section 12.8 caveat).
- Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier,
  @approval_timeout_s, @Severity, @category via
  cedarpy.policies_to_json_str(); merges on multi-match with min timeout
  (clamped by the 30s floor) and max severity.
- Load-time validation (sections 5.1, 12.4): rejects missing/mismatched
  @tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across
  tiers, blueprint text > 64 KB, disable entries naming built-in hard-deny
  rules (finding 9), and approval_gate_cap outside [1, 500] (decision 13).
  Sub-120s @approval_timeout_s emits WARN but accepts (IMPL-25).
- Fail-closed posture (section 13): cedarpy parse errors surface via
  diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer
  handler returns DENY with reason "fail-closed: <ExceptionType>".
  TypeError on json.dumps of an unhashable tool_input surfaces as the
  distinct "fail-closed: unhashable_tool_input" reason (review finding 5).

Built-in policies:
- agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash +
  write_git_internals + write_git_internals_nested + drop_table +
  pr_review-specific Write/Edit forbids (absolute).
- agent/policies/soft_deny.cedar: base_permit (a catch-all is required in
  each tier so cedarpy default-deny does not convert no-match into DENY) +
  force_push_any + force_push_main + push_to_protected_branch +
  write_env_files + write_credentials. All soft rules carry @tier,
  @rule_id, @approval_timeout_s, @Severity, @category per the section 15.4
  starter set.

Review findings addressed (1 blocker, 8 significant, plus minor):
- blueprint_disable actually disables soft rules at eval time instead of
  silently no-op'ing (the blocker: test coverage had been a silent pass).
- Legacy extra_policies with @tier/@rule_id rejected to avoid undefined
  double-annotation behavior.
- _matching_rule_ids logs WARN on unknown policy IDs (state-drift signal).
- base_permit validator exemption restricted to effect == "permit" so
  misnamed forbid rules cannot bypass validation (finding 7).
- Hard-tier Cedar no_decision logged at WARN (signals a missing/malformed
  base_permit catch-all).
- Allowlist whitespace normalization + empty-value rejection.
- StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced with an
  explicit RuntimeError for S101 compliance.

Phase 1 compatibility:
- All 39 existing test_policy.py tests pass unchanged via the .allowed
  property. One test (test_invalid_policy_syntax_fails_closed) updated to
  patch _hard_policies instead of the removed _policies attribute; the
  docstring explains the rewrite.
- extra_policies kwarg preserved; callers with annotated rules must migrate
  to blueprint_soft_policies / blueprint_hard_policies.

Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression fixes).
cli 190 unchanged. cdk 1054 unchanged.

Carry-forward to Chunk 3:
- extra_policies semantic shift (Phase 1 DENY -> Chunk 2 REQUIRE_APPROVAL);
  .allowed=False preserved but .outcome differs. Switchover happens when
  hooks.py adopts the three-outcome branching.
- Cross-tier action-context asymmetry (review finding 8): document the
  rule-authoring constraint in section 5.5 of the design.
- Probe entity-shape coverage (finding 10): extend _probe_cedar to exercise
  Write/Edit/Bash action paths, not just invoke_tool.
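A minimal sketch of the section 6.2 ordering referenced above, with the
tiers and caches reduced to injected callables (names here are illustrative,
not the real policy.py signatures)::

    from enum import Enum

    class Outcome(Enum):
        ALLOW = "allow"
        DENY = "deny"
        REQUIRE_APPROVAL = "require_approval"

    def evaluate(call, hard_denies, allowlist, recent_cache, soft_denies):
        # Order matters: the hard tier is absolute, so neither the
        # allowlist nor the cache is consulted before it.
        if hard_denies(call):
            return Outcome.DENY
        if allowlist(call):            # prior approvals skip the soft tier
            return Outcome.ALLOW
        if recent_cache.get(call):     # 60s TTL, DENIED/TIMED_OUT only
            return Outcome.DENY
        if soft_denies(call):          # gate instead of refusing outright
            return Outcome.REQUIRE_APPROVAL
        return Outcome.ALLOW           # default posture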
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick (sketched
below this list).
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
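A hedged sketch of the prune-on-read window; the surface names match the
bullets above, while the deque and the monotonic clock are implementation
assumptions::

    import time
    from collections import deque

    APPROVAL_RATE_LIMIT = 20        # gates per window per container (§12.9)
    APPROVAL_RATE_WINDOW_S = 60.0

    class GateRateWindow:
        def __init__(self):
            self._timestamps = deque()

        def record_approval_gate_timestamp(self, now=None):
            self._timestamps.append(time.monotonic() if now is None else now)

        def approvals_in_last_minute(self, now=None):
            # Prune on read: callers always see the current count,
            # no separate tick needed.
            now = time.monotonic() if now is None else now
            while self._timestamps and (
                now - self._timestamps[0] >= APPROVAL_RATE_WINDOW_S
            ):
                self._timestamps.popleft()
            return len(self._timestamps)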
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding aws-samples#6/aws-samples#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
…ives
Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the
three-outcome hook integration in the next commit:
- ``transact_write_approval_request`` — cross-table TransactWriteItems:
Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` +
Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so
a concurrent cancel cannot land the task in AWAITING_APPROVAL with
no matching approval row (or vice versa); see the sketch after this
list.
- ``transact_resume_from_approval`` — Update(TaskTable) gated on
``status = AWAITING_APPROVAL AND awaiting_approval_request_id =
:rid``. The ``request_id`` condition prevents resuming with a stale
ID after a reconciler race (§13.9).
- ``best_effort_update_approval_status`` — conditional UpdateItem on
the approval row with ``status = :pending`` guard. Returns False on
``ConditionalCheckFailedException``; this is the signal IMPL-24's
re-read path fires on (§6.5 pseudocode lines 846-879, §13.12).
- ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by
default. Required by IMPL-24's re-read; kept opt-out (bool flag) for
future cold-path callers that don't need the strong read.
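A sketch of the first primitive's TransactWriteItems shape, assuming
literal table names and a trimmed item (the real call marshals more
attributes via ``_py_to_ddb_attr``)::

    import boto3

    ddb = boto3.client("dynamodb")

    def transact_write_approval_request(task_id, request_id):
        ddb.transact_write_items(
            TransactItems=[
                {
                    "Put": {
                        "TableName": "TaskApprovalsTable",
                        "Item": {
                            "task_id": {"S": task_id},
                            "request_id": {"S": request_id},
                            "status": {"S": "PENDING"},
                        },
                        # Reject replays of the same request_id.
                        "ConditionExpression": "attribute_not_exists(request_id)",
                    }
                },
                {
                    "Update": {
                        "TableName": "TaskTable",
                        "Key": {"task_id": {"S": task_id}},
                        "UpdateExpression": (
                            "SET #s = :awaiting,"
                            " awaiting_approval_request_id = :rid"
                        ),
                        # A concurrent cancel flips status off RUNNING and
                        # cancels the whole transaction: both writes or
                        # neither.
                        "ConditionExpression": "#s = :running",
                        "ExpressionAttributeNames": {"#s": "status"},
                        "ExpressionAttributeValues": {
                            ":running": {"S": "RUNNING"},
                            ":awaiting": {"S": "AWAITING_APPROVAL"},
                            ":rid": {"S": request_id},
                        },
                    }
                },
            ]
        )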
Errors:
- ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so
a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather
than silently no-op'ing the gate.
- ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap
``TransactionCanceledException`` with the cancellation reasons
list. The hook uses these to distinguish the "concurrent cancel"
branch from real DDB outages.
- ``ConditionalCheckFailedException`` on ``update_item`` is consumed
and returned as ``False`` from ``best_effort_update_approval_status``
— the caller (hook) needs the boolean to decide whether to
re-read, not to propagate.
- All other DDB errors propagate so the hook's outer try/except can
classify fail-closed with a specific reason.
Implementation notes:
- Uses the ``boto3.client("dynamodb")`` low-level API (not the resource
API). ``transact_write_items`` lives on the client, and marshalling the
approval-row attributes explicitly gives deterministic DDB shapes
that the tests can assert on. ``_py_to_ddb_attr`` (sketched after
these notes) covers the subset of Python types §10.1 actually uses
(str/int/bool/None/list of str); any other type raises TypeError
loudly rather than silently writing something unexpected.
- ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type
on ``exc.response`` so we don't need botocore at import time (tests
use a minimal exception class).
- Errors from unsupported types (floats, dicts, etc.) are caught
BEFORE the DDB round-trip so the unit-test asserts
``transact_write_items`` was not called — catches schema drift
early.
- Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``)
named so a rename in CDK cannot silently diverge the Python path.
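A plausible reconstruction of that marshaller for the §10.1 subset
(illustrative; the shipped version may differ in details)::

    def _py_to_ddb_attr(value):
        if value is None:
            return {"NULL": True}
        if isinstance(value, bool):   # bool first: bool is an int subclass
            return {"BOOL": value}
        if isinstance(value, int):
            return {"N": str(value)}
        if isinstance(value, str):
            return {"S": value}
        if isinstance(value, list) and all(isinstance(v, str) for v in value):
            return {"L": [{"S": v} for v in value]}
        # Anything else is schema drift: fail before any DDB round-trip.
        raise TypeError(f"unsupported DDB attribute type: {type(value).__name__}")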
Tests: +20 total.
- 5 on TransactWriteApprovalRequest: env-missing, happy-path shape
assertion (both items + conditions), TransactionCanceled → ApprovalWriteError
with reasons preserved, other errors propagate, unsupported type rejected
before any DDB call.
- 3 on TransactResumeFromApproval: env-missing, happy-path expression
shape (includes REMOVE awaiting_approval_request_id), cancel →
ApprovalResumeError.
- 4 on BestEffortUpdateApprovalStatus: happy path returns True,
``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed
returns False (IMPL-24's signal), other errors propagate.
- 4 on GetApprovalRow: ConsistentRead default True, opt-out False,
row-not-found returns None, row unmarshalling through every
supported DDB attribute type.
- 4 on helpers: error-code extraction with and without
ClientError-shape, cancellation-reasons extraction with and without.
No runtime callers yet — hook integration lands in commit C. Physical
TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so
the hook work can be unit-tested today with mocked clients.
Wires the agent to the full §6.5 pseudocode: cap + rate-limit check,
atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL,
2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume
transition, scope propagation to allowlist, and denial-injection queue
consumed at the next Stop seam. Completes §15.2 rows 26 + 27.
Hook control flow (three outcomes)
----------------------------------
- ALLOW / DENY: existing Phase 1 behavior, now switching on
``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still
green because PolicyDecision preserves the ``.allowed`` shim.
- REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval``
for readability. Delegates to ``task_state`` primitives and
``engine.*`` counter surfaces from the prior commits; no new DDB
client construction here.
Key pieces:
- ``_compute_effective_timeout`` applies the §6.5 min(rule, default,
lifetime) formula (sketched below). The engine's ``_merge_annotations``
has already clipped decision.timeout_s against the task default; the
hook adds the remaining-lifetime ceiling and floors at FLOOR_30S.
``clip_reason`` distinguishes ``rule_annotation`` (the rule was tighter
than the task default) from ``maxLifetime_ceiling`` (the task is late
in its life) so ``approval_timeout_capped`` carries the right reason.
- ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` +
``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when
the start timestamp is absent — the hook treats that as "unknown,
don't clip" rather than pre-DENYing, so Phase 1 test paths that
don't set the env var still see the old task-default behaviour.
Chunk 4/5 will wire these at task launch.
- ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s
(IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3
consecutive GetItem failures emit ``approval_poll_degraded``; 10
consecutive failures fall through as TIMED_OUT with a specific
reason (§13.2).
- ``_reconcile_late_decision`` implements IMPL-24 re-read: on a
ConditionCheckFailed from the TIMED_OUT write, re-read with
ConsistentRead. APPROVED → rebuild outcome, propagate scope to
allowlist, run normal allow flow, emit ``approval_late_win``.
DENIED → honor the user's sanitized reason. PENDING or row gone
→ fall through with TIMED_OUT (fail-closed, §13.12 last paragraph).
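A simplified sketch of the clip, assuming the engine has already min'd the
rule timeout against the task default as described above::

    FLOOR_30S = 30

    def compute_effective_timeout(decision_timeout_s, task_default_s,
                                  remaining_lifetime_s):
        # remaining_lifetime_s is None when TASK_STARTED_AT is unset:
        # "unknown, don't clip" rather than pre-DENYing.
        timeout = decision_timeout_s
        clip_reason = "rule_annotation" if timeout < task_default_s else None
        if remaining_lifetime_s is not None and remaining_lifetime_s < timeout:
            timeout = remaining_lifetime_s
            clip_reason = "maxLifetime_ceiling"
        return max(timeout, FLOOR_30S), clip_reason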
Cancel-wins semantics (finding aws-samples#2)
----------------------------------
``_denial_between_turns_hook`` is registered AFTER
``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel
short-circuits both. The hook re-checks ``_cancel_requested`` itself
as belt-and-braces (matching the nudge hook) so a future reorder does
not silently break cancel-wins. Denial queue is PRESERVED on cancel —
not drained — so a denial still sitting on the queue when the task is
being torn down does not leak across tasks (the engine is per-task
per IMPL-7).
``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can
``drain_denial_injections``. ``build_hook_matchers`` accepts a new
``user_id`` kwarg (§12.2) so approval rows carry caller identity for
the REST side's ownership check.
``permissionDecisionReason`` guaranteed surface
-----------------------------------------------
The hook's deny return is the ONLY guaranteed surface the SDK emits
to the agent; denial injection is best-effort (pre-empted by cancel).
``_deny_response`` pipes every reason through ``_strip_ansi`` +
``_truncate(500)``: ANSI sequences can never reach the model, and the
line stays loggable. §12.7 requirement.
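A minimal sketch of that guaranteed-surface sanitization (the regex covers
the common CSI escapes; the real ``_strip_ansi`` may be broader)::

    import re

    _ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI escape sequences

    def sanitize_deny_reason(raw, limit=500):
        # Escape codes never reach the model; the line stays loggable.
        return _ANSI_RE.sub("", raw)[:limit]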
Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s
via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a
no-op AND advances ``hooks.time.monotonic`` by the requested duration
so the poll wall-clock deadline actually trips.
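A hypothetical reconstruction of that fixture (module path and internals
are assumptions)::

    import pytest

    @pytest.fixture
    def _fast_poll(monkeypatch):
        from agent.src import hooks  # assumed import path

        now = [0.0]

        async def instant_sleep(seconds):
            now[0] += seconds  # advance virtual time instead of waiting

        # Sleeps are free, but monotonic() still moves forward, so the
        # poll's wall-clock deadline computation actually trips.
        monkeypatch.setattr(hooks.asyncio, "sleep", instant_sleep)
        monkeypatch.setattr(hooks.time, "monotonic", lambda: now[0])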
Happy paths:
- APPROVED + scope propagation to allowlist + milestones.
- APPROVED with scope=this_call does NOT grow allowlist.
- DENIED queues denial injection + populates recent-decision cache
(next identical call auto-denies).
- TIMED_OUT writes TIMED_OUT row and emits approval_timed_out.
IMPL-24 race: four branches.
- APPROVED re-read → allow flow, approval_late_win milestone, scope
propagated, resume succeeds.
- DENIED re-read → deny flow, approval_late_win milestone, user's
reason is the permissionDecisionReason.
- Still-PENDING re-read → fail-closed fall-through (no late_win).
- Row-gone re-read → same fail-closed fall-through.
Cap / rate-limit / write failure / resume failure branches all:
- Short-circuit before any DDB write when the local guard fires
(cap, rate limit).
- Emit the right approval_* milestone.
- Return DENY with a specific permissionDecisionReason.
Sanitization:
- ANSI stripped from deny reason.
- Deny reason truncated to ≤500 chars.
Timeout clipping:
- rule_annotation reason when a rule's approval_timeout_s is below
the task default; matching_rule_ids populated.
- maxLifetime_ceiling reason when remaining lifetime is the tightest
bound; matching_rule_ids is None.
- approval_ceiling_shrinking emits exactly once per task (IMPL-26
latch).
Denial injection hook (6 tests):
- Draining produces a <user_denial request_id=... decided_at=...>
block with XML-escaped reason.
- Cancel short-circuit preserves the queue so the denial is not
lost; just not injected into a dying agent.
- Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so
the envelope cannot be forged.
- No-engine ctx returns [] (Phase 1 call sites still work).
- Registered LAST in ``between_turns_hooks`` (invariant for §6.5
finding aws-samples#2).
- End-to-end via stop_hook: queued denial becomes
``decision=block`` + reason on the Stop return.
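A sketch of the envelope construction those escaping tests pin down
(attribute and element names taken from the tests; the helper name is
hypothetical)::

    from xml.sax.saxutils import escape, quoteattr

    def render_user_denial(request_id, decided_at, reason):
        # Escaping the body means a hostile "</user_denial>...<user_nudge>"
        # reason cannot forge the envelope structure.
        return (
            f"<user_denial request_id={quoteattr(request_id)} "
            f"decided_at={quoteattr(decided_at)}>"
            f"{escape(reason)}</user_denial>"
        )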
Carry-forward
-------------
- ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is
unset — Chunk 4/5 will wire this at task launch. Tracked in §16.
- ``approval_gate_count`` lives on the engine (session-scoped) not on
TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap
still bound worst-case across container restarts. Chunk 7+ tracks
persistence when telemetry justifies it.
- Denial injection emits a ``user_denial_injected`` milestone that is
NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged``
for stream visibility; keep the name distinct from the ``approval_*``
prefix so future §11.1 consumers can't confuse it with an approval
outcome.
Lands the stateless CDK primitives for Cedar-HITL approval gates so
Chunk 5's REST handlers can be wired onto concrete tables. Completes
§15.2 tasks 9, 20, and 25.
Constructs
----------
``TaskApprovalsTable`` (§10.1)
- PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side
primitives landed in the prior commit.
- GSI ``user_id-status-index`` with user_id PK + status SK and an
``INCLUDE`` projection limited to the fields GET /v1/pending
renders. Three deny-sensitive attrs (``deny_reason``, ``scope``,
``tool_input_sha256``) deliberately omitted from the projection —
the list endpoint only returns PENDING rows in practice, but
excluding them kills the projection-leak concern outright and
costs no bytes today.
- Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors
it on ``construct.userStatusIndexName`` so handlers referencing
the GSI fail compile-time on a rename.
- TTL attribute ``ttl`` (agent writes ``created_at + timeout_s +
120s``).
- No DynamoDB streams per §11.2. TaskEventsTable carries the audit
fan-out; streams here would duplicate.
- Default RemovalPolicy.DESTROY to match the rest of the sample.
Production deploys override to RETAIN per §10.1.
``SlackUserMappingTable`` (§11.2, finding aws-samples#4)
- Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no
stream. The forward-only shape is the trust boundary — a reverse
GSI (Cognito → Slack) would let a compromised Cognito sub
enumerate Slack identities without adding v1 capability.
- Writes land through LinkSlackUserFn (Chunk 5) which enforces the
``attribute_not_exists(slack_user_id)`` condition so a prior
legitimate mapping cannot be overwritten by a later compromise.
``task-status.ts`` — AWAITING_APPROVAL (§10.3)
- Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES:
the task is alive, paused on a human decision).
- VALID_TRANSITIONS wires the five edges §10.3 enumerates:
RUNNING → AWAITING_APPROVAL (soft-deny entry)
HYDRATING → AWAITING_APPROVAL (rare early-gate case)
AWAITING_APPROVAL → RUNNING (approve / deny resume)
AWAITING_APPROVAL → CANCELLED (user cancel mid-approval)
AWAITING_APPROVAL → FAILED (stranded-approval reconciler)
- Notably NOT added:
AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race)
AWAITING_APPROVAL → COMPLETED (skip RUNNING)
AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval
row, not the task clock)
These absences are pinned by regression tests so a future refactor
cannot quietly add them and bypass the
`awaiting_approval_request_id = :rid` invariant.
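The edge set above, restated as a Python-style adjacency map for
readability (the shipped map is TypeScript in ``task-status.ts`` and covers
every state)::

    APPROVAL_TRANSITIONS = {
        "RUNNING": {"AWAITING_APPROVAL"},        # soft-deny entry
        "HYDRATING": {"AWAITING_APPROVAL"},      # rare early-gate case
        "AWAITING_APPROVAL": {
            "RUNNING",     # approve / deny resume
            "CANCELLED",   # user cancel mid-approval
            "FAILED",      # stranded-approval reconciler
            # deliberately absent: FINALIZING, COMPLETED, TIMED_OUT
        },
    }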
Tests: +29 total.
- TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST,
PITR default + override, TTL attribute, NO streams, GSI schema +
projection + sensitive-attr exclusion, removal policy default +
override, ``USER_STATUS_INDEX_NAME`` constant parity with the
construct field.
- SlackUserMappingTable (8 tests): single-key schema (explicit
KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams,
no reverse GSI, DESTROY default, TTL absent.
- TaskStatus (+10 tests over existing: 5 new assertions on the
9-state cardinality, AWAITING_APPROVAL membership, and the
transition graph including the three forbidden edges). The
existing assertions updated for the new state count.
No stack wiring yet — ``agent.ts`` instantiation + env var plumbing +
grants land in the next commit alongside the Cedar-WASM Lambda layer.
…stack
Activates the agent-side approval path and ships the Lambda layer
Chunk 5's REST handlers need.
Cedar-wasm Lambda layer (§15.2 task 10)
----------------------------------------
``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into
``/opt/nodejs/node_modules/`` so Lambdas can
``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the
4 MB wasm binary in every function package. A dedicated
``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json``
pinning the exact version — bundling runs ``npm install --omit=dev``
against that manifest, so the layer build is hermetic from any
``cdk/node_modules/`` drift.
The bundler has two paths:
- Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod
deploys.
- Local-npm fallback for environments without Docker (unit-test
synths + `cdk synth` on runners that lack Docker). The local
path is safe here because the layer ships pure JS + a prebuilt
wasm binary — no native build step.
Three constants exposed from the module:
- ``CEDAR_WASM_VERSION`` — single source of truth for the pinned
version; tests assert this matches both ``cdk/package.json`` and
the layer manifest, so the three places the version lives stay
in sync.
- ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching
Lambdas per §15.2 task 10.
- ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for
Chunk 5 handlers to attach via ``fn.addLayers(...)``.
Agent stack wiring (§15.2 task 19)
------------------------------------
``agent.ts`` now instantiates:
- ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime
so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead
the PENDING row.
- ``SlackUserMappingTable`` (prior commit) — not granted to the
runtime; only the link-user Lambda (Chunk 5) writes here.
- ``CedarWasmLayer`` — the layer's asset lands in the synthed
template so Chunk 5 handlers can reference ``.layer`` without
causing a new asset on their deploy.
New runtime env vars:
- ``TASK_APPROVALS_TABLE_NAME`` — consumed by
``task_state._require_tables``; its absence previously raised
``ApprovalTablesUnavailable`` → hook DENY. Now set, so the
approval path is live on deploy.
- ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching
``lifecycleConfiguration.maxLifetime``. Consumed by the hook's
``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip
(§6.5). Kept in sync with the lifecycle via a direct test
assertion so drift surfaces at build time.
New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``,
``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests
(`aws dynamodb describe-table` / `aws lambda get-layer-version`).
Tests: +8 layer tests + 9 agent-stack assertions.
Layer:
- LayerVersion resource count.
- Compatible runtimes (nodejs20/22).
- Description carries the pinned version.
- CEDAR_WASM_VERSION matches ``cdk/package.json``.
- CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``.
- CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512.
- Custom description override works.
- ``.layer`` exposes a real ``LayerVersion``.
Agent stack:
- Table count updated from 6 → 8.
- TaskApprovalsTable schema match (task_id PK / request_id SK,
user_id-status-index GSI presence).
- SlackUserMappingTable single-key schema.
- LayerVersion count + compatibleRuntimes.
- Three new CfnOutputs present.
- TASK_APPROVALS_TABLE_NAME env var on the runtime.
- AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard).
Carry-forward
-------------
- ``TASK_STARTED_AT`` is the other input the hook's
``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the
orchestrator must stamp at invocation time, not a stack-level env
var. Chunk 5's orchestrator changes need to add it to the runtime
invocation payload / session env. For now the hook's fallback
("unknown, don't clip") keeps approvals functional.
- Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn,
DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert
``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each.
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser that future Chunk-5
handlers (get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task;
see the sketch after this list.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
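A Python mirror of the §7.4 degenerate-pattern check named above (the
shipped version is TypeScript; the wildcard character set is an
assumption)::

    def is_degenerate_pattern(pattern):
        stripped = pattern.strip()
        if len(stripped) <= 2:
            return True                  # too short to constrain anything
        wildcards = sum(stripped.count(c) for c in "*?")
        if wildcards == len(stripped):
            return True                  # all-wildcard
        return wildcards / len(stripped) > 0.5   # wildcard ratio > 50%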
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding aws-samples#6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — the Code Defender pre-commit hook
would otherwise block them.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding aws-samples#8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
…gent plumbing
Completes Chunk 5 end-to-end: the five new Lambdas are instantiated
and wired onto the REST API, the orchestrator threads approval-related
data through to the agent runtime, the stranded-task reconciler sweeps
AWAITING_APPROVAL tasks, and the agent pipeline accepts the new
per-task approval configuration.
Stack wiring (agent.ts + task-api.ts)
-------------------------------------
- TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`,
`cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created
when the approvals table is present; GetPolicies also requires the
cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the
slack mapping table is provided.
- New routes:
POST /tasks/{task_id}/approve
POST /tasks/{task_id}/deny
GET /pending
GET /repos/{repo_id}/policies
POST /notifications/slack/link
- GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from
§15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the
esbuild bundle so the layer provides the wasm binary at runtime.
- CedarWasmLayer compatibleRuntimes extended to include nodejs24.x
(the Lambda runtime) — the Node 20/22 list was the original §15.2
spec but the actual function uses Node 24.
- agent.ts passes all three new constructs into TaskApi.
Orchestrator (shared/orchestrator.ts)
-------------------------------------
- `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive"
terminal-timeout source: on poll exhaustion the task transitions to
TIMED_OUT with a distinct `approval_poll_timeout` reason + error
message ("Orchestrator poll timeout exceeded while awaiting approval").
The stranded-approval reconciler is the secondary safety net (§13.6)
for tasks the orchestrator already lost track of.
- Invocation payload now carries three new fields:
- `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) —
consumed by the agent hook's `_remaining_maxlifetime_s` so the
§6.5 maxLifetime ceiling math uses the real task clock instead
of the fail-open fallback.
- `approval_timeout_s` (when the submit payload supplied it).
- `initial_approvals` (when the submit payload supplied entries).
Stranded-task reconciler
------------------------
- Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING.
- New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s =
2h) — double §7.3's 1h ceiling so this reconciler never races the
happy-path timer.
- Distinct failure message on approval-stranded vs generic-stranded
so users see "approval stranded — container evicted" rather than
the misleading "no pipeline attached" copy.
Fanout (handlers/fanout-task-events.ts)
---------------------------------------
- Slack channel default set replaces the forward-compat
`approval_required` stub with the real §11.1 events:
`approval_requested` and `approval_stranded`. Other approval
milestones (granted/denied/timed_out/late_win/etc.) stay out of
default routing to avoid notification fatigue — the CLI surfaces
those confirmations directly.
- Email default replaces `approval_required` with `approval_requested`
(high-severity gates only; severity gating happens in the dispatcher).
Create-task validation (shared/create-task-core.ts)
---------------------------------------------------
- New request fields:
- `approval_timeout_s` — integer within
`[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`.
- `initial_approvals` — array of scope strings; each entry must
be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern
and write_path scopes get the §7.4 degenerate-pattern check.
- TaskRecord extended with `approval_timeout_s`, `initial_approvals`,
`approval_gate_count` (seeded to 0 at admission), and
`awaiting_approval_request_id` (written atomically by the agent's
`transact_write_approval_request` primitive).
Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py)
----------------------------------------------------------------------------
- `TaskConfig` adds `approval_timeout_s`, `initial_approvals`.
- `build_config`, `run_task`, `_run_task_background`, and
`_extract_invocation_params` thread the two new fields from payload
→ config → PolicyEngine.
- `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]`
from the payload so the hook's `_remaining_maxlifetime_s` returns
real values (carry-forward from Chunk 3 resolved).
- `runner.py` constructs PolicyEngine with `initial_approvals` +
`task_default_timeout_s` when supplied; the engine clamps bad
values at construction time.
Tests
-----
All CDK tests pass: 1219 / 1219.
All agent tests pass: 648 / 648.
Affected suites (changes only):
- test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes
now expects `nodejs24.x`; table count still 8.
- test/constructs/cedar-wasm-layer.test.ts: same runtime expansion.
- test/handlers/fanout-task-events.test.ts: approval_required →
approval_requested/approval_stranded in Slack default set;
approval_required → approval_requested in Email default set.
- test/handlers/reconcile-stranded-tasks.test.ts: primeResponses
now queue a third `Items: []` for AWAITING_APPROVAL queries;
queryCalls assertion bumped to 3.
Carry-forward (non-blocking)
----------------------------
- GetPoliciesFn has write access to TaskApprovalsTable (for the
rate-limit counter path). A future permissions audit should
tighten this to a single-item write scoped to `RATE#<user>#*`.
- TASK_STARTED_AT env var is only set when a payload supplies it;
server.py still supports the Phase 2 no-payload startup path.
Ships the four user-facing commands that close the Cedar HITL loop:
once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out
has notified the user, Chunk 6 is how they actually respond.
New commands (cli/src/commands/)
--------------------------------
- `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]`
Default scope is `this_call`; callers extend allowlist with
`tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope
that requires `--yes` to confirm — mirrors the safety UX from
§8.4. Error classification maps 404 → "run `bgagent pending`", 409
→ "task no longer awaiting approval", 429 → rate-limit, 401 → login.
- `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]`
`--reason-file` accepts multi-line reasons that would otherwise
need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids
a round-trip on obviously-too-long reasons; the server still
truncates. Reason is sanitized server-side (output_scanner) before
ever reaching the agent.
- `bgagent pending [--output text|json]`
Lists every PENDING approval owned by the caller. Rendered with
approve/deny hints inline so the user can copy-paste the next
command. JSON output for scripting. Rate-limited server-side.
- `bgagent policies list --repo <owner/repo> [--tier hard|soft]`
`bgagent policies show --repo <owner/repo> --rule <rule_id>`
Discovery commands so users can find rule IDs without reading CDK
source. Both subcommands reuse a single `listPolicies` API call
and filter locally.
Wire changes
------------
- `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`,
`listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7
request/response shapes. `approveTask` omits the `scope` body field
when unset so the server's `this_call` default applies.
- `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim —
`ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`,
`PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`,
plus the five constants (`DENY_REASON_MAX_LENGTH`,
`INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`,
`APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`).
- `cli/src/bin/bgagent.ts`: registers the four new commands in the
order they appear in help output.
Tests: +27 new (217 total).
- approve (9): default scope, custom scope, all_session guard +
`--yes` bypass, JSON output, 404/409/401/429 error classifications.
- deny (6): no-reason path, `--reason`, `--reason-file` with
tmpdir fixture, mutually-exclusive rejection, over-length rejection,
404 classification.
- pending (5): empty render, populated render with approve/deny
hints, JSON output, 401 and 429 classifications.
- policies (7): list both tiers, `--tier` filter, `--output json`,
bad `--tier`, show found rule, show unknown rule, 404
repo-not-onboarded classification.
Carry-forward
-------------
- `submit.ts` extension with `--approval-timeout` / `--pre-approve`
flags is deferred to a follow-up commit — the server already accepts
these fields on POST /v1/tasks (Chunk 5), and `bgagent submit`
already forwards unknown payload fields through the existing
request path, so users can set them via `--body-file` today until
the explicit flags land.
- `--output json` on error branches currently returns a CliError
instead of a JSON error envelope; matches the pattern the existing
commands use (status, cancel, nudge). Follow-up to standardize
JSON error envelopes across the whole CLI if that becomes a
common scripting pain point.
…ervability

Persist approval_gate_count to TaskTable across container restarts per
§13.6 so the cumulative gate budget survives eviction. Emit
pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so
operators see the starting approval posture in the live SSE stream. Add
IMPL-23 cache-hit observability: cache hits attach metadata to
PolicyDecision, and the hook forwards it to the new
write_policy_decision_cached progress helper
(decision_source="recent_decision_cache").

Why: container restarts were silently resetting the per-task gate counter,
re-exposing users to another approvalGateCap-worth of gates per restart.
Cache-driven denies were invisible in TaskEventsTable beyond the initial
gate. Fresh tasks emitted no "starting posture" signal, so dashboards could
not distinguish "no pre-approvals seeded" from "agent has not started".

Surface additions:
- task_state.increment_approval_gate_count_in_ddb — best-effort atomic ADD
  on approval_gate_count (sketched below)
- PolicyEngine(initial_approval_gate_count=N) — seed the session counter
- TaskConfig.initial_approval_gate_count — orchestrator payload field
- progress_writer.write_policy_decision_cached — IMPL-23 emitter
- PolicyDecision.cache_hit_metadata — observability-only field
- _CachedDecision.original_decision_ts — wall-clock preservation
- runner._initialize_policy_engine_and_hooks — extracted helper

Counter survival is a safety bound, not correctness: DDB failure does NOT
block the gate (§13.6). The joint-update invariant on status +
awaiting_approval_request_id (§10.2) is preserved — the counter uses a
separate UpdateItem, not merged into the resume transaction.

Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests.
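A hedged sketch of that best-effort increment (table and key names
assumed)::

    import logging

    log = logging.getLogger(__name__)

    def increment_approval_gate_count_in_ddb(ddb, table_name, task_id):
        try:
            ddb.update_item(
                TableName=table_name,
                Key={"task_id": {"S": task_id}},
                # ADD is atomic server-side, so restarts and concurrent
                # writers cannot lose increments the way a
                # read-modify-write would.
                UpdateExpression="ADD approval_gate_count :one",
                ExpressionAttributeValues={":one": {"N": "1"}},
            )
        except Exception:
            # Safety bound, not correctness: never block the gate (§13.6).
            log.warning("approval_gate_count persist failed; gate proceeds")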
Capture the per-task approval-gate cap at submit-time (§4 step 5, decision
aws-samples#13, §13.6) so a blueprint-configured override is frozen onto
the TaskRecord. Mid-task blueprint edits cannot shift the cap beneath a
running task; container restarts re-seed the agent's PolicyEngine from the
persisted value instead of its compile-time default-50.

Why: Chunk 7a added approval_gate_count persistence but the cap itself was
still resolved from the blueprint on every restart — so an operator
lowering security.approvalGateCap mid-task would retroactively fail-close
the running task. The design has always said cap is frozen at submit; this
chunk makes the implementation match.

Surface additions:
- BlueprintProps.security.approvalGateCap (CDK, synth-validated [1, 500]
  integer) — new per-repo blueprint prop
- RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap
- TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT}
- create-task-core now calls loadRepoConfig, resolves the cap,
  bounds-checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid
  blueprint data (permanent until admin re-deploys, not transient)
- orchestrator.ts: isValidApprovalGateCap integer+bounds guard; logs warn
  if a persisted cap is structurally invalid (schema drift / hand-edited
  DDB row)
- TaskConfig.approval_gate_cap: int | None = None (agent-side); runner
  threads to the PolicyEngine kwarg when not None
- "Task created" log line now carries approval_gate_cap +
  approval_gate_cap_source ("blueprint" | "platform_default") so operators
  can detect a broken-plumbing deploy at the single chokepoint where all
  fallback layers converge

Per silent-failure review:
- HIGH: 500 → 503 + logger.error for permanent misconfig
- HIGH: cap + source in task-created log (catches 4-layer cascade)
- MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity, floats,
  out-of-bounds all omitted + warned)

Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged.
… warn path

Close three deferred items from Chunks 7a/7b before Chunks 8-10:
- runner.py init log now carries approval_gate_cap=N +
  approval_gate_cap_source=threaded|engine_default. Matches the handler
  log key so CloudWatch Insights can join across the cascade; the agent
  can't distinguish blueprint-override from platform-default-frozen (the
  handler log is the ground truth).
- server.py adds a _warn_cw helper routing [server/warn] lines to a
  dedicated CloudWatch stream (server_warn/<task_id>). The stdout print is
  preserved for local dev + existing capsys tests. AgentCore does not
  forward container stdout to APPLICATION_LOGS, so pre-7c warnings about
  malformed invocation payloads were invisible in production. The failure
  counter is shared with _debug_cw for a single alarm surface; hoisted
  above the writer defs for import-time ordering safety.
- blueprint.ts emits a synth-time info annotation when
  security.approvalGateCap is omitted so operators see a signal that the
  repo will rely on the platform default of 50. Without this, the default
  was a silent fallback at the handler layer — only visible by inspecting
  a TaskRecord at runtime.

Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged.
Design refs: §4 step 5, §11.1, §13.6, decision aws-samples#13.
Add created_at / effective_timeout_s / matching_rule_ids to
approval_granted / approval_denied / approval_timed_out events so the
incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute decision
latency and emit a rule_id-dimensioned timeout breakdown without a
round-trip GetItem against TaskEventsTable.

Fields are added conditionally — omitted from metadata when the caller did
not supply them — so the event stream stays free of null-value noise and
legacy callers continue to produce valid payloads. The publisher handles
missing fields via explicit skip-and-log on the specific metric branch
(not fallback-to-zero).

Agent tests extended: +6 progress_writer tests, +3 hooks tests. Baseline
700 → 710.

No consumer wired yet — this commit is a forward-compatible superset;
Chunk 8b ships the CDK publisher + dashboard widgets.
…atch dashboard widgets

Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the
MetricsPublisher architecture (Option E):
- New ApprovalMetricsPublisher Lambda consumes the TaskEventsTable DDB
  stream as consumer aws-samples#2 (FanoutConsumer is aws-samples#1; the
  stream is within its 2-consumer soft cap — documented in
  task-events-table.ts).
- Handler emits CloudWatch EMF for 3 metrics in namespace ABCA/Cedar-HITL
  (EMF shape sketched below):
  * ApprovalRequestCount + ClippedApprovalCount (reason dim) →
    ApprovalTimeoutClipRate widget (MathExpression with IF-guard against
    NaN on zero-denominator periods)
  * TimedOutEffectiveTimeout (rule_id dim with allowlist cardinality cap)
    → ApprovalTimeoutBreakdown widget
  * ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency
    widget with per-outcome p50/p90/p99
- Observability-of-observability (silent-failure review):
  * MetricsPublisherHeartbeat per batch so dashboard gaps distinguish "no
    traffic" from "pipeline broken"
  * MetricEmitSkipped with a reason dim on schema mismatches, parse
    anomalies, unknown rule ids — never fall back to latency=0 or
    count=0, which would poison percentile widgets
  * Expected high-volume skip reasons (non-milestone events, REMOVE
    records) DO NOT emit MetricEmitSkipped — only anomaly reasons
    (missing keys, missing milestone name) do, so real signal isn't
    drowned
  * Structured log lines alongside every skip so the absence of metrics
    is also observable via CloudWatch Logs Insights
- Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``.
  Unknown values collapse to ``other`` / ``unknown`` buckets with
  dashboard series so the collapse is discoverable rather than silently
  accruing custom-metric cost.
- Event-source-mapping filter pattern rejects non-agent_milestone records
  at the service layer; a handler-layer allowlist catches anything that
  slips through. Filter-pattern correctness tested structurally +
  positively/negatively probed (silent-failure H3).
- Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror the
  fanout-task-events.ts poison-pill pattern exactly.

Deferred to Chunk 10 chore issues:
- DLQ alarms (fanout + publisher) — fire-into-void until a notification
  channel lands, so wire with §11.5 alarms as a group
- Explicit log-group declaration (IAM drift defense)
- stdout-flush race documentation (pre-existing pattern in fanout)
- EMF 100-updates/sec throttle alarm

Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217 (unchanged).
All pass. §11.5 alarm plumbing now unblocked — the publisher provides the
metrics infrastructure the design always intended; only the
notification-channel SNS wiring is left.
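For reference, the EMF envelope for one of the metrics above looks roughly
like this (Python-rendered for brevity; the shipped publisher is
TypeScript, and the exact dimension sets are its concern)::

    import json
    import time

    def emit_decision_latency(latency_ms, outcome):
        # Printing this JSON to a Lambda's stdout is all EMF requires;
        # CloudWatch Logs turns it into a native metric.
        print(json.dumps({
            "_aws": {
                "Timestamp": int(time.time() * 1000),
                "CloudWatchMetrics": [{
                    "Namespace": "ABCA/Cedar-HITL",
                    "Dimensions": [["outcome"]],
                    "Metrics": [{"Name": "ApprovalDecisionLatencyMs",
                                 "Unit": "Milliseconds"}],
                }],
            },
            "outcome": outcome,
            "ApprovalDecisionLatencyMs": latency_ms,
        }))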
Bring CEDAR_HITL_GATES.md current with the code that shipped in Chunks 7b
(approval_gate_cap persist), 8a (outcome event schema superset), and 8b
(ApprovalMetricsPublisher + dashboard widgets):
- §10.2 adds the missing approval_gate_cap row (carry-forward drift from
  Chunk 7b). Bounds + frozen-at-submit semantics documented.
- §11.1 outcome events (approval_granted / approval_denied /
  approval_timed_out) now document the Chunk 8a optional fields
  (created_at, effective_timeout_s, matching_rule_ids) plus the
  publisher's skip-on-missing-field policy.
- §11.1 intro names ApprovalMetricsPublisherFn as consumer aws-samples#2
  and points to §11.3 for the metric schema.
- §11.3 rewritten to describe the Option E architecture: publisher Lambda
  + EMF + native CloudWatch metrics in namespace ABCA/Cedar-HITL,
  MathExpression with divide-by-zero guard, rule_id cardinality cap,
  observability-of-observability via heartbeat + skip meta-metrics, widget
  layout (12/12 over 24), 2-consumer stream budget. Dropped the stale
  "Retired the old bundled widget" line — that widget never shipped.
- §11.5 reframed as "deferred (notification-channel gated)" with a
  plumbing-status paragraph noting the metric infra now exists; only SNS
  wiring remains. Alarm list expanded to include DLQ and publisher-health
  alarms.
- §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to reference
  the 4 new test files; Appendix B checklist updated.

Starlight mirror regenerated via ``cd docs && node scripts/sync-starlight.mjs``.
No code changes. Test baselines unchanged. Adversarial comment-analyzer
review verified every new claim against committed code — zero inaccuracies.
…2 mediums
Full-branch adversarial review (code-reviewer + silent-failure-hunter
on all 18 commits) surfaced findings that only appear at final state.
This addresses the blockers + the low-cost mediums before deploy:
B2 — stranded approvals were invisible to the dashboard:
- Reconciler writes ``event_type: 'task_stranded'``; the metrics
publisher's event-source filter only accepts
``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions
produced zero §11.3 signal.
- Fix: reconciler now additionally emits an ``agent_milestone``
with ``milestone: 'approval_stranded'`` when the stranded task
was AWAITING_APPROVAL. Publisher allowlist extended; classifier
emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING
stranded events unchanged (guarded by test).
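A minimal sketch of the B2 fix shape: alongside the existing
``task_stranded`` event, mirror AWAITING_APPROVAL evictions as a
publisher-visible milestone. ``putEvent`` and the field names are
illustrative, not the shipped reconciler API.

```ts
// Assumed helper that appends one row to TaskEventsTable.
declare function putEvent(event: Record<string, unknown>): Promise<void>;

async function recordStranded(task: { task_id: string; status: string }): Promise<void> {
  // Existing behavior: every eviction writes a task_stranded event.
  await putEvent({ task_id: task.task_id, event_type: 'task_stranded' });
  if (task.status === 'AWAITING_APPROVAL') {
    // The metrics publisher's event-source filter only passes
    // event_type = 'agent_milestone', so mirror the eviction as one.
    await putEvent({
      task_id: task.task_id,
      event_type: 'agent_milestone',
      milestone: 'approval_stranded',
    });
  }
}
```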
B1 — heartbeat comment was false reassurance:
- Event-source filter blocks Lambda invocation when no
``agent_milestone`` records exist in the poll window, so a
quiet period produces the same widget gap as a broken
pipeline. The code + design-doc wording claimed "gap =
pipeline broken" which would mislead the on-call.
- Fix: corrected module + function docstrings to describe the
heartbeat as "present when active, not pipeline-alive-always."
Operators should alarm on the combination
(heartbeat-absent + recent TaskEventsTable traffic) or wire
a scheduled canary — the latter tracked as a §11.5 follow-up.
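For the alarm combination suggested in the B1 fix, one plausible shape
is a metric-math alarm: heartbeat absent while the events table still
shows writes. A sketch under stated assumptions — the DynamoDB metric,
table name, periods, and thresholds are all placeholders for the §11.5
follow-up, not shipped code.

```ts
import { Duration } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

declare const scope: Construct; // supplied by the owning stack

const heartbeat = new cloudwatch.Metric({
  namespace: 'ABCA/Cedar-HITL',
  metricName: 'MetricsPublisherHeartbeat',
  statistic: 'Sum',
  period: Duration.minutes(5),
});
const tableWrites = new cloudwatch.Metric({
  namespace: 'AWS/DynamoDB',
  metricName: 'ConsumedWriteCapacityUnits',
  dimensionsMap: { TableName: 'TaskEventsTable' }, // assumed physical name
  statistic: 'Sum',
  period: Duration.minutes(5),
});

// 1 when the table saw writes but no heartbeat arrived; 0 otherwise.
const pipelineBroken = new cloudwatch.MathExpression({
  expression: 'IF(FILL(hb, 0) == 0 AND writes > 0, 1, 0)',
  usingMetrics: { hb: heartbeat, writes: tableWrites },
});

pipelineBroken.createAlarm(scope, 'ApprovalPublisherSilent', {
  threshold: 1,
  evaluationPeriods: 3,
  comparisonOperator:
    cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
});
```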
M1 — safety-critical milestones produced zero dashboard signal:
- ``approval_cap_exceeded`` (§12.9 per-task cap) and
``approval_rate_limit_exceeded`` (per-user per-minute rate)
were emitted by the agent but not on the publisher allowlist.
A production bug where every gate hit the cap would have
been invisible.
- Fix: both added to APPROVAL_METRIC_MILESTONES with
``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount``
counters. No dimensions — the request_id in the event carries
per-user correlation for ad-hoc log-insights investigation.
H2 — filter / handler eventName disagreement:
- Event-source filter required ``INSERT``; handler accepted
``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is
put-only), but a future chunk MODIFY-ing records would be
silently dropped by the filter while the handler was ready
to process them.
- Fix: handler now INSERT-only, matching the filter. Single
source of truth on the eventName invariant.
M1-rename — ``expected_non_approval_milestone`` skip reason was
misleading (the non-metric approval milestones like
``approval_late_win`` also land in this bucket). Renamed to
``expected_milestone_not_tracked``.
Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics,
1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted
guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push
security fails only on the 3 pre-existing CVEs tracked for chore
issue filing.
Deferred findings from the same review (file as chore issues):
- H1: agent-dies-between-TIMED_OUT-and-resume loses latency
(edge, affects p99 bias)
- H3: late-win APPROVED created_at staleness invariant
(works today, document invariant)
- H4: _warn_cw daemon-thread burst under adversarial payload
- M2-M4: late-win metric, rename helpers, etc.
No upstream PR filing this chunk — deploy to Sam's AWS account
for integration testing first.
…overflow policies
Synth + deploy were blocked by cdk-nag: the Cedar HITL additions
(TaskApprovalsTable grant + SlackUserMappingTable + extra env vars
threaded to the AgentCore runtime) pushed the runtime ExecutionRole
past CDK's inline-policy size limit, so CDK auto-splits the excess
statements into ``OverflowPolicy1``. The overflow inherits the same
wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base
policy but lives at a path
(``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the existing
``addResourceSuppressions(runtime, ..., applyToChildren: true)`` cannot
reach — CDK creates overflow policies lazily during synth
``prepare()``, after the construct tree has been frozen and after
static suppressions have been cached.
Suppress via an Aspect at MUTATING priority so the suppression is
applied before cdk-nag's READONLY visitor runs. Matches any path
containing ``/Runtime/ExecutionRole/OverflowPolicy`` and ending in
``/Resource`` so future ``OverflowPolicy2``, etc. are covered without
hardcoding indices.
Verified: ``mise //cdk:synth`` now completes cleanly.
``mise //cdk:test`` still 1332/1332.
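A minimal sketch of the Aspect described above, assuming the
``AwsSolutions-IAM5`` wildcard finding is the rule being suppressed;
the class name and suppression reason are illustrative, the path match
and MUTATING-before-READONLY ordering come from the commit.

```ts
import { AspectPriority, Aspects, CfnResource, IAspect, Stack } from 'aws-cdk-lib';
import { IConstruct } from 'constructs';
import { NagSuppressions } from 'cdk-nag';

class OverflowPolicySuppression implements IAspect {
  visit(node: IConstruct): void {
    const path = node.node.path;
    // Covers OverflowPolicy1, OverflowPolicy2, ... without hardcoding indices.
    if (
      node instanceof CfnResource &&
      path.includes('/Runtime/ExecutionRole/OverflowPolicy') &&
      path.endsWith('/Resource')
    ) {
      NagSuppressions.addResourceSuppressions(node, [
        {
          id: 'AwsSolutions-IAM5', // assumed rule id for the wildcard finding
          reason: 'Overflow inherits statements already suppressed on the base policy.',
        },
      ]);
    }
  }
}

declare const stack: Stack; // supplied by the app
// MUTATING runs before cdk-nag's READONLY visitor, so the suppression lands in time.
Aspects.of(stack).add(new OverflowPolicySuppression(), {
  priority: AspectPriority.MUTATING,
});
```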
…gate + CLI error visibility
Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test
session. Batched into one commit since all three need the same
redeploy to verify:
1. agent/Dockerfile: COPY policies/ into the container image.
``PolicyEngine.__init__`` reads
``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import
time via ``_POLICIES_DIR = Path(__file__).parent.parent /
"policies"``. The Dockerfile only copied ``src/``, so the
directory was missing and every Cedar-HITL task failed at 0 turns
with ``missing built-in hard-deny policies``. Introduced
alongside Chunk 2 when the policy files were first added —
Dockerfile was never updated. Zero tasks on this branch ever
succeeded in deployed form until now; unit + Jest tests never
caught it because they don't exercise the container layout.
2. cdk/src/handlers/get-policies.ts: add checkRepoOnboarded gate.
Previously the handler was lenient (loaded RepoConfig best-
effort, fell through to built-ins on miss), producing 200 with
the full built-in set for any arbitrary repo string. Users
typo-ing a repo name mistook the response for proof the repo
was onboarded. Now consistent with POST /tasks — 422
REPO_NOT_ONBOARDED for any repo without an active RepoConfig
row. Gate runs AFTER the rate-limit so the 429 doesn't leak
onboarding-status via a 422-vs-200 timing oracle (+1 test
covering this). +2 tests total.
3. cli/src/format.ts + cli/src/commands/watch.ts:
describeReason + formatTerminalMessage now surface the raw
error_message when the classifier's catch-all UNKNOWN fires.
Previously they always preferred
error_classification.{category, title}, turning the concrete
string "missing built-in hard-deny policies: /app/policies/
hard_deny.cedar" into the useless "unknown: Unexpected error"
label on the user's terminal. For KNOWN classifications
(e.g. guardrail: PR context blocked) the new behavior appends
the first line of error_message as concrete evidence — so even
when the category is known, users see the actual diagnostic
inline. +3 test cases covering the UNKNOWN fall-through and the
KNOWN-with-detail path; adjusted 2 existing assertions.
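A sketch of the item-3 fallback behavior; ``describeReason``'s real
signature lives in cli/src/format.ts, so the shapes below are
assumptions about the idea rather than the shipped code.

```ts
interface TerminalInfo {
  error_message?: string;
  error_classification?: { category: string; title: string };
}

function describeReason(info: TerminalInfo): string {
  const firstLine = info.error_message?.split('\n')[0]?.trim();
  const cls = info.error_classification;
  if (!cls || cls.category === 'UNKNOWN') {
    // The catch-all classification carries no signal — surface the raw error.
    return firstLine ?? 'unknown: Unexpected error';
  }
  // Known category: keep the label but append concrete evidence.
  return firstLine
    ? `${cls.category}: ${cls.title} — ${firstLine}`
    : `${cls.category}: ${cls.title}`;
}
```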
Tests: cdk 1332 → 1334 (+2); cli 217 → 220 (+3); agent 710
(unchanged — no agent code touched). All pre-commit hooks pass.
Pre-push fails only on the 3 pre-existing CVEs carried from main.
Not yet fixed (tracked in .e2e-test-plan.md "Surprises" section
for later chore-issue filing):
- telemetry.py::_METRICS_REDACT_KEYS scrubs error strings too
aggressively — dashboard METRICS_REPORT events show "[redacted]"
for every error including ones with zero secret-leak risk. Should
run output_scanner.py pattern match instead of blanket substitution.
TaskRecord error_message (which the CLI reads) is unaffected;
only the dashboard widget suffers.
- Container stdout goes to /aws/bedrock-agentcore/runtimes/<rt>-DEFAULT
log group, not APPLICATION_LOGS. Dashboard LogQueryWidgets can't
see agent-fatal ERROR lines. Fix is either a dashboard widget
pointing at the runtime log group OR _warn_cw calls on fatal paths.
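The ordering constraint from item 2 above (rate-limit before the
onboarding gate, so a 429 never reveals whether a repo exists) reduces
to a simple handler shape. A sketch with assumed helper names — not the
shipped get-policies.ts:

```ts
declare function enforceRateLimit(userId: string): Promise<boolean>;
declare function checkRepoOnboarded(repo: string): Promise<boolean>;

async function getPolicies(userId: string, repo: string) {
  if (!(await enforceRateLimit(userId))) {
    // Same answer whether or not the repo exists — no 422-vs-429 oracle.
    return { statusCode: 429 };
  }
  if (!(await checkRepoOnboarded(repo))) {
    return { statusCode: 422, body: JSON.stringify({ code: 'REPO_NOT_ONBOARDED' }) };
  }
  return { statusCode: 200 /* ...policy payload elided... */ };
}
```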
…roval-timeout/--pre-approve
Five fixes batched from the 2026-05-11/12 E2E validation pass — all
found by manual integration testing; none were caught by unit tests
because the defects sat at plumbing seams between correctly-tested
components.
1. agent/src/runner.py dropped ``config.user_id`` when wiring hook
   matchers, so every approval-gate row landed with ``user_id=""`` on
   the ``user_id-status-index`` GSI key. DDB rejected every
   ``TransactWriteItems`` with a ValidationException and the PreToolUse
   hook fell through to ``_deny_response``. Symptom: the agent
   "completed" force-push tasks with zero visible gating.
2. ``toTaskDetail`` dropped ``approval_gate_count``,
   ``approval_gate_cap``, and ``awaiting_approval_request_id`` from
   every task API response — the fields were populated on the
   TaskRecord but never serialized. ``bgagent status`` couldn't report
   gate posture.
3. ``GET /pending`` dropped ``matching_rule_ids`` (the DB had it, the
   handler didn't map it, the type omitted it). Users couldn't see
   WHICH rule fired on a gate.
4. The ``TaskApprovalsTable`` GSI projection didn't include
   ``matching_rule_ids``. Fixed by a destructive recreate under a new
   construct id (``TaskApprovalsTableV2``), since DDB rejects in-place
   nonKeyAttributes updates. Design doc §10.1 now carries a "projection
   is fixed at design time" note, and the construct test locks the list
   via ``Match.arrayWith``.
5. ``bgagent submit`` lacked ``--approval-timeout`` and
   ``--pre-approve`` flags (the server accepted them, the CLI didn't
   expose them), blocking Phase 5/6 E2E tests on raw curl. The flags
   are repeatable (``--pre-approve``) and client-side-validated against
   server constants (APPROVAL_TIMEOUT_S_MIN/MAX,
   INITIAL_APPROVALS_MAX_ENTRIES).
Test deltas: agent 710→711, CLI 220→232 (+12), CDK 1334→1336 (+2). All
seven E2E phases re-run post-redeploy; all green except the two
dashboard-visual checks that need Sam's eyes.
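The fix-4 drift guard is worth seeing in miniature: lock the GSI's
projected attributes so a silent projection change fails the construct
test. A sketch against ``aws-cdk-lib/assertions``; ``buildStack`` is an
assumed test helper.

```ts
import { Stack } from 'aws-cdk-lib';
import { Match, Template } from 'aws-cdk-lib/assertions';

declare function buildStack(): Stack; // assumed test fixture factory

test('TaskApprovalsTableV2 GSI projects matching_rule_ids', () => {
  const template = Template.fromStack(buildStack());
  template.hasResourceProperties('AWS::DynamoDB::Table', {
    GlobalSecondaryIndexes: Match.arrayWith([
      Match.objectLike({
        IndexName: 'user_id-status-index',
        Projection: Match.objectLike({
          // Projection is fixed at design time (§10.1) — pin it here.
          NonKeyAttributes: Match.arrayWith(['matching_rule_ids']),
        }),
      }),
    ]),
  });
});
```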
Batched here because all seven touch the same chunk-3/5/7/8/10 surface
area and none of them needed to ship as their own deploy. The set:
B1 — Strengthen the agent's response to a user DENY. Phase 4 observed
the agent treating "User denied" as "try a different approach" and
burning through max_turns on trivial variations of the same rule. Two
fixes:
* Wrap the reason injected into the model with AUTHORITATIVE-prefixed
stop-language that names the matching rule(s) and tells the agent
subsequent attempts will fast-deny. (agent/src/hooks.py)
* Extend ``RecentDecisionCache`` with a rule-level cache keyed by
``(tool_name, rule_id)`` so semantic retries hit the cache even
when the input hash differs. Populated only on DENIED (TIMED_OUT
stays input-hash scoped because it's ambiguous user-intent).
(agent/src/policy.py)
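The rule-level cache idea, sketched in TypeScript for consistency with
the other examples here — the real implementation is Python in
agent/src/policy.py, and the key shape below is an assumption.

```ts
type Decision = 'APPROVED' | 'DENIED' | 'TIMED_OUT';

class RecentDecisionCache {
  private byInputHash = new Map<string, Decision>();
  private byRule = new Map<string, Decision>(); // keyed `${tool}::${ruleId}`

  record(tool: string, inputHash: string, ruleIds: string[], decision: Decision): void {
    this.byInputHash.set(inputHash, decision);
    if (decision === 'DENIED') {
      // Only DENIED widens to rule scope; TIMED_OUT stays input-hash
      // scoped because the user's intent is ambiguous.
      for (const ruleId of ruleIds) this.byRule.set(`${tool}::${ruleId}`, decision);
    }
  }

  lookup(tool: string, inputHash: string, ruleIds: string[]): Decision | undefined {
    // Semantic retries (different input hash, same rule) still hit the cache.
    return (
      this.byInputHash.get(inputHash) ??
      ruleIds.map((r) => this.byRule.get(`${tool}::${r}`)).find((d) => d !== undefined)
    );
  }
}
```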
B2 — Replace blanket ``[redacted]`` substitution on METRICS_REPORT
``error`` fields with ``output_scanner.scan_tool_output``-based
pattern matching. Structural errors like "missing built-in hard-deny
policies: /app/policies/hard_deny.cedar" now survive to the dashboard
Recent-Events widget; real secret patterns (AWS keys, Bearer tokens,
connection strings) still get ``[REDACTED-<LABEL>]``.
(agent/src/telemetry.py)
B3 — Mirror fatal agent ERROR lines to the APPLICATION_LOGS CloudWatch
group via a new ``log_error_cw`` helper. Previously ``log("ERROR",…)``
wrote only to container stdout, which AgentCore routes to
runtime-DEFAULT, so agent-fatal errors were invisible on both the
TaskDashboard widgets and ``bgagent status``. Swap the three fatal
call sites in pipeline.py + runner.py. (agent/src/shell.py)
B4 — Document the get-policies 422 REPO_NOT_ONBOARDED gate
(landed 2026-05-11 in fb69894) in design-doc §7.6, including the
"gate runs AFTER rate-limit to avoid timing oracle" note.
B11 — Declare explicit LogGroups for ApprovalMetricsPublisherFn and
FanOutFn so the log groups aren't created lazily by Lambda at first
invoke with an implicit grant graph and unbounded retention.
C5 — Call out the Cognito pool constraints in USER_GUIDE.md +
QUICK_START.md: username MUST be an email, password policy is
min 12 chars with all four character classes, ``email_verified=true``
is required at create time, and ``--message-action SUPPRESS`` stops
Cognito from attempting an SES email on new-user creation.
C10 — Collapse the two RepoTable GetItems on the task submit path
(``checkRepoOnboarded`` + ``loadRepoConfig``) into a single
``lookupRepo`` call that returns both verdicts.
Test additions: regression guard asserts exactly one GetItem fires
on submit.
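A sketch of the C10 collapse: one GetItem answers both "is this repo
onboarded?" and "what's its config?". The key attribute, ``status``
field, and RepoConfig shape are assumptions.

```ts
import { DynamoDBDocumentClient, GetCommand } from '@aws-sdk/lib-dynamodb';

interface RepoLookup {
  onboarded: boolean;
  config?: Record<string, unknown>;
}

async function lookupRepo(
  ddb: DynamoDBDocumentClient,
  table: string,
  repo: string,
): Promise<RepoLookup> {
  const { Item } = await ddb.send(
    new GetCommand({ TableName: table, Key: { repo } }), // assumed key shape
  );
  // Single read: a missing or inactive row is "not onboarded"; an active
  // row doubles as the config payload. No second GetItem needed.
  if (!Item || Item.status !== 'active') return { onboarded: false };
  return { onboarded: true, config: Item };
}
```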
Test deltas:
agent: 721 → 730 (+9 from rule-cache + telemetry-redaction tests)
cli: 232 → 232 (no surface change)
cdk: 1336 → 1337 (+1 from single-GetItem regression guard)
Docs-sync mirror regenerated.
Resolves 13 conflicts across agent, cli, and cdk. Notable decisions:
- agent/pyproject.toml: took upstream's 7 dep bumps (boto3, claude-agent-sdk,
requests, fastapi, uvicorn, aws-opentelemetry-distro, mcp) but held
cedarpy==4.8.0 exact to preserve the @cedar-policy/cedar-wasm@4.10.0
parity contract documented in mise.toml.
- agent/src/pipeline.py, runner.py, server.py: additive merges — kept
both approval_* param set (ours, Cedar HITL) and channel_* param set
(upstream, Slack/Linear). Preserved the log_error_cw wiring that
mirrors fatal ERRORs to APPLICATION_LOGS.
- cli/src/api-client.ts, bin/bgagent.ts: kept both import sets —
GetPendingResponse/GetPoliciesResponse + approve/deny/pending/policies
subcommands (ours) plus LinearLinkResponse + slack/linear subcommands
(upstream).
- cdk/src/stacks/agent.ts: merged aws-cdk-lib imports (AspectPriority +
Aspects + Fn) and construct imports (SlackIntegration +
SlackUserMappingTable). Resolved duplicate SlackUserMappingTableName
CfnOutput by adopting upstream's SlackIntegration construct.
- cdk/src/handlers/shared/create-task-core.ts: merged type imports —
kept our 7 HITL constants (APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT},
APPROVAL_TIMEOUT_S_{MIN,MAX,DEFAULT}, INITIAL_APPROVALS_MAX_ENTRIES)
and upstream's ChannelSource + DEFAULT_MAX_TURNS.
Surgical retirement: deleted HEAD-side link-slack-user.ts + associated
SlackUserMappingTable (PK slack_user_id → cognito_sub) in favor of
upstream's richer OAuth-linking flow (composite slack_identity PK +
PlatformUserIndex GSI, 2-step Cognito-authed link via slash command +
CLI). HEAD's mapping table had no reader anywhere in the codebase —
nothing functional lost. Slack-button approval UX (Cedar HITL approve/
deny action_ids) deferred to a follow-up issue; it extends upstream's
slack-interactions.ts cleanly.
CVE cleanup: pinned astro==6.1.10 exact in docs/package.json to close
GHSA-xr5h-phrj-8vxv. Exact pin avoids the transitive CVE churn at
6.3.x (fast-xml-parser, yaml). Merge also cleared the previously-
residual urllib3 and basic-ftp CVEs tracked in
project_pending_cve_followups.md.
Test baselines:
- agent: 756/757 passing (1 local-env-only failure blocked by Amazon
git-defender; unrelated to this change)
- cli: 238/238 passing across 22 suites
- cdk: 1418/1418 passing across 83 suites
- All pre-commit + pre-push hooks green; security:deps reports zero CVEs
- USER_GUIDE: new §Approval gates (Cedar HITL) section covering the
  pending/approve/deny/policies commands, the scope reference, and the
  pre-approval flow. Extends the submit options with --approval-timeout
  and --pre-approve. Adds AWAITING_APPROVAL to the lifecycle state
  machine + statuses table. Adds the three new approval events
  (approval_requested, approval_recorded, approval_timed_out).
- QUICK_START: new §Step 7 end-to-end walkthrough (policies list →
  submit → watch → pending → approve/deny with reason injection →
  pre-approval variant for unattended runs).
- CEDAR_POLICY_GUIDE (new): blueprint-author reference for writing
  @tier("hard")/@tier("soft") rules. Covers the vocabulary
  (execute_bash / write_file / context.command / context.file_path),
  the annotation reference (@rule_id, @severity, @approval_timeout_s,
  @category), 4 worked patterns (rm -rf /, force-push main, env files,
  migrations), multi-match behavior (min timeout, max severity),
  task-start validation errors, capacity budgets (approvalGateCap,
  maxPreApprovalScope), and cross-engine parity testing via
  contracts/cedar-parity/ fixtures.
- DEVELOPER_GUIDE: new §Writing Cedar policies for the repo subsection,
  pointed to from §Repository preparation.
- sync-starlight + astro sidebar: wire CEDAR_POLICY_GUIDE into the
  /customizing/cedar-policies route.
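The multi-match combination rule the guide documents (min timeout, max
severity) reduces to a small fold. A sketch with illustrative names —
the real combination lives in the agent's policy engine, not here:

```ts
interface MatchedRule {
  rule_id: string;
  severity: 'low' | 'medium' | 'high';
  approval_timeout_s: number;
}

const SEVERITY_RANK = { low: 0, medium: 1, high: 2 } as const;

function combineMatches(matches: MatchedRule[]) {
  // Several soft rules on one tool call: the shortest timeout wins,
  // the highest severity wins.
  return {
    timeoutS: Math.min(...matches.map((m) => m.approval_timeout_s)),
    severity: matches.reduce<MatchedRule['severity']>(
      (acc, m) => (SEVERITY_RANK[m.severity] > SEVERITY_RANK[acc] ? m.severity : acc),
      'low',
    ),
  };
}
```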
…+ resource tagging
Second upstream merge to stay current before PR review. Brings in 3
commits since the previous merge at f36d352:
- ff79c24 chore(deps): bump astro from 6.1.6 to 6.1.10. Upstream caught
  up to the exact pin we already have; kept our astro 6.1.10 exact pin
  (not upstream's ^6.1.10) to block the lockfile from drifting to 6.3.2
  and re-introducing transitive CVEs (fast-xml-parser, yaml).
  security:deps: zero CVEs.
- a59ca35 feat(cdk): add github:* resource tagging strategy.
  Auto-merged; only surfaced in our stack via a new ArnFormat import in
  agent.ts (resolved inline).
- 9592796 feat(fanout): migrate SlackNotifyFn to FanOutConsumer
  subscriber (aws-samples#64). Significant upstream refactor (11 files,
  ~2000 lines) moving Slack outbound delivery from a standalone DDB
  Streams consumer onto FanOutConsumer. Drops TaskEventsTable from 2
  concurrent stream readers to 1, restoring headroom for future
  channels. Our Cedar HITL surface is unaffected.
Conflicts resolved (5):
- docs/package.json: kept our exact astro pin (see above).
- yarn.lock: regenerated clean via yarn install.
- cdk/src/stacks/agent.ts: merged the aws-cdk-lib import — kept ours
  (AspectPriority, Aspects for the cdk-nag overflow suppression) and
  added upstream's ArnFormat (new resource-tagging strategy).
- cdk/src/handlers/fanout-task-events.ts: merged the CHANNEL_DEFAULTS
  comment block — kept upstream's richer rationale for
  task_created/session_started inclusion and pr_created exclusion,
  preserved the Cedar HITL approval-gate phrasing.
- cdk/test/handlers/fanout-task-events.test.ts: two drift-guard tests
  updated to reflect the merged CHANNEL_DEFAULTS.slack contents (our
  approval_requested/approval_stranded + upstream's session_started).
  The forwardCompat set (events in the filter that don't yet have a
  Slack renderer) now lists both approval events plus status_response —
  Slack-button approval renderers are a deferred follow-up.
Tests: cdk 1526/1526 across 85 suites. Full pre-commit + pre-push hooks
green; security:deps reports zero CVEs.
Thanks! Automated review:

Critical Issues (3 found)

C1. Missing fail-closed guard on _pre hook wrapper
Files: agent/src/hooks.py (the _pre function in build_hook_matchers)
The _pre wrapper around pre_tool_use_hook has no try/catch. If
_handle_require_approval raises an uncaught exception (e.g.,
asyncio.CancelledError, an unexpected TypeError), the exception escapes
the hook and the gate's fail-closed guarantee is lost.
Fix: Add a fail-closed catch to _pre that returns
permissionDecision: "deny" on any unhandled exception.

C2. CreateTaskRequest type drift — missing attachments field in CLI
Files: cdk/src/handlers/shared/types.ts vs cli/src/types.ts
The CDK CreateTaskRequest includes attachments?: Attachment[] and
defines the Attachment interface, but the CLI mirror has neither. This
violates the explicit AGENTS.md sync contract.
Fix: Add the Attachment interface and attachments field to
cli/src/types.ts.

C3. prompt_version field missing from CLI TaskDetail
Files: cdk/src/handlers/shared/types.ts vs cli/src/types.ts
CDK's TaskDetail has readonly prompt_version: string | null; but the
CLI mirror omits it. Another AGENTS.md sync violation.
Fix: Add readonly prompt_version: string | null; to the CLI's
TaskDetail.

Important Issues (5 found)

I1. wrote = True on TIMED_OUT exception ignores the user's approval
File: agent/src/hooks.py ~line 417-428
When the TIMED_OUT status write raises an exception, the code sets
wrote = True, which skips the IMPL-24 re-read path. If the user's
APPROVED decision landed on the row during a failed write, the late
approval is never honored.
Fix: Change to wrote = False on exception so the re-read path checks
for a late approval.

I2. write_terminal precondition excludes AWAITING_APPROVAL
File: agent/src/task_state.py, write_terminal function
The ConditionExpression only accepts RUNNING, HYDRATING, and FINALIZING
as prior statuses. If the container crashes while in AWAITING_APPROVAL,
the shutdown handler cannot transition the task to a terminal state.
Fix: Add AWAITING_APPROVAL to the condition expression.

I3. task_state.py uses print() instead of structured logging
File: agent/src/task_state.py (8+ locations)
All error logging uses bare print() instead of shell.log(). These
messages don't route to CloudWatch Logs Insights, can't trigger alarms,
and have no structured fields for correlation.
Fix: Migrate to from shell import log with structured fields.

I4. Scope validation failures not logged in approve-task.ts
File: cdk/src/handlers/approve-task.ts ~line 99-107
A failed parseApprovalScope returns 400 but is not logged. This is
security-relevant — malformed scopes could indicate probing, CLI bugs,
or version mismatches.
Fix: Add a logger.warn with task_id, raw_scope, error, and user_id
before the 400 return.

I5. Agent-side DDB transaction functions lack unit tests
File: agent/tests/test_task_state.py
transact_write_approval_request, transact_resume_from_approval, and
best_effort_update_approval_status are tested only via the
_FakeTaskState double in the hook tests; the real DDB transaction paths
have no direct coverage.

Suggestions (10 found)

S1. No concurrent approve + deny test (Test Analyzer, 8/10)
No test verifies that simultaneous approve and deny on the same
request_id results in exactly one winner. The DDB condition guards make
this safe, but the invariant is unpinned.

S2. _poll_for_decision consecutive-failure path untested (Test
Analyzer, 8/10)
The POLL_MAX_CONSECUTIVE_FAILS (10 failures → timeout) safety path has
no direct test. Only the all-PENDING timeout path is tested.

S3. ApprovalRecord should be a discriminated union on status (Type
Analyzer)
A PENDING record has no decided_at/scope; an APPROVED record requires
both. Currently all fields are optional on every status — illegal
combinations are representable.

S4. Severity literal 'low' | 'medium' | 'high' repeated inline 6+ times
(Type Analyzer)
Used across ApprovalRecord, PendingApprovalSummary, PolicyRuleSummary,
ParsedRule, etc. A shared Severity type alias would prevent drift.

S5. formatMinuteBucket duplicated in 3 Lambda handlers (Code Reviewer)
Identical helper in approve-task.ts, deny-task.ts, and get-pending.ts.
Should be extracted to a shared module.

S6. GetPendingFn has grantReadWriteData when scoped permissions suffice
(Code Reviewer)
Least-privilege concern — the handler only needs read plus a scoped
write for the rate-limit counters, not full table write.

S7. approval_row: dict in task_state.py is untyped (Type Analyzer)
No schema validation on the dict shape passed to
transact_write_approval_request. A TypedDict would catch missing fields
at call sites.

S8. No CI enforcement for CDK/CLI type sync (Type Analyzer)
The sync contract relies on developer discipline. A structural diff or
export-name comparison would formalize it.

S9. Triple-codebase constant sync for APPROVAL_GATE_CAP_MIN/MAX
(Type/Code Reviewers)
The bounds constants exist in cdk/types.ts, cli/types.ts, and
agent/policy.py with no automated drift detection. Acknowledged as
deferred in the PR.

S10. _try_progress swallows all exceptions with only WARN logging
(Silent Failure Hunter)
15+ call sites could mask TypeError bugs. No counter or escalation for
repeated failures — the user's live event stream silently goes
incomplete.
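S3 and S4 together point at one small refactor. A sketch of what the
reviewer is suggesting — the field names follow the review text, the
exact record shape is an assumption:

```ts
// S4: single source of truth for the severity literal.
type Severity = 'low' | 'medium' | 'high';

interface ApprovalBase {
  readonly request_id: string;
  readonly task_id: string;
  readonly severity: Severity;
}

// S3: discriminate on status so illegal field combinations stop typechecking.
type ApprovalRecord =
  | (ApprovalBase & { readonly status: 'PENDING' })
  | (ApprovalBase & { readonly status: 'APPROVED'; readonly decided_at: string; readonly scope: string })
  | (ApprovalBase & { readonly status: 'DENIED'; readonly decided_at: string; readonly reason?: string })
  | (ApprovalBase & { readonly status: 'TIMED_OUT'; readonly decided_at: string });

// A PENDING record carrying decided_at now fails to compile instead of
// representing an illegal state at runtime.
```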
cdk-nag introduced AwsSolutions-COG8 ("The Cognito user pool is not on
the plus tier / feature plan") which fires on the dev UserPool and
fails ``cdk deploy`` at synth time. Same rationale as the existing
COG3 suppression (advanced-security / Plus tier features are not
required for the dev environment's CLI-based auth flow).
Surfaced during a live deploy attempt; required to unblock any
``cdk deploy`` against this branch.
Summary
Adds a Cedar-policy-driven Human-In-The-Loop (HITL) approval gate around
every agent tool call. Most calls continue to resolve as a plain
Allow/Deny with no human involvement. For a small, explicitly-marked set
of policy rules (@tier("soft")), the decision becomes require-approval:
the agent pauses, the task transitions to AWAITING_APPROVAL, a
live-stream marker is emitted, and the run resumes (or adapts to a
denial reason) after the user decides via REST or CLI. Design reference:
docs/design/CEDAR_HITL_GATES.md.

What's new (user-visible)
- REST: POST /v1/tasks/{id}/approve, POST /v1/tasks/{id}/deny,
  GET /v1/tasks/pending, GET /v1/tasks/policies.
- CLI: bgagent pending | approve | deny | policies list|show, plus
  bgagent submit --approval-timeout and --pre-approve for unattended
  runs.
- AWAITING_APPROVAL status with atomic RUNNING ↔ AWAITING_APPROVAL
  transitions on the task + approvals tables.
- approval_requested, approval_recorded, and approval_timed_out audit
  events; dashboard widget for pending/decided counts; EMF-backed alarm
  on pending-age p95.
- security.cedarPolicies in blueprint.yaml now accepts per-repo
  hard/soft rules plus disable:, maxPreApprovalScope, and
  approvalGateCap knobs.

User-facing documentation
- AWAITING_APPROVAL, the new events, and the two new submit flags are
  threaded through the existing lifecycle/submit docs.

Safety posture
- Built-in hard-deny rules stay hard: --pre-approve all_session and
  every other scope form cannot bypass them, and the blueprint loader
  rejects any disable: directive targeting a built-in hard-deny rule.
- Cross-engine parity (contracts/cedar-parity/): every rule change is
  exercised against both the Python cedarpy engine and the TypeScript
  cedar-wasm engine; divergence fails CI.
- Only the task's submitting user (matched on the Cognito sub) can
  approve/deny its gates.

Notable merge surgery
This branch contains two upstream/main merges to stay current during
review. Worth flagging for reviewers:

First merge (7019663) — integrated the Slack + Linear integration work:
- Retired our link-slack-user.ts handler + SlackUserMappingTable schema
  in favor of upstream's richer Slack integration (OAuth, slash
  commands, interactions, notify). HEAD's mapping table had no reader;
  nothing functional was lost. A follow-up issue will extend upstream's
  slack-interactions.ts with approve_action:/deny_action: block_actions
  for Slack-button approvals.
- Held cedarpy==4.8.0 exact in agent/pyproject.toml (not upstream's
  >=4.8.1) to preserve the @cedar-policy/cedar-wasm@4.10.0 parity
  contract documented in mise.toml.
- Pinned astro==6.1.10 exact in docs/package.json to close
  GHSA-xr5h-phrj-8vxv without pulling in transitive CVEs at 6.3.x.
- TaskApprovalsTable construct id renamed to TaskApprovalsTableV2
  (load-bearing) — the original logical id was abandoned
  mid-development after the first ship of the user_id-status-index GSI;
  adding matching_rule_ids to the projection required a destructive
  recreate (DDB rejects in-place nonKeyAttributes edits).

Second merge (d3791d6) — integrated 3 more upstream commits that landed
while the PR was open:
- ff79c24 astro 6.1.6 → 6.1.10 bump (upstream caught up to our exact
  pin; kept ours to block drift to 6.3.2 + transitive CVEs).
- a59ca35 github:* resource tagging strategy (new ArnFormat import in
  agent.ts).
- 9592796 SlackNotifyFn migrated to a FanOutConsumer subscriber (11
  files, ~2000 lines of upstream refactor — moves Slack delivery off
  its own DDB stream consumer onto FanOutConsumer, dropping
  TaskEventsTable from 2 to 1 concurrent stream readers). The only
  adjustments on our side were updating two drift-guard tests in
  fanout-task-events.test.ts to reflect the merged
  CHANNEL_DEFAULTS.slack contents (our
  approval_requested/approval_stranded + upstream's session_started).
  Our Cedar HITL surface is otherwise untouched.

Test plan
All suites exercised end-to-end (local + deployed stack on account
169728770098, us-east-1):
- agent pytest — 756 passing (1 local-env-only failure blocked by
  Amazon git-defender; unrelated to this change).
- cli jest — 238 passing across 22 suites.
- cdk jest — 1526 passing across 85 suites (up from 1418
  pre-second-merge: +108 new tests from upstream's fanout/slack-notify
  refactor).
- Cedar parity fixtures (contracts/cedar-parity/) — every fixture
  passes on both engines.
- Manual E2E against backgroundagent-dev: triggered soft-deny gates on
  force-push/env-write, approved with multiple scopes, denied with
  reason-injection, pre-approved scopes at submit, observed accurate
  timeout→denial transitions, verified dashboard widget + EMF alarms.

Deferred / follow-up
Filed as upstream issues after merge (per our issue-filing discipline):
- Extend slack-interactions.ts with approve_action:/deny_action:
  action_id prefixes + add an approval-request Block Kit renderer.
- De-duplicate the APPROVAL_GATE_CAP_MIN/MAX constants — currently
  triplicated across agent/src/policy.py,
  cdk/src/handlers/shared/types.ts, and cdk/src/constructs/blueprint.ts.
- LogQueryWidget → EMF migration — long-term observability cleanup.