feat(ci): add deploy pipeline with OIDC, dynamic stack naming, and deploy-intent artifact#98
Conversation
4191636 to
4852589
Compare
Review summaryThanks for this — the architecture is sound and the security framing is genuinely strong (SHA-pinned actions, env-var-only untrusted input, deny-by-default permissions, OIDC, approval gate, synth-once/deploy-exact-artifact). It aligns cleanly with the Phase 3 goals in #73. A few items I'd like to see addressed before merge, none requiring a redesign. 🔴 BlockingB1 — valid_json=$(echo "$valid_json" | jq --arg t "$type" '. + [$t]') # missing -c
B2 — Deploy job runs a PR-synthesized cloud assembly with B3 — Stack-name
Fix: result=$(... | cut -c1-60 | sed 's/-$//')
if [[ ! "$result" =~ ^[a-z] ]]; then result="s-${result}"; fi🟡 Non-blocking
📄 DocsNicely done — ✅ Tests & CIAll checks green. Main gap is that the new bash logic (sanitize, intent resolution, label filtering) has no test coverage — B1 is exactly the kind of bug one end-to-end Governance notes (per ADR-003)
Bottom line: B1 is a one-character fix that unblocks the label path, B2 is the trust gap worth closing (at least Reviewed as Principal Architect; security + shell-logic analysis run by specialized review agents. |
Deploy CI belongs in PR #98 (feat/deploy-yml), not the roadmap docs PR. Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(ci): rename computeVariant to compute_type and apply as resource tag Aligns CI and CDK terminology with the existing ComputeType union in repo-config.ts. build.yml matrix key, env var, and cdk.context.json key are all renamed from computeVariant to compute_type. The CDK app now reads compute_type from context (default: agentcore) and applies it as a resource tag for per-type baseline diffs and cost attribution. Closes phase 2 items in #73. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ci): add deploy.yml workflow triggered by build completion Adds a deploy workflow that: - Fires on workflow_run after build.yml succeeds - Resolves deploy targets from PR labels (deploy=all, deploy:<type>=one) or defaults to all registered types on push to main - Skips entirely (no approval prompt) when no deploy labels are present - Downloads the exact cdk-<compute_type>-out artifact from the build run - Uses OIDC to assume the CDK bootstrap deploy role - Deploys via `cdk deploy --app cdk/cdk.out --all --require-approval never` - Protected by the `deploy` GitHub environment (manual approval required) - Concurrency: non-cancellable once started, max-parallel 3 Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ci): dynamic stack naming and workflow_dispatch deploy trigger build.yml: Replace hardcoded stackName with trigger-aware naming: - push to main: main-<compute_type> - pull_request: pr<number>-<compute_type> - merge_group: mg<pr_number>-<compute_type> - workflow_dispatch: <branch>-<compute_type> - fallback: <compute_type>-<sha7> All inputs sanitized (alphanumeric + hyphens, 60-char branch cap). deploy.yml: Add workflow_dispatch trigger with compute_type choice input (all, agentcore). Handle non-PR triggers (push to main, workflow_dispatch on build) by deploying all registered types. Label-based resolution only applies to PR triggers. Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * refactor(ci): move deploy input to build.yml, use deploy-intent artifact build.yml now owns the deploy decision via workflow_dispatch choice input: - "-" (default): no deploy - "agentcore": deploy agentcore after build Build always writes a deploy-intent.json artifact encoding the decision: - push to main: intent = compute_type (deploy) - workflow_dispatch with choice: intent = selected value - pull_request: intent = "labels" (defer to deploy.yml label check) - anything else: intent = "-" (no deploy) deploy.yml simplified to a pure consumer: - Removes workflow_dispatch trigger (single entry point is build.yml) - Downloads deploy-intent.json from triggering build run - Reads intent: "-" = skip, "labels" = check PR labels, else = deploy Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): add input validation and allowlist enforcement for compute types Addresses 5 security findings: 1. CRITICAL: deploy.yml wildcard case now validates intent against ALLOWED_COMPUTE_TYPES before passing to matrix. Invalid values cause the workflow to fail with an error annotation. 2. MEDIUM: PR label deploy:<type> values are filtered through validate_compute_type(). Invalid labels emit a warning and are ignored rather than passed to the deploy matrix. 3. MEDIUM: sanitize() now lowercases input and prefixes "s-" if the result starts with a digit (CloudFormation requires letter start). 4. LOW: deploy-intent.json is now written with jq (safe JSON encoding) instead of shell string interpolation. 5. LOW: PR_NUMBER is validated as numeric before use in stack names. The ALLOWED_COMPUTE_TYPES allowlist is defined as an env var in each step that performs validation. When new compute types are added to the matrix, this allowlist must be updated in both build.yml and deploy.yml. Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): add deploy-intent.json to .gitignore The file is generated during build and was being picked up by the mutation detection step, causing the build to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(security): skip deploy pipeline for fork PRs + document CI/CD flow Adds `head_repository.full_name == github.repository` guard to deploy.yml's resolve-targets job. Prevents fork PRs from triggering deploy approval prompts — a fork could modify build.yml to produce a deploy-intent artifact that would otherwise reach the approval gate. Also documents the full deploy.yml pipeline in DEPLOYMENT_GUIDE.md (build → deploy stages, security controls, fork exclusion rationale, administrator setup). This makes the deploy flow explicit for contributors, reviewers, agents, and administrators. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore(docs): update roadmap to reflect recent features * chore(ci): remove deploy pipeline from roadmap branch Deploy CI belongs in PR #98 (feat/deploy-yml), not the roadmap docs PR. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: scottschreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com> Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Cursor <cursoragent@cursor.com>
Will fix
But without the
Yes |
|
Re: Governance notes
|
Adds a deploy workflow that: - Fires on workflow_run after build.yml succeeds - Resolves deploy targets from PR labels (deploy=all, deploy:<type>=one) or defaults to all registered types on push to main - Skips entirely (no approval prompt) when no deploy labels are present - Downloads the exact cdk-<compute_type>-out artifact from the build run - Uses OIDC to assume the CDK bootstrap deploy role - Deploys via `cdk deploy --app cdk/cdk.out --all --require-approval never` - Protected by the `deploy` GitHub environment (manual approval required) - Concurrency: non-cancellable once started, max-parallel 3 Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build.yml: Replace hardcoded stackName with trigger-aware naming: - push to main: main-<compute_type> - pull_request: pr<number>-<compute_type> - merge_group: mg<pr_number>-<compute_type> - workflow_dispatch: <branch>-<compute_type> - fallback: <compute_type>-<sha7> All inputs sanitized (alphanumeric + hyphens, 60-char branch cap). deploy.yml: Add workflow_dispatch trigger with compute_type choice input (all, agentcore). Handle non-PR triggers (push to main, workflow_dispatch on build) by deploying all registered types. Label-based resolution only applies to PR triggers. Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
build.yml now owns the deploy decision via workflow_dispatch choice input: - "-" (default): no deploy - "agentcore": deploy agentcore after build Build always writes a deploy-intent.json artifact encoding the decision: - push to main: intent = compute_type (deploy) - workflow_dispatch with choice: intent = selected value - pull_request: intent = "labels" (defer to deploy.yml label check) - anything else: intent = "-" (no deploy) deploy.yml simplified to a pure consumer: - Removes workflow_dispatch trigger (single entry point is build.yml) - Downloads deploy-intent.json from triggering build run - Reads intent: "-" = skip, "labels" = check PR labels, else = deploy Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ypes Addresses 5 security findings: 1. CRITICAL: deploy.yml wildcard case now validates intent against ALLOWED_COMPUTE_TYPES before passing to matrix. Invalid values cause the workflow to fail with an error annotation. 2. MEDIUM: PR label deploy:<type> values are filtered through validate_compute_type(). Invalid labels emit a warning and are ignored rather than passed to the deploy matrix. 3. MEDIUM: sanitize() now lowercases input and prefixes "s-" if the result starts with a digit (CloudFormation requires letter start). 4. LOW: deploy-intent.json is now written with jq (safe JSON encoding) instead of shell string interpolation. 5. LOW: PR_NUMBER is validated as numeric before use in stack names. The ALLOWED_COMPUTE_TYPES allowlist is defined as an env var in each step that performs validation. When new compute types are added to the matrix, this allowlist must be updated in both build.yml and deploy.yml. Part of #73 Phase 3. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The file is generated during build and was being picked up by the mutation detection step, causing the build to fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds `head_repository.full_name == github.repository` guard to deploy.yml's resolve-targets job. Prevents fork PRs from triggering deploy approval prompts — a fork could modify build.yml to produce a deploy-intent artifact that would otherwise reach the approval gate. Also documents the full deploy.yml pipeline in DEPLOYMENT_GUIDE.md (build → deploy stages, security controls, fork exclusion rationale, administrator setup). This makes the deploy flow explicit for contributors, reviewers, agents, and administrators. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…-line matrix corruption Without -c, jq pretty-prints the JSON array across multiple lines. GitHub Actions parses GITHUB_OUTPUT as one key=value per line, so `matrix` captured only `[` and fromJson() in the deploy job failed — the deploy:<type> label path never deployed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… empty/leading-hyphen results Two bugs in stack-name sanitize(): 1. cut -c1-60 ran after trailing-hyphen trim, so a hyphen at position 60 survived → CloudFormation rejects trailing hyphens. Fix: reorder to cut first, then trim. 2. All-special-char refs sanitized to empty → STACK_NAME="-agentcore" (leading hyphen). The guard only checked ^[0-9], not ^[a-z]. Fix: guard checks for empty or any non-letter start. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
jq's index("deploy:*") matches only a label literally named "deploy:*",
not a wildcard. The bare "deploy" label already triggers all-types
deployment. This dead branch was misleading and could become a surprise
full-deploy vector if such a label were ever created.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…t risk ALLOWED_COMPUTE_TYPES was duplicated across both workflows plus the matrix; adding a new compute type required updating 5 places. Now deploy.yml derives ALL_TYPES dynamically from ALLOWED_COMPUTE_TYPES, and cross-reference comments link the remaining sync points. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…flows build.yml uses exit 1 (no recovery — invalid dispatch input aborts step). deploy.yml uses return 1 (callers handle gracefully — skip type or set empty matrix). One-line comments prevent future confusion. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…empty
workflow_run.pull_requests[0].number is empty for forks and often for
indirect workflow_run triggers. Previously the labels deploy path would
silently no-op. Now we fall back to gh api commits/{sha}/pulls to
resolve the PR number, with a warning if neither source yields a result.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
bd71c16 to
52a6a99
Compare
|
Review follow-up — status of all items Branch rebased onto latest Blocking
Non-blocking
Docs / Governance
All code-actionable items from the review are addressed. Only B2 remains as a design decision. |
51de804 to
27386f9
Compare
…ng control) Adds a `diff` job that runs immediately after resolve-targets (no environment gate). It downloads the same CDK artifact, assumes AWS credentials, and runs `cdk diff` — surfacing all CloudFormation changes in the GitHub step summary. The reviewer can inspect the diff before clicking "Approve" on the deploy environment gate. The deploy job now `needs: [resolve-targets, diff]` ensuring the diff completes before approval is requested. `--require-approval never` is retained on the deploy step because CDK hard-fails without TTY in CI; the GitHub environment approval gate is the human checkpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
27386f9 to
09a6eaf
Compare
Summary
Adds the deployment pipeline (Phase 3 of #73) on top of PR #97 (
compute_typerename).Architecture
Commits
compute_typerenamecomputeVariant→compute_typein build.yml, context, CDK tag (from PR #97)Stack Naming
mainmain-<type>main-agentcore<branch>-<type>feat-deploy-yml-agentcorepr<N>-<type>pr42-agentcoremg<N>-<type>mg42-agentcoreworkflow_dispatch<branch>-<type>main-agentcore<type>-<sha7>agentcore-abc1234Sanitization: lowercase,
/_. → -, strip non-[a-z0-9-], collapse hyphens, 60-char cap, prefixs-if starts with digit.Deploy Intent Flow
mainagentcore-workflow_dispatchwith--workflow_dispatchwithagentcoreagentcorepull_requestlabels-Label-Driven Deploy (PR only)
deploydeploy:agentcoreagentcoreonlydeploy:*deploy*labelLabel values are validated against
ALLOWED_COMPUTE_TYPES. Invalid labels emit a warning and are silently dropped.Security
choicetype — enforced at UI and APIALLOWED_COMPUTE_TYPESenv var;validate_compute_type()filter_valid_types()rejects invaliddeploy:<type>labels^[0-9]+$check before usejq -n --arg(not shell interpolation)env:, never inrun:directlypermissions: {}at top; least-privilege per jobdeployenvironment with manual approval, prevent self-reviewcancel-in-progress: falseonce deployment startsTest plan
github-tags.test.ts— 5/5 pass (compute_type tag defaults + explicit)workflow_dispatchbuild withdeploy: agentcoredeploylabeldeploy:foolabel emits warning and is ignoredNot in scope (future work)
cdk diffoutput to step summarycleanup.yml)🤖 Generated with Claude Code