feat(e2e): parallel test runs via org pool by ralphbean · Pull Request #1215 · fullsend-ai/fullsend

ralphbean · 2026-05-19T20:53:46Z

Summary

Replace the single hardcoded halfsend test org with a pool of 6 orgs (halfsend-01 through halfsend-06)
Each e2e run scans the pool for an unlocked org and claims it, only waiting if all 6 are locked
Remove the global e2e-halfsend workflow concurrency group — contention is now handled by the org pool
Thread the acquired org through all test infrastructure (setupE2ETest, cleanup, browser helpers)

Dependencies

Depends on #1211 (TOTP support) — this branch includes those commits.

Test plan

go vet -tags e2e ./e2e/admin/... passes
Lock unit tests pass (TestAcquireLock, TestReleaseLock)
make lint passes
Full e2e run against one of the new orgs

Closes #1047

🤖 Generated with Claude Code

github-actions · 2026-05-19T20:55:26Z

Site preview

Preview: https://2be0b1cc-site.fullsend-ai.workers.dev

Commit: 74963c075c05bb570bc952082ff0166c0c1cb596

fullsend-ai-review · 2026-05-19T20:58:13Z

Review

Findings

High

[sensitive-data] hack/setup-new-e2e-org.sh:138 — Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value. CLAUDE.md explicitly prohibits committing GCP project names: "Never commit secrets or sensitive data (GCP project names, ...). Use environment variables with no defaults for sensitive values."
Remediation: Remove the default and require MINT_PROJECT to be set via env var: MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"

Medium

[protected-path] .github/workflows/e2e.yml — This PR modifies a file under .github/, which is a protected path requiring human approval regardless of review outcome. Changes include: adding id-token: write permission, GCP WIF authentication step, removing the concurrency group, increasing timeout to 30m, and broadening trigger paths.
[docs-currency] CLAUDE.md — The PR makes E2E_MINT_URL a required env var for e2e tests (the test skips without it), and adds E2E_GCP_PROJECT_ID as an optional env var. Neither is documented in the "Running e2e tests" section of CLAUDE.md, which currently lists only E2E_GITHUB_PASSWORD, E2E_GITHUB_PASSWORD_FILE, E2E_GITHUB_SESSION_FILE, and E2E_GITHUB_TOTP_SECRET.
Remediation: Add E2E_MINT_URL (required) and E2E_GCP_PROJECT_ID (optional) to the e2e env var list in CLAUDE.md.

Low

[misleading-comment] e2e/admin/testutil.go:100 — Comment says "Pass remaining context time, not the full timeout" but the code passes the full timeout value to acquireLock. The bounding actually comes from deadlineCtx, not the timeout parameter. The code works correctly but the comment is misleading.
Remediation: Update the comment to: "The shared deadlineCtx bounds total wait; the timeout parameter controls acquireLock's internal polling but the context expires first."
[fragile-error-matching] e2e/admin/lock.go:220 — isRepoAlreadyExists matches on the string "422" in the error message. HTTP 422 can indicate many validation errors besides "already exists" (e.g., invalid repo name). While acceptable for e2e infra, this could mask unexpected failures.
Remediation: Prefer checking for "already exists" only, or implement typed error checking from the GitHub client if available.

Previous run

Review

Findings

Medium

[protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and explains the rationale (removing concurrency group, adding GCP WIF auth, expanding path triggers, increasing timeout). Human approval is always required for protected-path changes, regardless of context.
[correctness] e2e/admin/admin_test.go — The refactor removes significant test coverage that previously existed: (1) idempotency testing (second install was verified as no-op, second uninstall accepted not-found errors), (2) app setup via manifest flow (entire browser.go deleted, --skip-app-setup used instead), (3) detailed verification of per-role secrets/variables and analyze-layer-status assertions. The test now exercises the CLI subprocess path which is valuable, but the coverage reduction is notable. Consider whether idempotency testing should be preserved in a separate test or tracked as follow-up work.
[correctness] e2e/admin/testutil.go:70 — The acquireOrg fallback loop comment says "Pass remaining context time, not the full timeout" but actually passes timeout (the full value) to acquireLock. The code still works correctly because deadlineCtx constrains the actual execution time, but the timeout parameter controls staleness checks inside acquireLock, which means each org gets the full staleness threshold rather than a proportional share. With 2 orgs this is fine, but at 6 orgs a stale lock on org 1 could consume most of the shared deadline before org 2 is tried.

Low

[correctness] e2e/admin/lock.go — isRepoAlreadyExists checks strings.Contains(msg, "422") which is fragile. GitHub returns HTTP 422 for various validation errors, not only duplicate repos. A more precise check (e.g., checking for the specific GitHub error message "name already exists on this account") would be more robust. In practice this is unlikely to cause issues since the lock repo name is fixed.
[docs-currency] CLAUDE.md:29-40 — The "Running e2e tests" section does not document the new required E2E_MINT_URL env var (which now causes test skip if unset) or the optional E2E_GCP_PROJECT_ID. The description also still describes tests as using "Playwright browser automation" for the install/uninstall flow, but the test now uses CLI subprocess (runCLI) for install/uninstall while Playwright is only used for PAT management and session handling.

Info

[correctness] e2e/admin/testutil.go — Only 2 of 6 pool orgs (halfsend-01, halfsend-02) are uncommented. The remaining 4 are commented out as "not yet enrolled in mint." Effective parallelism is 2, not 6 as the PR title and ADR might suggest. The hack/setup-new-e2e-org.sh script provides a clear path to expand the pool.

Previous run (2)

Review

Findings

High

[platform-security] hack/setup-new-e2e-org.sh:182 — Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value for MINT_PROJECT. CLAUDE.md prohibits committing GCP project names and requires environment variables with no defaults for sensitive values.
Remediation: Remove the default so the variable is required: MINT_PROJECT="${MINT_PROJECT:?Set MINT_PROJECT to the GCP project ID}".

Medium

[protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to issue e2e: increase throughput with parallel-safe test isolation #1047 and explains the rationale (removing the serializing concurrency group, adding GCP auth and new env vars for the org pool). Human approval is always required for protected-path changes, regardless of context.
[correctness] e2e/admin/lock.go:219 — isRepoAlreadyExists checks strings.Contains(msg, "422") which is overly broad. A 422 from GitHub can indicate many validation failures beyond "repo already exists" (e.g., invalid name, name too long, name reserved). This could cause non-lock errors to be silently swallowed, masking real failures during lock acquisition.
Remediation: Check for a more specific substring like "name already exists" or parse the GitHub API error response structure to inspect the error code/message field.

Low

[correctness] e2e/admin/testutil.go:47 — defaultLockTimeout increased from 2 minutes to 20 minutes. While the fallback only triggers when all orgs are locked, a 20-minute hang with no progress indication could waste CI runner minutes if there is a systemic issue (e.g., all orgs permanently locked due to an infrastructure problem). Consider a shorter timeout or more frequent status logging during the wait.

Previous run (3)

Review

Findings

Medium

[protected-path] .github/workflows/e2e.yml — This PR modifies a file under .github/, which is a protected path requiring human approval. The change removes the e2e-halfsend concurrency group (replaced by org-pool contention handling) and adds E2E_GITHUB_TOTP_SECRET and E2E_MINT_URL secret references. The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and the rationale is well-documented, but human approval is always required for protected-path changes regardless of context.

Low

[correctness] e2e/admin/testutil.go:82 — In the acquireOrg fallback loop, the comment says "Pass remaining context time, not the full timeout, so that each org only uses its fair share of the shared deadline" but the code passes timeout (the full original value) to acquireLock. The shared deadlineCtx does bound total wait time correctly, but the first org in the fallback sequence can consume the entire deadline, leaving nothing for subsequent orgs. The comment is misleading about fair-share behavior. Consider either dividing the remaining time across orgs or updating the comment to reflect that the context deadline is the only bound.
Remediation: Update the comment to accurately describe the bounding behavior, or pass time.Until(deadline) / time.Duration(len(orgPool)-i) as the per-org timeout.
[correctness] e2e/admin/lock.go:125 — isRepoAlreadyExists checks for the substring "422" in error messages, which could match non-"already exists" 422 errors (e.g., validation errors on other fields). In practice this is low-risk since the function is only called after CreateRepo, but the check is broader than intended.
Remediation: Consider checking for a more specific substring like "name already exists" if the GitHub API error message is stable enough.

Previous run (4)

Review

Findings

Medium

[protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and the description explains the rationale (removing global concurrency group, adding TOTP secret). Human approval is required for protected-path changes regardless of context.

Low

[style] e2e/admin/cleanup.go:499,507,517,539 — Missing space after org parameter in multiple deleteAppViaPlaywright calls: page, org,slug should be page, org, slug. Four occurrences in cleanupStaleResources.
Remediation: Add a space after the comma: deleteAppViaPlaywright(page, org, slug, ...)
[docs-currency] CLAUDE.md:33-37 — The "Running e2e tests" section documents E2E_GITHUB_PASSWORD, E2E_GITHUB_PASSWORD_FILE, and E2E_GITHUB_SESSION_FILE env vars. This PR adds E2E_GITHUB_TOTP_SECRET as a new optional env var for 2FA accounts, but CLAUDE.md is not updated to document it.
Remediation: Add E2E_GITHUB_TOTP_SECRET to the env var list in the "Running e2e tests" section of CLAUDE.md.
[correctness] e2e/admin/testutil.go — The acquireOrg fallback loop iterates all orgs sequentially with the full timeout per org, so worst-case wait is len(orgPool) × timeout (12 minutes with defaults). The comment "only waiting if all 6 are locked" is accurate but doesn't convey the total worst-case duration. Consider adding a log line showing total maximum wait or applying a global deadline across the fallback loop.
Remediation: Wrap the fallback loop with a single deadline (time.Now().Add(timeout)) and pass remaining time to each acquireLock call, or document the multiplicative behavior.

waynesun09

Review Squad Report — 10-agent review

Agents: 3× Claude Coder, 3× Claude Researcher, 2× Gemini, 2× Cursor
Findings: 14 after dedup, 4 false positives removed

Inline comments posted (2 HIGH, 5 MEDIUM)

Sev	Finding	File	Consensus
HIGH	Fallback loop waits N × timeout	`testutil.go:69`	10/10 agents
HIGH	Concurrency group is a no-op	`e2e.yml:15`	8/10 agents
MEDIUM	`Second() % 30` vs `Unix() % 30` boundary bug	`otp.go:21`	1 agent (verified)
MEDIUM	`NetworkIdle` wait can hang	`page.go:54`	7/10 agents
MEDIUM	TOTP retry generates same code	`page.go:70`	2 agents
MEDIUM	Fixed-order thundering herd	`testutil.go:52`	2 agents
MEDIUM	No `acquireOrg` unit tests	`lock_test.go:15`	5 agents

Deferred hardening (LOW — not blocking)

Missing space after comma in 4× deleteAppViaPlaywright calls (cleanup.go:51,58,66,84) — run gofmt
E2E_HALFSEND_WIF_PROVIDER env var still references old org name — rename in follow-up
handleSudoIfPresent fails when password field visible but empty, without trying TOTP fallback
waitForPageToLeave uses busy-polling instead of Playwright native waits
orgPool is a mutable package-level var

waynesun09 · 2026-05-19T22:50:57Z

+
+	// All orgs are locked. Fall back to waiting on the first available.
+	logf("[org-pool] All %d orgs are locked, waiting with timeout %s", len(orgPool), timeout)


HIGH — Fallback loop waits N × timeout instead of timeout

Each acquireLock call blocks for the full timeout (default 2 min). With 6 orgs, worst case = 6 × 2 min = 12 min — nearly exhausting the 15-min job timeout. The comment says "waiting with timeout" (singular) but the actual behavior multiplies it by pool size.

Consider using a single shared deadline across the fallback:

deadline := time.Now().Add(timeout) for time.Now().Before(deadline) { for _, org := range orgPool { remaining := time.Until(deadline) if remaining <= 0 { break } err := acquireLock(ctx, client, token, org, runID, min(remaining, lockPollInterval*2), logf) if err == nil { return org, nil } } }

Flagged by 10/10 review agents (unanimous consensus).

Already addressed — the fallback loop uses a shared deadlineCtx (not per-org timeout). See 4e149b3.

waynesun09 · 2026-05-19T22:50:58Z


 concurrency:
-  group: e2e-halfsend
+  group: e2e-${{ github.run_id }}


HIGH — Concurrency group is effectively a no-op

github.run_id is unique per workflow run, so no two runs ever share the same concurrency group. Combined with removing queue: max, there is no limit on simultaneous e2e runs. If >6 PRs trigger e2e at once, excess runs will contend for the 6-org pool and likely timeout.

Either remove the concurrency block entirely to make the intent explicit, or use a meaningful shared group:

concurrency: group: e2e-pool cancel-in-progress: false

Flagged by 8/10 review agents.

Already addressed — the concurrency block was removed in an earlier commit.

waynesun09 · 2026-05-19T22:51:00Z

+	}
+	// Avoid generating a code near a period boundary where it could expire
+	// during the ~300ms PressSequentially typing delay plus network round-trip.
+	if rem := time.Now().Second() % 30; rem >= 26 {


MEDIUM — Boundary check uses wall-clock seconds instead of Unix epoch

time.Now().Second() returns 0–59 within the current minute, but TOTP periods are calculated from Unix epoch (Unix() / 30). time.Now().Second() % 30 is not equivalent to time.Now().Unix() % 30.

For example, if Unix time is 1716148226, Unix() % 30 == 16 (safe, mid-period), but Second() could return 26 (triggering an unnecessary 5s sleep). Conversely, it could miss an actual boundary.

Suggested change

if rem := time.Now().Second() % 30; rem >= 26 {

if rem := time.Now().Unix() % 30; rem >= 26 {

time.Sleep(time.Duration(30-rem+1) * time.Second)

I think this is a false positive. Second() returns Unix() % 60, and since 30 divides 60 evenly, Second() % 30 == Unix() % 30 always. Do you see it differently?

waynesun09 · 2026-05-19T22:51:02Z

+		}); err != nil {
+			return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err)


MEDIUM — NetworkIdle wait can hang on GitHub pages

GitHub pages often have background WebSocket connections, analytics, or long-polling that prevent the network from becoming "idle". This can cause the wait to hang for the full default Playwright timeout (30s). Other navigation waits in this codebase use DOMContentLoaded.

Suggested change

}); err != nil {

return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err)

if err := page.WaitForLoadState(playwright.PageWaitForLoadStateOptions{

State: playwright.LoadStateDomcontentloaded,

Flagged by 7/10 review agents.

Already addressed — switched to LoadStateDomcontentloaded in 8ab00a0.

waynesun09 · 2026-05-19T22:51:03Z

+		}
+	}


MEDIUM — Retry generates the same TOTP code

A 2s sleep stays within the same 30s TOTP period, so GenerateCode produces the identical code. If GitHub rejected the first code, the identical retry will also be rejected.

Wait for the TOTP period to actually roll over before retrying:

nextPeriod := (time.Now().Unix()/30 + 1) * 30 time.Sleep(time.Until(time.Unix(nextPeriod, 0)) + 1*time.Second)

Already addressed — retry now waits for the next TOTP period boundary instead of a fixed 2s sleep. See 8ab00a0.

waynesun09 · 2026-05-19T22:51:05Z

+// acquireOrg scans the org pool for an unlocked org and acquires its lock.
+// If all orgs are locked, it falls back to waiting on the first org in the
+// pool (with the standard lock timeout). Returns the org name.
+func acquireOrg(ctx context.Context, client forge.Client, token, runID string, timeout time.Duration, logf func(string, ...any)) (string, error) {


MEDIUM — Fixed iteration order creates thundering herd

Both the first-pass scan and fallback loop iterate orgPool in fixed order, so all concurrent runners compete for halfsend-01 first. This creates uneven wear and reduces effective parallelism.

Shuffle the pool at the start of acquireOrg:

shuffled := slices.Clone(orgPool) rand.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })

Fixed in 4e2968e — org pool is now shuffled before scanning.

waynesun09 · 2026-05-19T22:51:06Z

 	"github.com/stretchr/testify/require"
 )

+const testLockOrg = "halfsend-test"


MEDIUM — No unit tests for acquireOrg

The core new function acquireOrg has no unit tests. Only acquireLock/releaseLock are tested. Key untested scenarios:

First org free → should acquire it

First N orgs locked, N+1 free → should skip locked and acquire free one

All orgs locked + timeout → should return error within expected time

tryCreateLock returns error vs "locked" → error propagation

These can be tested with forge.NewFakeClient() by pre-populating lock repos for specific orgs.

Added TestAcquireOrg_SkipsLockedOrg in 4e2968e. There were already 3 acquireOrg tests; that commit covers the missing "first org locked, second free" case.

Replace the single hardcoded `halfsend` test org with a pool of 6 orgs (halfsend-01 through halfsend-06). Each e2e run scans the pool for an unlocked org and claims it, only falling back to waiting if all orgs are locked. This allows up to 6 concurrent e2e runs. - Add acquireOrg() that does a non-blocking scan before waiting - Thread acquired org through setupE2ETest, cleanup, and browser helpers - Change workflow concurrency group from global to per-run Closes #1047 Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Record the decision to use a pool of GitHub orgs with repo-based distributed locking for parallel e2e test execution. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

- Bound fallback wait with a shared context deadline so total wait is capped at timeout, not N * timeout across the pool - Distinguish "already exists" from real errors in tryCreateLock so rate limits and auth failures are not silently swallowed - Add unit tests for acquireOrg: happy path, all-locked timeout, and error propagation - Fix missing space after comma in cleanup.go (4 occurrences) - Fix misleading comment on acquireOrg fallback behavior Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

- Renumber ADR 0039 → 0040 to avoid collision with TOTP ADR - Fix ADR heading to match file number (39 → 40) - Remove no-op concurrency group from e2e workflow (org pool handles isolation; per-run-id group provided zero serialization) - Wrap last error in acquireOrg failure message for better diagnostics - Fix misleading "exponential" backoff comment (it's linear) Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Replace the fake e2eDispatcher with a sharedMintDispatcher that points at the real public mint Cloud Function. Switch from org-scoped app creation to installing the shared fullsend-ai-* apps (fullsend, triage, coder, review). Key changes: - Add E2E_MINT_URL env var (required) for the shared mint URL - Replace e2eDispatcher with sharedMintDispatcher (real OIDC dispatch) - Use fullsend-ai app set with WithPublicApps(true) - Add select_target page handler for multi-org GitHub App installs - Add tryReclaimStaleLock in first pass of acquireOrg - Simplify cleanup to never delete shared apps - Make enrollment PR checks soft (OIDC mode may skip PR creation) - Gate triage smoke and unenrollment phases on enrollment success - Add E2E_MINT_URL to CI workflow Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Replace the e2e test's programmatic layer-stack installation with subprocess invocations of 'fullsend admin install' and 'fullsend admin uninstall', testing the CLI as users actually call it. Key changes: - Phase 1 runs 'fullsend admin install' with --skip-app-setup, --vendor-fullsend-binary, --enroll-all - Phases 2-4 (enrollment, triage, unenrollment) are unconditional - Phase 5 runs 'fullsend admin uninstall --yolo' - Cleanup failures (PAT deletion, lock release) are hard test errors - Delete browser.go, prompter.go, sharedMintDispatcher, and all programmatic install/uninstall/verify functions Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

The triage smoke test alone can take 12+ minutes, plus install, enrollment, unenrollment, and uninstall phases. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

The other halfsend orgs are not yet enrolled in the shared mint. Uncomment them once they are registered. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

The --vendor-fullsend-binary flag runs 'go build ./cmd/fullsend/' which needs to be relative to the repo root, not the test package directory where go test sets cwd. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

With --skip-app-setup --skip-mint-check, the CLI doesn't discover or create apps, so config.yaml has no agent entries. The roles are still populated in defaults.roles. Assert on those instead. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

The default agent roles now include retro and prioritize (6 total). Update the test constant to match config.DefaultAgentRoles(). Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Idempotent script that provisions a halfsend-NN org for e2e testing: - Checks org exists (instructs user to create if not) - Verifies botsend is an owner (invites if not) - Creates test-repo if missing - Checks each fullsend-ai app is installed, opens browser for missing ones and waits for user to click through Usage: hack/setup-new-e2e-org NN Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

The e2e workflow only triggered on *.go, go.mod, go.sum, and e2e/** changes. This missed go:embed'd files (scaffold templates, security hooks, mint source, tokenizer data) and the Makefile — all of which change compiled binary behavior exercised by the e2e tests. Also enables halfsend-02 in the org pool for parallel e2e runs. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

With two orgs in the pool and e2e runs taking ~15 minutes, a 2-minute timeout was too short to wait for a locked org to become available. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Add google-github-actions/auth step to authenticate to the dedicated e2e GCP project (it-gcp-konflux-e2e-fullsend) using WIF. This provides Application Default Credentials so the CLI can auto-provision WIF infrastructure for inference during the install flow. The E2E_GCP_WIF_PROVIDER secret needs to be set once the GCP project is provisioned and the CI WIF pool/provider is created. Format: projects/{NUMBER}/locations/global/workloadIdentityPools/{pool}/providers/{provider} Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

Replace the hardcoded service account identifier with a secrets.E2E_GCP_SERVICE_ACCOUNT reference, per CLAUDE.md policy against committing GCP project names and service account identifiers. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

- Shuffle org pool before scanning to avoid thundering herd when concurrent runners all compete for the first org. - Add TestAcquireOrg_SkipsLockedOrg to cover the "first org locked, second free" scenario. - Fix FakeClient.CreateRepo dedup to match by full name (org/repo), not bare repo name, so same-named repos in different orgs don't collide. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

When the triage dispatch smoke test fails, save workflow logs and download all run artifacts (transcripts, screenshots) to the e2e screenshot directory. Emit ::notice:: annotations so artifacts are easy to find in the GitHub Actions UI. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

fullsend-ai-review

See the review comment for full details.

fullsend-ai-review · 2026-05-21T20:50:41Z

+  gh api "/orgs/${ORG}/repos" \
+    -f name="test-repo" \
+    -f description="E2E test repo" \
+    -F private=false \


[high] sensitive-data

Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value. CLAUDE.md explicitly prohibits committing GCP project names: "Never commit secrets or sensitive data (GCP project names, ...). Use environment variables with no defaults for sensitive values."

Suggested fix: Remove the default and require MINT_PROJECT to be set via env var: MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"

fullsend-ai-review · 2026-05-21T20:50:41Z

+		// each org only uses its fair share of the shared deadline.
+		err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf)
+		if err == nil {
+			return org, nil


[low] misleading-comment

Comment says "Pass remaining context time, not the full timeout" but the code passes the full timeout value to acquireLock. The bounding comes from deadlineCtx, not the timeout parameter.

Suggested fix: Update the comment to clarify that the shared deadlineCtx bounds total wait, not the timeout parameter.

fullsend-ai-review · 2026-05-21T20:50:41Z

+	if err == nil {
+		return false
+	}
+	msg := err.Error()


[low] fragile-error-matching

isRepoAlreadyExists matches on the string "422" in the error message. HTTP 422 can indicate many validation errors besides "already exists", which could mask unexpected failures.

Suggested fix: Prefer checking for "already exists" only, or implement typed error checking from the GitHub client.

ralphbean · 2026-05-21T21:02:29Z

The e2e triage smoke test is failing because claude-opus-4-6 isn't enabled on the Vertex AI deployment in the e2e GCP project (it-gcp-konflux-e2e-fullsend). All three transcript files from the failed run show the same error:

The model claude-opus-4-6 is not available on your vertex deployment.

The artifact download feature (74963c0) worked — we got the transcripts, logs, and sandbox traces, which is how we diagnosed this.

Waiting on a ticket to get the right models enabled in the e2e GCP account.

waynesun09

Review Squad Report (Round 3) — 10-agent review

Agents: 2x claude-coder, 2x claude-researcher, 4x gemini-code-review, 2x cursor-code-review
Findings: 5 MEDIUM, 7 LOW (after dedup + verification against actual code)
False positives removed: 6

Prior review status

✅ Both HIGHs fixed (round 2)
✅ Thundering herd — fixed (rand.Shuffle)
✅ acquireOrg tests — added (4 test functions)
ℹ️ 3 TOTP MEDIUMs (Second vs Unix, NetworkIdle, retry) — not in this PR's diff, files unchanged from main (merged via #1211)

New findings (this review)

5 MEDIUM: fallback timeout starvation, isRepoAlreadyExists 422 false-match, stale lock threshold, hardcoded GCP project, mutable orgPool var
7 LOW: timeout budget, CLI logging, cleanup t.Errorf, artifact download timeout, build caching, workflow concurrency

False positives removed

"Corrupted GH Action references" (2 gemini agents) — hallucinated worktree paths as file content
"Zip slip bypass via sibling dir" — trailing os.PathSeparator in HasPrefix prevents this
"Lock reclaim TOCTOU race" — GitHub atomic CreateRepo + verify-read-back prevents this
"Starvation spin on empty lock repos" — code handles NotFound correctly with poll interval

waynesun09 · 2026-05-21T21:21:57Z

+		}
+		// Pass remaining context time, not the full timeout, so that
+		// each org only uses its fair share of the shared deadline.
+		err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf)


[MEDIUM] Fallback loop passes full timeout, starving later orgs

The comment on line 95-96 says "pass remaining context time" but the code passes the original timeout. Since acquireLock uses timeout for its internal deadline and the stale-lock age threshold, the first org in the fallback sequence can consume the entire 20-minute budget before org 2 is ever tried. If org-2 frees up during the wait, the runner misses it.

// suggestion: pass remaining time from the deadline context remaining := time.Until(deadline) // extract from deadlineCtx err := acquireLock(deadlineCtx, client, token, org, runID, remaining, logf)

Or refactor to round-robin poll all orgs per interval instead of blocking on one at a time.

Flagged by 7/9 review agents (consensus)

waynesun09 · 2026-05-21T21:21:59Z

+		return false
+	}
+	msg := err.Error()
+	return strings.Contains(msg, "already exists") || strings.Contains(msg, "422")


[MEDIUM] isRepoAlreadyExists matches on "422" substring — overly broad

HTTP 422 from GitHub can indicate many validation errors beyond "already exists" (invalid name, reserved name, auth/permission issues). A non-duplicate 422 error would be silently swallowed, making tryCreateLock return (false, nil) when the API call actually failed — causing a 20-minute wait instead of fast failure.

The "already exists" check alone is sufficient for both the real GitHub API (which returns "name already exists on this account") and the fake client (which returns "repository already exists: ...").

func isRepoAlreadyExists(err error) bool { if err == nil { return false } return strings.Contains(err.Error(), "already exists") }

Flagged by 6/9 review agents (strongest consensus)

waynesun09 · 2026-05-21T21:22:00Z

+		return false
+	}
+	age := time.Since(createdAt)
+	if age <= timeout {


[MEDIUM] Stale lock reclaim threshold tied to full 20-minute timeout

tryReclaimStaleLock considers a lock stale when age > timeout, where timeout is now defaultLockTimeout = 20 * time.Minute. A crashed run's lock won't be reclaimed for 20 minutes even though tests complete in ~15 minutes.

The staleness threshold should be independent of the wait timeout — e.g., a separate staleLockTimeout constant of 5 minutes would reclaim crashed-run locks faster without affecting how long we're willing to wait for busy orgs.

const staleLockTimeout = 5 * time.Minute // in tryReclaimStaleLock: if age <= staleLockTimeout {

Flagged by 2/9 review agents

waynesun09 · 2026-05-21T21:22:02Z

+done
+
+# --- 5. check mint enrollment ---
+MINT_PROJECT="${MINT_PROJECT:-it-gcp-konflux-dev-fullsend}"


[MEDIUM] Hardcoded GCP project name in public repo

MINT_PROJECT, MINT_FUNCTION defaults expose internal cloud infrastructure naming in commit history of a public repository.

MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}" MINT_REGION="${MINT_REGION:-us-central1}" MINT_FUNCTION="${MINT_FUNCTION:?MINT_FUNCTION must be set}"

Flagged by 3/9 review agents

waynesun09 · 2026-05-21T21:22:03Z


+// orgPool is the set of GitHub orgs available for parallel e2e test runs.
+// Each run acquires a lock on one org before proceeding.
+var orgPool = []string{


[MEDIUM] orgPool is a mutable package-level var

Tests in lock_test.go mutate orgPool directly with save/restore via defer. This is not safe with t.Parallel() and is a maintenance hazard.

Consider passing the pool as a parameter to acquireOrg:

func acquireOrg(ctx context.Context, client forge.Client, token, runID string, pool []string, timeout time.Duration, logf func(string, ...any)) (string, error) {

Flagged by 2/9 review agents

waynesun09 · 2026-05-21T21:22:41Z

 		t.Log("Deleting PAT...")
 		if delErr := deletePAT(page, patNote, t.Logf); delErr != nil {
-			t.Logf("warning: could not delete PAT: %v", delErr)
+			t.Errorf("could not delete PAT: %v", delErr)


[LOW] PAT deletion t.Errorf can fail otherwise-passing tests

Same pattern as lock release — PAT deletion is a cleanup operation that runs in t.Cleanup. Transient API failures here would mark the test as failed even when all assertions passed. The existing check on line 95 already handles "not found on page" gracefully.

Flagged by 4/9 review agents

waynesun09 · 2026-05-21T21:22:42Z

+	}
+	req.Header.Set("Authorization", "Bearer "+token)
+
+	resp, err := http.DefaultClient.Do(req)


[LOW] Artifact download uses http.DefaultClient without explicit timeout

http.DefaultClient has no timeout. The request context provides some protection, but body reads aren't fully bounded by context cancellation. The 50MB LimitReader on line 508 is good, but a hung connection could block indefinitely.

httpClient := &http.Client{Timeout: 60 * time.Second}

Flagged by 2/9 review agents

waynesun09 · 2026-05-21T21:22:44Z

-func (d *e2eDispatcher) Provision(_ context.Context) (map[string]string, error) {
-	return map[string]string{"FULLSEND_MINT_URL": "https://e2e-test.example.com/mint"}, nil
+// buildCLIBinary compiles the fullsend CLI binary once per test run.
+func buildCLIBinary(t *testing.T) string {


[LOW] buildCLIBinary and runCLI have redundant work per call

buildCLIBinary uses t.TempDir() (new dir per test function) and runCLI calls go list -m on every invocation (line 233). Currently there are only 2 test functions and 3-4 runCLI calls, so the overhead is minor. Consider sync.Once or caching in a package-level var if more tests are added.

Flagged by 5/9 review agents

waynesun09 · 2026-05-21T21:22:45Z

  e2e:
    runs-on: ubuntu-latest
-    timeout-minutes: 15
+    timeout-minutes: 30


[LOW] No workflow concurrency control after removing the global group

With only 2 active orgs, 3+ concurrent workflow runs will waste CI resources waiting for locks (up to 20 min each). Consider adding a concurrency group matching the pool size:

concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true

This cancels superseded commits on the same branch/PR while still allowing parallel runs across different PRs.

Flagged by 4/9 review agents

waynesun09 · 2026-05-21T21:23:08Z


 	if err := client.DeleteRepo(ctx, org, lockRepo); err != nil {
-		t.Logf("[e2e-lock] Failed to release lock: %v", err)
+		t.Errorf("[e2e-lock] failed to release lock: %v", err)


[LOW] Lock release t.Errorf can fail otherwise-passing tests

Changed from t.Logf to t.Errorf here and on line 158. Transient GitHub API failures during cleanup (network blips, rate limits) would now fail an otherwise-passing test. cleanup.go uses t.Logf for its operations — consider keeping lock release consistent, or at minimum distinguish "operation failed" (error) from "resource already cleaned up" (warning).

Flagged by 4/9 review agents

github-actions Bot deployed to site-preview May 19, 2026 20:55 View deployment

fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label May 19, 2026

waynesun09 requested changes May 19, 2026

View reviewed changes

ralphbean force-pushed the feat/e2e-parallel-orgs branch from 1b632db to fe94ddf Compare May 20, 2026 17:48

fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels May 20, 2026

ralphbean added 18 commits May 21, 2026 16:13

docs: add ADR 0039 for org pool parallel e2e tests

2e20e80

Record the decision to use a pool of GitHub orgs with repo-based distributed locking for parallel e2e test execution. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

ci(e2e): add E2E_GCP_PROJECT_ID secret to workflow

806eca8

Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

fix(e2e): update defaultRoles to match current defaults

d6a44ee

The default agent roles now include retro and prioritize (6 total). Update the test constant to match config.DefaultAgentRoles(). Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

fix(e2e): increase default lock timeout to 20 minutes

ea62202

With two orgs in the pool and e2e runs taking ~15 minutes, a 2-minute timeout was too short to wait for a locked org to become available. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>

ralphbean force-pushed the feat/e2e-parallel-orgs branch from f5d1267 to 4e2968e Compare May 21, 2026 20:31

github-actions Bot deployed to site-preview May 21, 2026 20:33 View deployment

fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels May 21, 2026

github-actions Bot deployed to site-preview May 21, 2026 20:46 View deployment

fullsend-ai-review Bot suggested changes May 21, 2026

View reviewed changes

fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label May 21, 2026

waynesun09 reviewed May 21, 2026

View reviewed changes


		// All orgs are locked. Fall back to waiting on the first available.
		logf("[org-pool] All %d orgs are locked, waiting with timeout %s", len(orgPool), timeout)

	if rem := time.Now().Second() % 30; rem >= 26 {
	if rem := time.Now().Unix() % 30; rem >= 26 {
	time.Sleep(time.Duration(30-rem+1) * time.Second)

		}); err != nil {
		return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err)

Conversation

ralphbean commented May 19, 2026

Summary

Dependencies

Test plan

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Site preview

Uh oh!

fullsend-ai-review Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Findings

High

Medium

Low

Review

Findings

Medium

Low

Info

Review

Findings

High

Medium

Low

Review

Findings

Medium

Low

Review

Findings

Medium

Low

Uh oh!

waynesun09 left a comment

Choose a reason for hiding this comment

Review Squad Report — 10-agent review

Inline comments posted (2 HIGH, 5 MEDIUM)

Deferred hardening (LOW — not blocking)

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

fullsend-ai-review Bot May 21, 2026

github-actions Bot commented May 19, 2026 •

edited

Loading

fullsend-ai-review Bot commented May 19, 2026 •

edited

Loading