Skip to content

feat(e2e): parallel test runs via org pool#1215

Open
ralphbean wants to merge 19 commits into
mainfrom
feat/e2e-parallel-orgs
Open

feat(e2e): parallel test runs via org pool#1215
ralphbean wants to merge 19 commits into
mainfrom
feat/e2e-parallel-orgs

Conversation

@ralphbean
Copy link
Copy Markdown
Contributor

Summary

  • Replace the single hardcoded halfsend test org with a pool of 6 orgs (halfsend-01 through halfsend-06)
  • Each e2e run scans the pool for an unlocked org and claims it, only waiting if all 6 are locked
  • Remove the global e2e-halfsend workflow concurrency group — contention is now handled by the org pool
  • Thread the acquired org through all test infrastructure (setupE2ETest, cleanup, browser helpers)

Dependencies

Depends on #1211 (TOTP support) — this branch includes those commits.

Test plan

  • go vet -tags e2e ./e2e/admin/... passes
  • Lock unit tests pass (TestAcquireLock, TestReleaseLock)
  • make lint passes
  • Full e2e run against one of the new orgs

Closes #1047

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Site preview

Preview: https://2be0b1cc-site.fullsend-ai.workers.dev

Commit: 74963c075c05bb570bc952082ff0166c0c1cb596

@fullsend-ai-review
Copy link
Copy Markdown

fullsend-ai-review Bot commented May 19, 2026

Review

Findings

High

  • [sensitive-data] hack/setup-new-e2e-org.sh:138 — Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value. CLAUDE.md explicitly prohibits committing GCP project names: "Never commit secrets or sensitive data (GCP project names, ...). Use environment variables with no defaults for sensitive values."
    Remediation: Remove the default and require MINT_PROJECT to be set via env var: MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"

Medium

  • [protected-path] .github/workflows/e2e.yml — This PR modifies a file under .github/, which is a protected path requiring human approval regardless of review outcome. Changes include: adding id-token: write permission, GCP WIF authentication step, removing the concurrency group, increasing timeout to 30m, and broadening trigger paths.

  • [docs-currency] CLAUDE.md — The PR makes E2E_MINT_URL a required env var for e2e tests (the test skips without it), and adds E2E_GCP_PROJECT_ID as an optional env var. Neither is documented in the "Running e2e tests" section of CLAUDE.md, which currently lists only E2E_GITHUB_PASSWORD, E2E_GITHUB_PASSWORD_FILE, E2E_GITHUB_SESSION_FILE, and E2E_GITHUB_TOTP_SECRET.
    Remediation: Add E2E_MINT_URL (required) and E2E_GCP_PROJECT_ID (optional) to the e2e env var list in CLAUDE.md.

Low

  • [misleading-comment] e2e/admin/testutil.go:100 — Comment says "Pass remaining context time, not the full timeout" but the code passes the full timeout value to acquireLock. The bounding actually comes from deadlineCtx, not the timeout parameter. The code works correctly but the comment is misleading.
    Remediation: Update the comment to: "The shared deadlineCtx bounds total wait; the timeout parameter controls acquireLock's internal polling but the context expires first."

  • [fragile-error-matching] e2e/admin/lock.go:220isRepoAlreadyExists matches on the string "422" in the error message. HTTP 422 can indicate many validation errors besides "already exists" (e.g., invalid repo name). While acceptable for e2e infra, this could mask unexpected failures.
    Remediation: Prefer checking for "already exists" only, or implement typed error checking from the GitHub client if available.

Previous run

Review

Findings

Medium

  • [protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and explains the rationale (removing concurrency group, adding GCP WIF auth, expanding path triggers, increasing timeout). Human approval is always required for protected-path changes, regardless of context.

  • [correctness] e2e/admin/admin_test.go — The refactor removes significant test coverage that previously existed: (1) idempotency testing (second install was verified as no-op, second uninstall accepted not-found errors), (2) app setup via manifest flow (entire browser.go deleted, --skip-app-setup used instead), (3) detailed verification of per-role secrets/variables and analyze-layer-status assertions. The test now exercises the CLI subprocess path which is valuable, but the coverage reduction is notable. Consider whether idempotency testing should be preserved in a separate test or tracked as follow-up work.

  • [correctness] e2e/admin/testutil.go:70 — The acquireOrg fallback loop comment says "Pass remaining context time, not the full timeout" but actually passes timeout (the full value) to acquireLock. The code still works correctly because deadlineCtx constrains the actual execution time, but the timeout parameter controls staleness checks inside acquireLock, which means each org gets the full staleness threshold rather than a proportional share. With 2 orgs this is fine, but at 6 orgs a stale lock on org 1 could consume most of the shared deadline before org 2 is tried.

Low

  • [correctness] e2e/admin/lock.goisRepoAlreadyExists checks strings.Contains(msg, "422") which is fragile. GitHub returns HTTP 422 for various validation errors, not only duplicate repos. A more precise check (e.g., checking for the specific GitHub error message "name already exists on this account") would be more robust. In practice this is unlikely to cause issues since the lock repo name is fixed.

  • [docs-currency] CLAUDE.md:29-40 — The "Running e2e tests" section does not document the new required E2E_MINT_URL env var (which now causes test skip if unset) or the optional E2E_GCP_PROJECT_ID. The description also still describes tests as using "Playwright browser automation" for the install/uninstall flow, but the test now uses CLI subprocess (runCLI) for install/uninstall while Playwright is only used for PAT management and session handling.

Info

  • [correctness] e2e/admin/testutil.go — Only 2 of 6 pool orgs (halfsend-01, halfsend-02) are uncommented. The remaining 4 are commented out as "not yet enrolled in mint." Effective parallelism is 2, not 6 as the PR title and ADR might suggest. The hack/setup-new-e2e-org.sh script provides a clear path to expand the pool.
Previous run (2)

Review

Findings

High

  • [platform-security] hack/setup-new-e2e-org.sh:182 — Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value for MINT_PROJECT. CLAUDE.md prohibits committing GCP project names and requires environment variables with no defaults for sensitive values.
    Remediation: Remove the default so the variable is required: MINT_PROJECT="${MINT_PROJECT:?Set MINT_PROJECT to the GCP project ID}".

Medium

  • [protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to issue e2e: increase throughput with parallel-safe test isolation #1047 and explains the rationale (removing the serializing concurrency group, adding GCP auth and new env vars for the org pool). Human approval is always required for protected-path changes, regardless of context.

  • [correctness] e2e/admin/lock.go:219isRepoAlreadyExists checks strings.Contains(msg, "422") which is overly broad. A 422 from GitHub can indicate many validation failures beyond "repo already exists" (e.g., invalid name, name too long, name reserved). This could cause non-lock errors to be silently swallowed, masking real failures during lock acquisition.
    Remediation: Check for a more specific substring like "name already exists" or parse the GitHub API error response structure to inspect the error code/message field.

Low

  • [correctness] e2e/admin/testutil.go:47defaultLockTimeout increased from 2 minutes to 20 minutes. While the fallback only triggers when all orgs are locked, a 20-minute hang with no progress indication could waste CI runner minutes if there is a systemic issue (e.g., all orgs permanently locked due to an infrastructure problem). Consider a shorter timeout or more frequent status logging during the wait.
Previous run (3)

Review

Findings

Medium

  • [protected-path] .github/workflows/e2e.yml — This PR modifies a file under .github/, which is a protected path requiring human approval. The change removes the e2e-halfsend concurrency group (replaced by org-pool contention handling) and adds E2E_GITHUB_TOTP_SECRET and E2E_MINT_URL secret references. The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and the rationale is well-documented, but human approval is always required for protected-path changes regardless of context.

Low

  • [correctness] e2e/admin/testutil.go:82 — In the acquireOrg fallback loop, the comment says "Pass remaining context time, not the full timeout, so that each org only uses its fair share of the shared deadline" but the code passes timeout (the full original value) to acquireLock. The shared deadlineCtx does bound total wait time correctly, but the first org in the fallback sequence can consume the entire deadline, leaving nothing for subsequent orgs. The comment is misleading about fair-share behavior. Consider either dividing the remaining time across orgs or updating the comment to reflect that the context deadline is the only bound.
    Remediation: Update the comment to accurately describe the bounding behavior, or pass time.Until(deadline) / time.Duration(len(orgPool)-i) as the per-org timeout.

  • [correctness] e2e/admin/lock.go:125isRepoAlreadyExists checks for the substring "422" in error messages, which could match non-"already exists" 422 errors (e.g., validation errors on other fields). In practice this is low-risk since the function is only called after CreateRepo, but the check is broader than intended.
    Remediation: Consider checking for a more specific substring like "name already exists" if the GitHub API error message is stable enough.

Previous run (4)

Review

Findings

Medium

  • [protected-path] .github/workflows/e2e.yml — This PR modifies a protected path (.github/). The PR links to e2e: increase throughput with parallel-safe test isolation #1047 and the description explains the rationale (removing global concurrency group, adding TOTP secret). Human approval is required for protected-path changes regardless of context.

Low

  • [style] e2e/admin/cleanup.go:499,507,517,539 — Missing space after org parameter in multiple deleteAppViaPlaywright calls: page, org,slug should be page, org, slug. Four occurrences in cleanupStaleResources.
    Remediation: Add a space after the comma: deleteAppViaPlaywright(page, org, slug, ...)

  • [docs-currency] CLAUDE.md:33-37 — The "Running e2e tests" section documents E2E_GITHUB_PASSWORD, E2E_GITHUB_PASSWORD_FILE, and E2E_GITHUB_SESSION_FILE env vars. This PR adds E2E_GITHUB_TOTP_SECRET as a new optional env var for 2FA accounts, but CLAUDE.md is not updated to document it.
    Remediation: Add E2E_GITHUB_TOTP_SECRET to the env var list in the "Running e2e tests" section of CLAUDE.md.

  • [correctness] e2e/admin/testutil.go — The acquireOrg fallback loop iterates all orgs sequentially with the full timeout per org, so worst-case wait is len(orgPool) × timeout (12 minutes with defaults). The comment "only waiting if all 6 are locked" is accurate but doesn't convey the total worst-case duration. Consider adding a log line showing total maximum wait or applying a global deadline across the fallback loop.
    Remediation: Wrap the fallback loop with a single deadline (time.Now().Add(timeout)) and pass remaining time to each acquireLock call, or document the multiplicative behavior.

@fullsend-ai-review fullsend-ai-review Bot added the requires-manual-review Review requires human judgment label May 19, 2026
Copy link
Copy Markdown
Contributor

@waynesun09 waynesun09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Squad Report — 10-agent review

Agents: 3× Claude Coder, 3× Claude Researcher, 2× Gemini, 2× Cursor
Findings: 14 after dedup, 4 false positives removed

Inline comments posted (2 HIGH, 5 MEDIUM)

Sev Finding File Consensus
HIGH Fallback loop waits N × timeout testutil.go:69 10/10 agents
HIGH Concurrency group is a no-op e2e.yml:15 8/10 agents
MEDIUM Second() % 30 vs Unix() % 30 boundary bug otp.go:21 1 agent (verified)
MEDIUM NetworkIdle wait can hang page.go:54 7/10 agents
MEDIUM TOTP retry generates same code page.go:70 2 agents
MEDIUM Fixed-order thundering herd testutil.go:52 2 agents
MEDIUM No acquireOrg unit tests lock_test.go:15 5 agents

Deferred hardening (LOW — not blocking)

  • Missing space after comma in 4× deleteAppViaPlaywright calls (cleanup.go:51,58,66,84) — run gofmt
  • E2E_HALFSEND_WIF_PROVIDER env var still references old org name — rename in follow-up
  • handleSudoIfPresent fails when password field visible but empty, without trying TOTP fallback
  • waitForPageToLeave uses busy-polling instead of Playwright native waits
  • orgPool is a mutable package-level var

Comment thread e2e/admin/testutil.go Outdated
Comment on lines +67 to +69

// All orgs are locked. Fall back to waiting on the first available.
logf("[org-pool] All %d orgs are locked, waiting with timeout %s", len(orgPool), timeout)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — Fallback loop waits N × timeout instead of timeout

Each acquireLock call blocks for the full timeout (default 2 min). With 6 orgs, worst case = 6 × 2 min = 12 min — nearly exhausting the 15-min job timeout. The comment says "waiting with timeout" (singular) but the actual behavior multiplies it by pool size.

Consider using a single shared deadline across the fallback:

deadline := time.Now().Add(timeout)
for time.Now().Before(deadline) {
    for _, org := range orgPool {
        remaining := time.Until(deadline)
        if remaining <= 0 {
            break
        }
        err := acquireLock(ctx, client, token, org, runID, min(remaining, lockPollInterval*2), logf)
        if err == nil {
            return org, nil
        }
    }
}

Flagged by 10/10 review agents (unanimous consensus).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — the fallback loop uses a shared deadlineCtx (not per-org timeout). See 4e149b3.

Comment thread .github/workflows/e2e.yml Outdated

concurrency:
group: e2e-halfsend
group: e2e-${{ github.run_id }}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH — Concurrency group is effectively a no-op

github.run_id is unique per workflow run, so no two runs ever share the same concurrency group. Combined with removing queue: max, there is no limit on simultaneous e2e runs. If >6 PRs trigger e2e at once, excess runs will contend for the 6-org pool and likely timeout.

Either remove the concurrency block entirely to make the intent explicit, or use a meaningful shared group:

concurrency:
  group: e2e-pool
  cancel-in-progress: false

Flagged by 8/10 review agents.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — the concurrency block was removed in an earlier commit.

Comment thread e2e/internal/otp/otp.go
}
// Avoid generating a code near a period boundary where it could expire
// during the ~300ms PressSequentially typing delay plus network round-trip.
if rem := time.Now().Second() % 30; rem >= 26 {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — Boundary check uses wall-clock seconds instead of Unix epoch

time.Now().Second() returns 0–59 within the current minute, but TOTP periods are calculated from Unix epoch (Unix() / 30). time.Now().Second() % 30 is not equivalent to time.Now().Unix() % 30.

For example, if Unix time is 1716148226, Unix() % 30 == 16 (safe, mid-period), but Second() could return 26 (triggering an unnecessary 5s sleep). Conversely, it could miss an actual boundary.

Suggested change
if rem := time.Now().Second() % 30; rem >= 26 {
if rem := time.Now().Unix() % 30; rem >= 26 {
time.Sleep(time.Duration(30-rem+1) * time.Second)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a false positive. Second() returns Unix() % 60, and since 30 divides 60 evenly, Second() % 30 == Unix() % 30 always. Do you see it differently?

Comment thread e2e/internal/otp/page.go
Comment on lines +53 to +54
}); err != nil {
return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — NetworkIdle wait can hang on GitHub pages

GitHub pages often have background WebSocket connections, analytics, or long-polling that prevent the network from becoming "idle". This can cause the wait to hang for the full default Playwright timeout (30s). Other navigation waits in this codebase use DOMContentLoaded.

Suggested change
}); err != nil {
return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err)
if err := page.WaitForLoadState(playwright.PageWaitForLoadStateOptions{
State: playwright.LoadStateDomcontentloaded,

Flagged by 7/10 review agents.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — switched to LoadStateDomcontentloaded in 8ab00a0.

Comment thread e2e/internal/otp/page.go
Comment on lines +69 to +70
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — Retry generates the same TOTP code

A 2s sleep stays within the same 30s TOTP period, so GenerateCode produces the identical code. If GitHub rejected the first code, the identical retry will also be rejected.

Wait for the TOTP period to actually roll over before retrying:

nextPeriod := (time.Now().Unix()/30 + 1) * 30
time.Sleep(time.Until(time.Unix(nextPeriod, 0)) + 1*time.Second)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already addressed — retry now waits for the next TOTP period boundary instead of a fixed 2s sleep. See 8ab00a0.

Comment thread e2e/admin/testutil.go
// acquireOrg scans the org pool for an unlocked org and acquires its lock.
// If all orgs are locked, it falls back to waiting on the first org in the
// pool (with the standard lock timeout). Returns the org name.
func acquireOrg(ctx context.Context, client forge.Client, token, runID string, timeout time.Duration, logf func(string, ...any)) (string, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — Fixed iteration order creates thundering herd

Both the first-pass scan and fallback loop iterate orgPool in fixed order, so all concurrent runners compete for halfsend-01 first. This creates uneven wear and reduces effective parallelism.

Shuffle the pool at the start of acquireOrg:

shuffled := slices.Clone(orgPool)
rand.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 4e2968e — org pool is now shuffled before scanning.

Comment thread e2e/admin/lock_test.go
"github.com/stretchr/testify/require"
)

const testLockOrg = "halfsend-test"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MEDIUM — No unit tests for acquireOrg

The core new function acquireOrg has no unit tests. Only acquireLock/releaseLock are tested. Key untested scenarios:

  1. First org free → should acquire it
  2. First N orgs locked, N+1 free → should skip locked and acquire free one
  3. All orgs locked + timeout → should return error within expected time
  4. tryCreateLock returns error vs "locked" → error propagation

These can be tested with forge.NewFakeClient() by pre-populating lock repos for specific orgs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added TestAcquireOrg_SkipsLockedOrg in 4e2968e. There were already 3 acquireOrg tests; that commit covers the missing "first org locked, second free" case.

@ralphbean ralphbean force-pushed the feat/e2e-parallel-orgs branch from 1b632db to fe94ddf Compare May 20, 2026 17:48
@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels May 20, 2026
ralphbean added 18 commits May 21, 2026 16:13
Replace the single hardcoded `halfsend` test org with a pool of 6 orgs
(halfsend-01 through halfsend-06). Each e2e run scans the pool for an
unlocked org and claims it, only falling back to waiting if all orgs are
locked. This allows up to 6 concurrent e2e runs.

- Add acquireOrg() that does a non-blocking scan before waiting
- Thread acquired org through setupE2ETest, cleanup, and browser helpers
- Change workflow concurrency group from global to per-run

Closes #1047

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Record the decision to use a pool of GitHub orgs with repo-based
distributed locking for parallel e2e test execution.

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
- Bound fallback wait with a shared context deadline so total wait
  is capped at timeout, not N * timeout across the pool
- Distinguish "already exists" from real errors in tryCreateLock so
  rate limits and auth failures are not silently swallowed
- Add unit tests for acquireOrg: happy path, all-locked timeout,
  and error propagation
- Fix missing space after comma in cleanup.go (4 occurrences)
- Fix misleading comment on acquireOrg fallback behavior

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
- Renumber ADR 0039 → 0040 to avoid collision with TOTP ADR
- Fix ADR heading to match file number (39 → 40)
- Remove no-op concurrency group from e2e workflow (org pool
  handles isolation; per-run-id group provided zero serialization)
- Wrap last error in acquireOrg failure message for better diagnostics
- Fix misleading "exponential" backoff comment (it's linear)

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Replace the fake e2eDispatcher with a sharedMintDispatcher that
points at the real public mint Cloud Function. Switch from
org-scoped app creation to installing the shared fullsend-ai-*
apps (fullsend, triage, coder, review).

Key changes:
- Add E2E_MINT_URL env var (required) for the shared mint URL
- Replace e2eDispatcher with sharedMintDispatcher (real OIDC dispatch)
- Use fullsend-ai app set with WithPublicApps(true)
- Add select_target page handler for multi-org GitHub App installs
- Add tryReclaimStaleLock in first pass of acquireOrg
- Simplify cleanup to never delete shared apps
- Make enrollment PR checks soft (OIDC mode may skip PR creation)
- Gate triage smoke and unenrollment phases on enrollment success
- Add E2E_MINT_URL to CI workflow

Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Replace the e2e test's programmatic layer-stack installation with
subprocess invocations of 'fullsend admin install' and
'fullsend admin uninstall', testing the CLI as users actually call it.

Key changes:
- Phase 1 runs 'fullsend admin install' with --skip-app-setup,
  --vendor-fullsend-binary, --enroll-all
- Phases 2-4 (enrollment, triage, unenrollment) are unconditional
- Phase 5 runs 'fullsend admin uninstall --yolo'
- Cleanup failures (PAT deletion, lock release) are hard test errors
- Delete browser.go, prompter.go, sharedMintDispatcher, and all
  programmatic install/uninstall/verify functions

Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
The triage smoke test alone can take 12+ minutes, plus install,
enrollment, unenrollment, and uninstall phases.

Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
The other halfsend orgs are not yet enrolled in the shared mint.
Uncomment them once they are registered.

Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
The --vendor-fullsend-binary flag runs 'go build ./cmd/fullsend/'
which needs to be relative to the repo root, not the test package
directory where go test sets cwd.

Signed-off-by: Ryan Bean <[email protected]>
Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
With --skip-app-setup --skip-mint-check, the CLI doesn't discover or
create apps, so config.yaml has no agent entries. The roles are still
populated in defaults.roles. Assert on those instead.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
The default agent roles now include retro and prioritize (6 total).
Update the test constant to match config.DefaultAgentRoles().

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Idempotent script that provisions a halfsend-NN org for e2e testing:
- Checks org exists (instructs user to create if not)
- Verifies botsend is an owner (invites if not)
- Creates test-repo if missing
- Checks each fullsend-ai app is installed, opens browser for
  missing ones and waits for user to click through

Usage: hack/setup-new-e2e-org NN

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
The e2e workflow only triggered on *.go, go.mod, go.sum, and e2e/**
changes. This missed go:embed'd files (scaffold templates, security
hooks, mint source, tokenizer data) and the Makefile — all of which
change compiled binary behavior exercised by the e2e tests.

Also enables halfsend-02 in the org pool for parallel e2e runs.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
With two orgs in the pool and e2e runs taking ~15 minutes, a 2-minute
timeout was too short to wait for a locked org to become available.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Add google-github-actions/auth step to authenticate to the dedicated
e2e GCP project (it-gcp-konflux-e2e-fullsend) using WIF. This provides
Application Default Credentials so the CLI can auto-provision WIF
infrastructure for inference during the install flow.

The E2E_GCP_WIF_PROVIDER secret needs to be set once the GCP project
is provisioned and the CI WIF pool/provider is created. Format:
projects/{NUMBER}/locations/global/workloadIdentityPools/{pool}/providers/{provider}

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Replace the hardcoded service account identifier with a
secrets.E2E_GCP_SERVICE_ACCOUNT reference, per CLAUDE.md policy
against committing GCP project names and service account identifiers.

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
- Shuffle org pool before scanning to avoid thundering herd when
  concurrent runners all compete for the first org.
- Add TestAcquireOrg_SkipsLockedOrg to cover the "first org locked,
  second free" scenario.
- Fix FakeClient.CreateRepo dedup to match by full name (org/repo),
  not bare repo name, so same-named repos in different orgs don't
  collide.

Assisted-by: Claude claude-opus-4-6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
@ralphbean ralphbean force-pushed the feat/e2e-parallel-orgs branch from f5d1267 to 4e2968e Compare May 21, 2026 20:31
@fullsend-ai-review fullsend-ai-review Bot added requires-manual-review Review requires human judgment and removed requires-manual-review Review requires human judgment labels May 21, 2026
When the triage dispatch smoke test fails, save workflow logs and
download all run artifacts (transcripts, screenshots) to the e2e
screenshot directory. Emit ::notice:: annotations so artifacts are
easy to find in the GitHub Actions UI.

Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Copy link
Copy Markdown

@fullsend-ai-review fullsend-ai-review Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the review comment for full details.

Comment thread hack/setup-new-e2e-org.sh
gh api "/orgs/${ORG}/repos" \
-f name="test-repo" \
-f description="E2E test repo" \
-F private=false \
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[high] sensitive-data

Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value. CLAUDE.md explicitly prohibits committing GCP project names: "Never commit secrets or sensitive data (GCP project names, ...). Use environment variables with no defaults for sensitive values."

Suggested fix: Remove the default and require MINT_PROJECT to be set via env var: MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"

Comment thread e2e/admin/testutil.go
// each org only uses its fair share of the shared deadline.
err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf)
if err == nil {
return org, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] misleading-comment

Comment says "Pass remaining context time, not the full timeout" but the code passes the full timeout value to acquireLock. The bounding comes from deadlineCtx, not the timeout parameter.

Suggested fix: Update the comment to clarify that the shared deadlineCtx bounds total wait, not the timeout parameter.

Comment thread e2e/admin/lock.go
if err == nil {
return false
}
msg := err.Error()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low] fragile-error-matching

isRepoAlreadyExists matches on the string "422" in the error message. HTTP 422 can indicate many validation errors besides "already exists", which could mask unexpected failures.

Suggested fix: Prefer checking for "already exists" only, or implement typed error checking from the GitHub client.

@fullsend-ai-review fullsend-ai-review Bot removed the requires-manual-review Review requires human judgment label May 21, 2026
@ralphbean
Copy link
Copy Markdown
Contributor Author

The e2e triage smoke test is failing because claude-opus-4-6 isn't enabled on the Vertex AI deployment in the e2e GCP project (it-gcp-konflux-e2e-fullsend). All three transcript files from the failed run show the same error:

The model claude-opus-4-6 is not available on your vertex deployment.

The artifact download feature (74963c0) worked — we got the transcripts, logs, and sandbox traces, which is how we diagnosed this.

Waiting on a ticket to get the right models enabled in the e2e GCP account.

Copy link
Copy Markdown
Contributor

@waynesun09 waynesun09 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Squad Report (Round 3) — 10-agent review

Agents: 2x claude-coder, 2x claude-researcher, 4x gemini-code-review, 2x cursor-code-review
Findings: 5 MEDIUM, 7 LOW (after dedup + verification against actual code)
False positives removed: 6

Prior review status

  • ✅ Both HIGHs fixed (round 2)
  • ✅ Thundering herd — fixed (rand.Shuffle)
  • ✅ acquireOrg tests — added (4 test functions)
  • ℹ️ 3 TOTP MEDIUMs (Second vs Unix, NetworkIdle, retry) — not in this PR's diff, files unchanged from main (merged via #1211)

New findings (this review)

  • 5 MEDIUM: fallback timeout starvation, isRepoAlreadyExists 422 false-match, stale lock threshold, hardcoded GCP project, mutable orgPool var
  • 7 LOW: timeout budget, CLI logging, cleanup t.Errorf, artifact download timeout, build caching, workflow concurrency

False positives removed

  1. "Corrupted GH Action references" (2 gemini agents) — hallucinated worktree paths as file content
  2. "Zip slip bypass via sibling dir" — trailing os.PathSeparator in HasPrefix prevents this
  3. "Lock reclaim TOCTOU race" — GitHub atomic CreateRepo + verify-read-back prevents this
  4. "Starvation spin on empty lock repos" — code handles NotFound correctly with poll interval

Comment thread e2e/admin/testutil.go
}
// Pass remaining context time, not the full timeout, so that
// each org only uses its fair share of the shared deadline.
err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Fallback loop passes full timeout, starving later orgs

The comment on line 95-96 says "pass remaining context time" but the code passes the original timeout. Since acquireLock uses timeout for its internal deadline and the stale-lock age threshold, the first org in the fallback sequence can consume the entire 20-minute budget before org 2 is ever tried. If org-2 frees up during the wait, the runner misses it.

// suggestion: pass remaining time from the deadline context
remaining := time.Until(deadline) // extract from deadlineCtx
err := acquireLock(deadlineCtx, client, token, org, runID, remaining, logf)

Or refactor to round-robin poll all orgs per interval instead of blocking on one at a time.

Flagged by 7/9 review agents (consensus)

Comment thread e2e/admin/lock.go
return false
}
msg := err.Error()
return strings.Contains(msg, "already exists") || strings.Contains(msg, "422")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] isRepoAlreadyExists matches on "422" substring — overly broad

HTTP 422 from GitHub can indicate many validation errors beyond "already exists" (invalid name, reserved name, auth/permission issues). A non-duplicate 422 error would be silently swallowed, making tryCreateLock return (false, nil) when the API call actually failed — causing a 20-minute wait instead of fast failure.

The "already exists" check alone is sufficient for both the real GitHub API (which returns "name already exists on this account") and the fake client (which returns "repository already exists: ...").

func isRepoAlreadyExists(err error) bool {
	if err == nil {
		return false
	}
	return strings.Contains(err.Error(), "already exists")
}

Flagged by 6/9 review agents (strongest consensus)

Comment thread e2e/admin/lock.go
return false
}
age := time.Since(createdAt)
if age <= timeout {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Stale lock reclaim threshold tied to full 20-minute timeout

tryReclaimStaleLock considers a lock stale when age > timeout, where timeout is now defaultLockTimeout = 20 * time.Minute. A crashed run's lock won't be reclaimed for 20 minutes even though tests complete in ~15 minutes.

The staleness threshold should be independent of the wait timeout — e.g., a separate staleLockTimeout constant of 5 minutes would reclaim crashed-run locks faster without affecting how long we're willing to wait for busy orgs.

const staleLockTimeout = 5 * time.Minute

// in tryReclaimStaleLock:
if age <= staleLockTimeout {

Flagged by 2/9 review agents

Comment thread hack/setup-new-e2e-org.sh
done

# --- 5. check mint enrollment ---
MINT_PROJECT="${MINT_PROJECT:-it-gcp-konflux-dev-fullsend}"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] Hardcoded GCP project name in public repo

MINT_PROJECT, MINT_FUNCTION defaults expose internal cloud infrastructure naming in commit history of a public repository.

MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"
MINT_REGION="${MINT_REGION:-us-central1}"
MINT_FUNCTION="${MINT_FUNCTION:?MINT_FUNCTION must be set}"

Flagged by 3/9 review agents

Comment thread e2e/admin/testutil.go

// orgPool is the set of GitHub orgs available for parallel e2e test runs.
// Each run acquires a lock on one org before proceeding.
var orgPool = []string{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[MEDIUM] orgPool is a mutable package-level var

Tests in lock_test.go mutate orgPool directly with save/restore via defer. This is not safe with t.Parallel() and is a maintenance hazard.

Consider passing the pool as a parameter to acquireOrg:

func acquireOrg(ctx context.Context, client forge.Client, token, runID string,
    pool []string, timeout time.Duration, logf func(string, ...any)) (string, error) {

Flagged by 2/9 review agents

Comment thread e2e/admin/admin_test.go
t.Log("Deleting PAT...")
if delErr := deletePAT(page, patNote, t.Logf); delErr != nil {
t.Logf("warning: could not delete PAT: %v", delErr)
t.Errorf("could not delete PAT: %v", delErr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] PAT deletion t.Errorf can fail otherwise-passing tests

Same pattern as lock release — PAT deletion is a cleanup operation that runs in t.Cleanup. Transient API failures here would mark the test as failed even when all assertions passed. The existing check on line 95 already handles "not found on page" gracefully.

Flagged by 4/9 review agents

Comment thread e2e/admin/admin_test.go
}
req.Header.Set("Authorization", "Bearer "+token)

resp, err := http.DefaultClient.Do(req)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Artifact download uses http.DefaultClient without explicit timeout

http.DefaultClient has no timeout. The request context provides some protection, but body reads aren't fully bounded by context cancellation. The 50MB LimitReader on line 508 is good, but a hung connection could block indefinitely.

httpClient := &http.Client{Timeout: 60 * time.Second}

Flagged by 2/9 review agents

Comment thread e2e/admin/testutil.go
func (d *e2eDispatcher) Provision(_ context.Context) (map[string]string, error) {
return map[string]string{"FULLSEND_MINT_URL": "https://e2e-test.example.com/mint"}, nil
// buildCLIBinary compiles the fullsend CLI binary once per test run.
func buildCLIBinary(t *testing.T) string {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] buildCLIBinary and runCLI have redundant work per call

buildCLIBinary uses t.TempDir() (new dir per test function) and runCLI calls go list -m on every invocation (line 233). Currently there are only 2 test functions and 3-4 runCLI calls, so the overhead is minor. Consider sync.Once or caching in a package-level var if more tests are added.

Flagged by 5/9 review agents

Comment thread .github/workflows/e2e.yml
e2e:
runs-on: ubuntu-latest
timeout-minutes: 15
timeout-minutes: 30
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] No workflow concurrency control after removing the global group

With only 2 active orgs, 3+ concurrent workflow runs will waste CI resources waiting for locks (up to 20 min each). Consider adding a concurrency group matching the pool size:

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

This cancels superseded commits on the same branch/PR while still allowing parallel runs across different PRs.

Flagged by 4/9 review agents

Comment thread e2e/admin/lock.go

if err := client.DeleteRepo(ctx, org, lockRepo); err != nil {
t.Logf("[e2e-lock] Failed to release lock: %v", err)
t.Errorf("[e2e-lock] failed to release lock: %v", err)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW] Lock release t.Errorf can fail otherwise-passing tests

Changed from t.Logf to t.Errorf here and on line 158. Transient GitHub API failures during cleanup (network blips, rate limits) would now fail an otherwise-passing test. cleanup.go uses t.Logf for its operations — consider keeping lock release consistent, or at minimum distinguish "operation failed" (error) from "resource already cleaned up" (warning).

Flagged by 4/9 review agents

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

e2e: increase throughput with parallel-safe test isolation

2 participants