feat(e2e): parallel test runs via org pool#1215
Conversation
Site previewPreview: https://2be0b1cc-site.fullsend-ai.workers.dev Commit: |
ReviewFindingsHigh
Medium
Low
Previous runReviewFindingsMedium
Low
Info
Previous run (2)ReviewFindingsHigh
Medium
Low
Previous run (3)ReviewFindingsMedium
Low
Previous run (4)ReviewFindingsMedium
Low
|
waynesun09
left a comment
There was a problem hiding this comment.
Review Squad Report — 10-agent review
Agents: 3× Claude Coder, 3× Claude Researcher, 2× Gemini, 2× Cursor
Findings: 14 after dedup, 4 false positives removed
Inline comments posted (2 HIGH, 5 MEDIUM)
| Sev | Finding | File | Consensus |
|---|---|---|---|
| HIGH | Fallback loop waits N × timeout | testutil.go:69 |
10/10 agents |
| HIGH | Concurrency group is a no-op | e2e.yml:15 |
8/10 agents |
| MEDIUM | Second() % 30 vs Unix() % 30 boundary bug |
otp.go:21 |
1 agent (verified) |
| MEDIUM | NetworkIdle wait can hang |
page.go:54 |
7/10 agents |
| MEDIUM | TOTP retry generates same code | page.go:70 |
2 agents |
| MEDIUM | Fixed-order thundering herd | testutil.go:52 |
2 agents |
| MEDIUM | No acquireOrg unit tests |
lock_test.go:15 |
5 agents |
Deferred hardening (LOW — not blocking)
- Missing space after comma in 4×
deleteAppViaPlaywrightcalls (cleanup.go:51,58,66,84) — rungofmt E2E_HALFSEND_WIF_PROVIDERenv var still references old org name — rename in follow-uphandleSudoIfPresentfails when password field visible but empty, without trying TOTP fallbackwaitForPageToLeaveuses busy-polling instead of Playwright native waitsorgPoolis a mutable package-levelvar
|
|
||
| // All orgs are locked. Fall back to waiting on the first available. | ||
| logf("[org-pool] All %d orgs are locked, waiting with timeout %s", len(orgPool), timeout) |
There was a problem hiding this comment.
HIGH — Fallback loop waits N × timeout instead of timeout
Each acquireLock call blocks for the full timeout (default 2 min). With 6 orgs, worst case = 6 × 2 min = 12 min — nearly exhausting the 15-min job timeout. The comment says "waiting with timeout" (singular) but the actual behavior multiplies it by pool size.
Consider using a single shared deadline across the fallback:
deadline := time.Now().Add(timeout)
for time.Now().Before(deadline) {
for _, org := range orgPool {
remaining := time.Until(deadline)
if remaining <= 0 {
break
}
err := acquireLock(ctx, client, token, org, runID, min(remaining, lockPollInterval*2), logf)
if err == nil {
return org, nil
}
}
}Flagged by 10/10 review agents (unanimous consensus).
There was a problem hiding this comment.
Already addressed — the fallback loop uses a shared deadlineCtx (not per-org timeout). See 4e149b3.
|
|
||
| concurrency: | ||
| group: e2e-halfsend | ||
| group: e2e-${{ github.run_id }} |
There was a problem hiding this comment.
HIGH — Concurrency group is effectively a no-op
github.run_id is unique per workflow run, so no two runs ever share the same concurrency group. Combined with removing queue: max, there is no limit on simultaneous e2e runs. If >6 PRs trigger e2e at once, excess runs will contend for the 6-org pool and likely timeout.
Either remove the concurrency block entirely to make the intent explicit, or use a meaningful shared group:
concurrency:
group: e2e-pool
cancel-in-progress: falseFlagged by 8/10 review agents.
There was a problem hiding this comment.
Already addressed — the concurrency block was removed in an earlier commit.
| } | ||
| // Avoid generating a code near a period boundary where it could expire | ||
| // during the ~300ms PressSequentially typing delay plus network round-trip. | ||
| if rem := time.Now().Second() % 30; rem >= 26 { |
There was a problem hiding this comment.
MEDIUM — Boundary check uses wall-clock seconds instead of Unix epoch
time.Now().Second() returns 0–59 within the current minute, but TOTP periods are calculated from Unix epoch (Unix() / 30). time.Now().Second() % 30 is not equivalent to time.Now().Unix() % 30.
For example, if Unix time is 1716148226, Unix() % 30 == 16 (safe, mid-period), but Second() could return 26 (triggering an unnecessary 5s sleep). Conversely, it could miss an actual boundary.
| if rem := time.Now().Second() % 30; rem >= 26 { | |
| if rem := time.Now().Unix() % 30; rem >= 26 { | |
| time.Sleep(time.Duration(30-rem+1) * time.Second) |
There was a problem hiding this comment.
I think this is a false positive. Second() returns Unix() % 60, and since 30 divides 60 evenly, Second() % 30 == Unix() % 30 always. Do you see it differently?
| }); err != nil { | ||
| return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err) |
There was a problem hiding this comment.
MEDIUM — NetworkIdle wait can hang on GitHub pages
GitHub pages often have background WebSocket connections, analytics, or long-polling that prevent the network from becoming "idle". This can cause the wait to hang for the full default Playwright timeout (30s). Other navigation waits in this codebase use DOMContentLoaded.
| }); err != nil { | |
| return false, fmt.Errorf("waiting for post-TOTP navigation: %w", err) | |
| if err := page.WaitForLoadState(playwright.PageWaitForLoadStateOptions{ | |
| State: playwright.LoadStateDomcontentloaded, |
Flagged by 7/10 review agents.
There was a problem hiding this comment.
Already addressed — switched to LoadStateDomcontentloaded in 8ab00a0.
| } | ||
| } |
There was a problem hiding this comment.
MEDIUM — Retry generates the same TOTP code
A 2s sleep stays within the same 30s TOTP period, so GenerateCode produces the identical code. If GitHub rejected the first code, the identical retry will also be rejected.
Wait for the TOTP period to actually roll over before retrying:
nextPeriod := (time.Now().Unix()/30 + 1) * 30
time.Sleep(time.Until(time.Unix(nextPeriod, 0)) + 1*time.Second)There was a problem hiding this comment.
Already addressed — retry now waits for the next TOTP period boundary instead of a fixed 2s sleep. See 8ab00a0.
| // acquireOrg scans the org pool for an unlocked org and acquires its lock. | ||
| // If all orgs are locked, it falls back to waiting on the first org in the | ||
| // pool (with the standard lock timeout). Returns the org name. | ||
| func acquireOrg(ctx context.Context, client forge.Client, token, runID string, timeout time.Duration, logf func(string, ...any)) (string, error) { |
There was a problem hiding this comment.
MEDIUM — Fixed iteration order creates thundering herd
Both the first-pass scan and fallback loop iterate orgPool in fixed order, so all concurrent runners compete for halfsend-01 first. This creates uneven wear and reduces effective parallelism.
Shuffle the pool at the start of acquireOrg:
shuffled := slices.Clone(orgPool)
rand.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })There was a problem hiding this comment.
Fixed in 4e2968e — org pool is now shuffled before scanning.
| "github.com/stretchr/testify/require" | ||
| ) | ||
|
|
||
| const testLockOrg = "halfsend-test" |
There was a problem hiding this comment.
MEDIUM — No unit tests for acquireOrg
The core new function acquireOrg has no unit tests. Only acquireLock/releaseLock are tested. Key untested scenarios:
- First org free → should acquire it
- First N orgs locked, N+1 free → should skip locked and acquire free one
- All orgs locked + timeout → should return error within expected time
tryCreateLockreturns error vs "locked" → error propagation
These can be tested with forge.NewFakeClient() by pre-populating lock repos for specific orgs.
There was a problem hiding this comment.
Added TestAcquireOrg_SkipsLockedOrg in 4e2968e. There were already 3 acquireOrg tests; that commit covers the missing "first org locked, second free" case.
1b632db to
fe94ddf
Compare
Replace the single hardcoded `halfsend` test org with a pool of 6 orgs (halfsend-01 through halfsend-06). Each e2e run scans the pool for an unlocked org and claims it, only falling back to waiting if all orgs are locked. This allows up to 6 concurrent e2e runs. - Add acquireOrg() that does a non-blocking scan before waiting - Thread acquired org through setupE2ETest, cleanup, and browser helpers - Change workflow concurrency group from global to per-run Closes #1047 Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Record the decision to use a pool of GitHub orgs with repo-based distributed locking for parallel e2e test execution. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
- Bound fallback wait with a shared context deadline so total wait is capped at timeout, not N * timeout across the pool - Distinguish "already exists" from real errors in tryCreateLock so rate limits and auth failures are not silently swallowed - Add unit tests for acquireOrg: happy path, all-locked timeout, and error propagation - Fix missing space after comma in cleanup.go (4 occurrences) - Fix misleading comment on acquireOrg fallback behavior Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
- Renumber ADR 0039 → 0040 to avoid collision with TOTP ADR - Fix ADR heading to match file number (39 → 40) - Remove no-op concurrency group from e2e workflow (org pool handles isolation; per-run-id group provided zero serialization) - Wrap last error in acquireOrg failure message for better diagnostics - Fix misleading "exponential" backoff comment (it's linear) Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Replace the fake e2eDispatcher with a sharedMintDispatcher that points at the real public mint Cloud Function. Switch from org-scoped app creation to installing the shared fullsend-ai-* apps (fullsend, triage, coder, review). Key changes: - Add E2E_MINT_URL env var (required) for the shared mint URL - Replace e2eDispatcher with sharedMintDispatcher (real OIDC dispatch) - Use fullsend-ai app set with WithPublicApps(true) - Add select_target page handler for multi-org GitHub App installs - Add tryReclaimStaleLock in first pass of acquireOrg - Simplify cleanup to never delete shared apps - Make enrollment PR checks soft (OIDC mode may skip PR creation) - Gate triage smoke and unenrollment phases on enrollment success - Add E2E_MINT_URL to CI workflow Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Replace the e2e test's programmatic layer-stack installation with subprocess invocations of 'fullsend admin install' and 'fullsend admin uninstall', testing the CLI as users actually call it. Key changes: - Phase 1 runs 'fullsend admin install' with --skip-app-setup, --vendor-fullsend-binary, --enroll-all - Phases 2-4 (enrollment, triage, unenrollment) are unconditional - Phase 5 runs 'fullsend admin uninstall --yolo' - Cleanup failures (PAT deletion, lock release) are hard test errors - Delete browser.go, prompter.go, sharedMintDispatcher, and all programmatic install/uninstall/verify functions Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
The triage smoke test alone can take 12+ minutes, plus install, enrollment, unenrollment, and uninstall phases. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
The other halfsend orgs are not yet enrolled in the shared mint. Uncomment them once they are registered. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
The --vendor-fullsend-binary flag runs 'go build ./cmd/fullsend/' which needs to be relative to the repo root, not the test package directory where go test sets cwd. Signed-off-by: Ryan Bean <[email protected]> Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
With --skip-app-setup --skip-mint-check, the CLI doesn't discover or create apps, so config.yaml has no agent entries. The roles are still populated in defaults.roles. Assert on those instead. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
The default agent roles now include retro and prioritize (6 total). Update the test constant to match config.DefaultAgentRoles(). Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Idempotent script that provisions a halfsend-NN org for e2e testing: - Checks org exists (instructs user to create if not) - Verifies botsend is an owner (invites if not) - Creates test-repo if missing - Checks each fullsend-ai app is installed, opens browser for missing ones and waits for user to click through Usage: hack/setup-new-e2e-org NN Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
The e2e workflow only triggered on *.go, go.mod, go.sum, and e2e/** changes. This missed go:embed'd files (scaffold templates, security hooks, mint source, tokenizer data) and the Makefile — all of which change compiled binary behavior exercised by the e2e tests. Also enables halfsend-02 in the org pool for parallel e2e runs. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
With two orgs in the pool and e2e runs taking ~15 minutes, a 2-minute timeout was too short to wait for a locked org to become available. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
Add google-github-actions/auth step to authenticate to the dedicated
e2e GCP project (it-gcp-konflux-e2e-fullsend) using WIF. This provides
Application Default Credentials so the CLI can auto-provision WIF
infrastructure for inference during the install flow.
The E2E_GCP_WIF_PROVIDER secret needs to be set once the GCP project
is provisioned and the CI WIF pool/provider is created. Format:
projects/{NUMBER}/locations/global/workloadIdentityPools/{pool}/providers/{provider}
Assisted-by: Claude Opus 4.6 <[email protected]>
Signed-off-by: Ralph Bean <[email protected]>
Replace the hardcoded service account identifier with a secrets.E2E_GCP_SERVICE_ACCOUNT reference, per CLAUDE.md policy against committing GCP project names and service account identifiers. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
- Shuffle org pool before scanning to avoid thundering herd when concurrent runners all compete for the first org. - Add TestAcquireOrg_SkipsLockedOrg to cover the "first org locked, second free" scenario. - Fix FakeClient.CreateRepo dedup to match by full name (org/repo), not bare repo name, so same-named repos in different orgs don't collide. Assisted-by: Claude claude-opus-4-6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
f5d1267 to
4e2968e
Compare
When the triage dispatch smoke test fails, save workflow logs and download all run artifacts (transcripts, screenshots) to the e2e screenshot directory. Emit ::notice:: annotations so artifacts are easy to find in the GitHub Actions UI. Assisted-by: Claude Opus 4.6 <[email protected]> Signed-off-by: Ralph Bean <[email protected]>
| gh api "/orgs/${ORG}/repos" \ | ||
| -f name="test-repo" \ | ||
| -f description="E2E test repo" \ | ||
| -F private=false \ |
There was a problem hiding this comment.
[high] sensitive-data
Hardcoded GCP project name it-gcp-konflux-dev-fullsend as a default value. CLAUDE.md explicitly prohibits committing GCP project names: "Never commit secrets or sensitive data (GCP project names, ...). Use environment variables with no defaults for sensitive values."
Suggested fix: Remove the default and require MINT_PROJECT to be set via env var: MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"
| // each org only uses its fair share of the shared deadline. | ||
| err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf) | ||
| if err == nil { | ||
| return org, nil |
There was a problem hiding this comment.
[low] misleading-comment
Comment says "Pass remaining context time, not the full timeout" but the code passes the full timeout value to acquireLock. The bounding comes from deadlineCtx, not the timeout parameter.
Suggested fix: Update the comment to clarify that the shared deadlineCtx bounds total wait, not the timeout parameter.
| if err == nil { | ||
| return false | ||
| } | ||
| msg := err.Error() |
There was a problem hiding this comment.
[low] fragile-error-matching
isRepoAlreadyExists matches on the string "422" in the error message. HTTP 422 can indicate many validation errors besides "already exists", which could mask unexpected failures.
Suggested fix: Prefer checking for "already exists" only, or implement typed error checking from the GitHub client.
|
The e2e triage smoke test is failing because
The artifact download feature (74963c0) worked — we got the transcripts, logs, and sandbox traces, which is how we diagnosed this. Waiting on a ticket to get the right models enabled in the e2e GCP account. |
waynesun09
left a comment
There was a problem hiding this comment.
Review Squad Report (Round 3) — 10-agent review
Agents: 2x claude-coder, 2x claude-researcher, 4x gemini-code-review, 2x cursor-code-review
Findings: 5 MEDIUM, 7 LOW (after dedup + verification against actual code)
False positives removed: 6
Prior review status
- ✅ Both HIGHs fixed (round 2)
- ✅ Thundering herd — fixed (
rand.Shuffle) - ✅ acquireOrg tests — added (4 test functions)
- ℹ️ 3 TOTP MEDIUMs (Second vs Unix, NetworkIdle, retry) — not in this PR's diff, files unchanged from main (merged via #1211)
New findings (this review)
- 5 MEDIUM: fallback timeout starvation,
isRepoAlreadyExists422 false-match, stale lock threshold, hardcoded GCP project, mutableorgPoolvar - 7 LOW: timeout budget, CLI logging, cleanup
t.Errorf, artifact download timeout, build caching, workflow concurrency
False positives removed
- "Corrupted GH Action references" (2 gemini agents) — hallucinated worktree paths as file content
- "Zip slip bypass via sibling dir" — trailing
os.PathSeparatorin HasPrefix prevents this - "Lock reclaim TOCTOU race" — GitHub atomic CreateRepo + verify-read-back prevents this
- "Starvation spin on empty lock repos" — code handles NotFound correctly with poll interval
| } | ||
| // Pass remaining context time, not the full timeout, so that | ||
| // each org only uses its fair share of the shared deadline. | ||
| err := acquireLock(deadlineCtx, client, token, org, runID, timeout, logf) |
There was a problem hiding this comment.
[MEDIUM] Fallback loop passes full timeout, starving later orgs
The comment on line 95-96 says "pass remaining context time" but the code passes the original timeout. Since acquireLock uses timeout for its internal deadline and the stale-lock age threshold, the first org in the fallback sequence can consume the entire 20-minute budget before org 2 is ever tried. If org-2 frees up during the wait, the runner misses it.
// suggestion: pass remaining time from the deadline context
remaining := time.Until(deadline) // extract from deadlineCtx
err := acquireLock(deadlineCtx, client, token, org, runID, remaining, logf)Or refactor to round-robin poll all orgs per interval instead of blocking on one at a time.
Flagged by 7/9 review agents (consensus)
| return false | ||
| } | ||
| msg := err.Error() | ||
| return strings.Contains(msg, "already exists") || strings.Contains(msg, "422") |
There was a problem hiding this comment.
[MEDIUM] isRepoAlreadyExists matches on "422" substring — overly broad
HTTP 422 from GitHub can indicate many validation errors beyond "already exists" (invalid name, reserved name, auth/permission issues). A non-duplicate 422 error would be silently swallowed, making tryCreateLock return (false, nil) when the API call actually failed — causing a 20-minute wait instead of fast failure.
The "already exists" check alone is sufficient for both the real GitHub API (which returns "name already exists on this account") and the fake client (which returns "repository already exists: ...").
func isRepoAlreadyExists(err error) bool {
if err == nil {
return false
}
return strings.Contains(err.Error(), "already exists")
}Flagged by 6/9 review agents (strongest consensus)
| return false | ||
| } | ||
| age := time.Since(createdAt) | ||
| if age <= timeout { |
There was a problem hiding this comment.
[MEDIUM] Stale lock reclaim threshold tied to full 20-minute timeout
tryReclaimStaleLock considers a lock stale when age > timeout, where timeout is now defaultLockTimeout = 20 * time.Minute. A crashed run's lock won't be reclaimed for 20 minutes even though tests complete in ~15 minutes.
The staleness threshold should be independent of the wait timeout — e.g., a separate staleLockTimeout constant of 5 minutes would reclaim crashed-run locks faster without affecting how long we're willing to wait for busy orgs.
const staleLockTimeout = 5 * time.Minute
// in tryReclaimStaleLock:
if age <= staleLockTimeout {Flagged by 2/9 review agents
| done | ||
|
|
||
| # --- 5. check mint enrollment --- | ||
| MINT_PROJECT="${MINT_PROJECT:-it-gcp-konflux-dev-fullsend}" |
There was a problem hiding this comment.
[MEDIUM] Hardcoded GCP project name in public repo
MINT_PROJECT, MINT_FUNCTION defaults expose internal cloud infrastructure naming in commit history of a public repository.
MINT_PROJECT="${MINT_PROJECT:?MINT_PROJECT must be set}"
MINT_REGION="${MINT_REGION:-us-central1}"
MINT_FUNCTION="${MINT_FUNCTION:?MINT_FUNCTION must be set}"Flagged by 3/9 review agents
|
|
||
| // orgPool is the set of GitHub orgs available for parallel e2e test runs. | ||
| // Each run acquires a lock on one org before proceeding. | ||
| var orgPool = []string{ |
There was a problem hiding this comment.
[MEDIUM] orgPool is a mutable package-level var
Tests in lock_test.go mutate orgPool directly with save/restore via defer. This is not safe with t.Parallel() and is a maintenance hazard.
Consider passing the pool as a parameter to acquireOrg:
func acquireOrg(ctx context.Context, client forge.Client, token, runID string,
pool []string, timeout time.Duration, logf func(string, ...any)) (string, error) {Flagged by 2/9 review agents
| t.Log("Deleting PAT...") | ||
| if delErr := deletePAT(page, patNote, t.Logf); delErr != nil { | ||
| t.Logf("warning: could not delete PAT: %v", delErr) | ||
| t.Errorf("could not delete PAT: %v", delErr) |
There was a problem hiding this comment.
[LOW] PAT deletion t.Errorf can fail otherwise-passing tests
Same pattern as lock release — PAT deletion is a cleanup operation that runs in t.Cleanup. Transient API failures here would mark the test as failed even when all assertions passed. The existing check on line 95 already handles "not found on page" gracefully.
Flagged by 4/9 review agents
| } | ||
| req.Header.Set("Authorization", "Bearer "+token) | ||
|
|
||
| resp, err := http.DefaultClient.Do(req) |
There was a problem hiding this comment.
[LOW] Artifact download uses http.DefaultClient without explicit timeout
http.DefaultClient has no timeout. The request context provides some protection, but body reads aren't fully bounded by context cancellation. The 50MB LimitReader on line 508 is good, but a hung connection could block indefinitely.
httpClient := &http.Client{Timeout: 60 * time.Second}Flagged by 2/9 review agents
| func (d *e2eDispatcher) Provision(_ context.Context) (map[string]string, error) { | ||
| return map[string]string{"FULLSEND_MINT_URL": "https://e2e-test.example.com/mint"}, nil | ||
| // buildCLIBinary compiles the fullsend CLI binary once per test run. | ||
| func buildCLIBinary(t *testing.T) string { |
There was a problem hiding this comment.
[LOW] buildCLIBinary and runCLI have redundant work per call
buildCLIBinary uses t.TempDir() (new dir per test function) and runCLI calls go list -m on every invocation (line 233). Currently there are only 2 test functions and 3-4 runCLI calls, so the overhead is minor. Consider sync.Once or caching in a package-level var if more tests are added.
Flagged by 5/9 review agents
| e2e: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 15 | ||
| timeout-minutes: 30 |
There was a problem hiding this comment.
[LOW] No workflow concurrency control after removing the global group
With only 2 active orgs, 3+ concurrent workflow runs will waste CI resources waiting for locks (up to 20 min each). Consider adding a concurrency group matching the pool size:
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: trueThis cancels superseded commits on the same branch/PR while still allowing parallel runs across different PRs.
Flagged by 4/9 review agents
|
|
||
| if err := client.DeleteRepo(ctx, org, lockRepo); err != nil { | ||
| t.Logf("[e2e-lock] Failed to release lock: %v", err) | ||
| t.Errorf("[e2e-lock] failed to release lock: %v", err) |
There was a problem hiding this comment.
[LOW] Lock release t.Errorf can fail otherwise-passing tests
Changed from t.Logf to t.Errorf here and on line 158. Transient GitHub API failures during cleanup (network blips, rate limits) would now fail an otherwise-passing test. cleanup.go uses t.Logf for its operations — consider keeping lock release consistent, or at minimum distinguish "operation failed" (error) from "resource already cleaned up" (warning).
Flagged by 4/9 review agents
Summary
halfsendtest org with a pool of 6 orgs (halfsend-01throughhalfsend-06)e2e-halfsendworkflow concurrency group — contention is now handled by the org poolsetupE2ETest, cleanup, browser helpers)Dependencies
Depends on #1211 (TOTP support) — this branch includes those commits.
Test plan
go vet -tags e2e ./e2e/admin/...passesTestAcquireLock,TestReleaseLock)make lintpassesCloses #1047
🤖 Generated with Claude Code