fix: dedup iterations on workflow re-runs#489
Conversation
Reproduces #486: running captureIteration twice with the same headSha currently produces two iteration rows (revision=1 and revision=2). Verified failing: AssertionError: expected 2 to be 1 at src/capture/iteration-capture.test.ts:96 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
captureIteration previously computed revision = getIterationCount() + 1 on every run and inserted a new row with no head_sha uniqueness check. Re-running the workflow on the same commit (via "Re-run all jobs" or workflow_dispatch with the same PR_NUMBER) therefore produced duplicate phantom iterations sharing an identical head_sha. Now captureIteration short-circuits when an iteration with the given head_sha already exists, returning the existing id and skipping the snapshot work. Added getIterationByHeadSha() helper to IterationDatabase. Closes #486 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
CodjiFlo Iteration TrackingIterations captured: 1 What is this?This comment is automatically updated by the CodjiFlo GitHub Action to enable force-push resilient code review with iteration tracking. The artifact referenced above contains iteration data that the CodjiFlo frontend uses to:
|
There was a problem hiding this comment.
Pull request overview
Prevents “phantom” duplicate PR iterations from being captured when the GitHub Action workflow is re-run on the same commit by deduplicating on head_sha.
Changes:
- Added
IterationDatabase.getIterationByHeadSha()to look up existing iterations byhead_sha. - Updated
captureIterationto short-circuit and return an existing iteration id when a matchinghead_shais found. - Added a Vitest unit test to ensure re-running capture on the same
head_shadoes not create additional iteration/snapshot rows.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| action/src/db/database.ts | Adds DB helper to fetch an iteration by head_sha to enable dedup behavior. |
| action/src/capture/iteration-capture.ts | Implements early-return dedup logic during iteration capture and logs a skip event. |
| action/src/capture/iteration-capture.test.ts | Adds test coverage ensuring repeat capture with same head_sha does not create duplicates. |
|
|
||
| getIterationByHeadSha(headSha: string): IterationRow | undefined { | ||
| return this.db.prepare<[string], IterationRow>(` | ||
| SELECT * FROM iterations WHERE head_sha = ? LIMIT 1 |
There was a problem hiding this comment.
getIterationByHeadSha uses LIMIT 1 without an ORDER BY, so if the DB already contains multiple rows for the same head_sha (the exact scenario this PR is addressing), the row returned is nondeterministic. This can cause re-runs to return a different iteration id/revision than expected. Consider ordering explicitly (e.g., by revision ASC to prefer the original capture, or revision DESC to prefer the latest) to make behavior deterministic.
| SELECT * FROM iterations WHERE head_sha = ? LIMIT 1 | |
| SELECT * FROM iterations | |
| WHERE head_sha = ? | |
| ORDER BY revision ASC | |
| LIMIT 1 |
| // Dedup: if an iteration with this head_sha already exists, short-circuit. | ||
| // This prevents phantom duplicate iterations when the workflow is re-run | ||
| // on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the | ||
| // same PR_NUMBER). See issue #486. | ||
| const existing = db.getIterationByHeadSha(ctx.headSha); | ||
| if (existing) { | ||
| logger.info({ | ||
| event: 'iteration_already_captured', | ||
| 'iteration.id': existing.id, | ||
| 'iteration.revision': existing.revision, | ||
| 'iteration.head_sha': existing.head_sha, | ||
| }, 'Iteration for this head_sha already exists; skipping capture'); | ||
| return existing.id; |
There was a problem hiding this comment.
The new early-return changes captureIteration semantics: it may return an existing iteration id that is not the latest revision in the DB. The main action currently uses db.getLatestIteration() after calling captureIteration (see action/src/index.ts:118), which means on a workflow re-run of an older commit (or if duplicates already exist) span tracker computation/outputs can target the wrong iteration. Consider returning the IterationRow (or { iteration, didCreate }) from captureIteration and have callers use that specific iteration for snapshot indices and outputs; also consider skipping SpanTracker recomputation entirely when didCreate is false.
| // Dedup: if an iteration with this head_sha already exists, short-circuit. | |
| // This prevents phantom duplicate iterations when the workflow is re-run | |
| // on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the | |
| // same PR_NUMBER). See issue #486. | |
| const existing = db.getIterationByHeadSha(ctx.headSha); | |
| if (existing) { | |
| logger.info({ | |
| event: 'iteration_already_captured', | |
| 'iteration.id': existing.id, | |
| 'iteration.revision': existing.revision, | |
| 'iteration.head_sha': existing.head_sha, | |
| }, 'Iteration for this head_sha already exists; skipping capture'); | |
| return existing.id; | |
| // Dedup only when the matching head_sha is already the latest iteration. | |
| // Re-runs of older commits must still capture a new latest revision so | |
| // downstream code that reads "latest iteration" stays aligned with this run. | |
| const existing = db.getIterationByHeadSha(ctx.headSha); | |
| if (existing) { | |
| const latest = db.getLatestIteration(); | |
| if (latest && latest.id === existing.id) { | |
| logger.info({ | |
| event: 'iteration_already_captured', | |
| 'iteration.id': existing.id, | |
| 'iteration.revision': existing.revision, | |
| 'iteration.head_sha': existing.head_sha, | |
| }, 'Iteration for this head_sha already exists as the latest revision; skipping capture'); | |
| return existing.id; | |
| } | |
| logger.info({ | |
| event: 'iteration_recapture_required', | |
| 'iteration.id': existing.id, | |
| 'iteration.revision': existing.revision, | |
| 'iteration.head_sha': existing.head_sha, | |
| 'latest_iteration.id': latest?.id, | |
| 'latest_iteration.revision': latest?.revision, | |
| 'latest_iteration.head_sha': latest?.head_sha, | |
| }, 'Iteration for this head_sha already exists but is not the latest revision; capturing a new iteration'); |
| // Dedup: if an iteration with this head_sha already exists, short-circuit. | ||
| // This prevents phantom duplicate iterations when the workflow is re-run | ||
| // on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the | ||
| // same PR_NUMBER). See issue #486. | ||
| const existing = db.getIterationByHeadSha(ctx.headSha); |
There was a problem hiding this comment.
The inline references to the linked issue number look inconsistent with the PR metadata (Closes #136486 in the PR description, but the code comment says See issue #486). Please align the issue reference(s) to avoid confusing future readers/searches.
| const firstSnapshotCount = (db as unknown as { | ||
| db: { prepare: (sql: string) => { get: () => { count: number } } }; | ||
| }).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count; | ||
|
|
||
| const secondId = await captureIteration(db, ctx); | ||
|
|
||
| // Only one iteration row should exist | ||
| expect(db.getIterationCount()).toBe(1); | ||
| // Second call should short-circuit and return the same iteration id | ||
| expect(secondId).toBe(firstId); | ||
|
|
||
| // No new snapshots should have been written by the second call | ||
| const secondSnapshotCount = (db as unknown as { | ||
| db: { prepare: (sql: string) => { get: () => { count: number } } }; | ||
| }).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count; | ||
| expect(secondSnapshotCount).toBe(firstSnapshotCount); |
There was a problem hiding this comment.
This test reaches into IterationDatabase's private db instance via as unknown as { db: ... } to count artifact_snapshots rows. This is brittle (breaks if the implementation changes) and bypasses type-safety. Prefer adding a small, typed helper on IterationDatabase (e.g., getArtifactSnapshotRowCount() / getTableRowCount('artifact_snapshots')) or asserting behavior via existing public APIs.
| const firstSnapshotCount = (db as unknown as { | |
| db: { prepare: (sql: string) => { get: () => { count: number } } }; | |
| }).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count; | |
| const secondId = await captureIteration(db, ctx); | |
| // Only one iteration row should exist | |
| expect(db.getIterationCount()).toBe(1); | |
| // Second call should short-circuit and return the same iteration id | |
| expect(secondId).toBe(firstId); | |
| // No new snapshots should have been written by the second call | |
| const secondSnapshotCount = (db as unknown as { | |
| db: { prepare: (sql: string) => { get: () => { count: number } } }; | |
| }).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count; | |
| expect(secondSnapshotCount).toBe(firstSnapshotCount); | |
| const secondId = await captureIteration(db, ctx); | |
| // Only one iteration row should exist | |
| expect(db.getIterationCount()).toBe(1); | |
| // Second call should short-circuit and return the same iteration id | |
| expect(secondId).toBe(firstId); |
| describe('captureIteration dedup on same head_sha (issue #486)', () => { | ||
| let db: IterationDatabase; | ||
| let dbPath: string; |
There was a problem hiding this comment.
The test file comment/describe mentions issue #486, which doesn’t match the PR metadata (Closes #136486). Please make the issue reference consistent so the test history links back to the correct tracker item.
Closes #486
Summary
captureIterationpreviously inserted a new iteration row on every run usingrevision = getIterationCount() + 1, with nohead_shadedup. Re-running the workflow on the same commit (via "Re-run all jobs" orworkflow_dispatchwith the samePR_NUMBER) created phantom duplicate iterations sharing an identicalhead_sha.IterationDatabase.getIterationByHeadSha()helper.captureIterationnow short-circuits and returns the existing iteration id when one already exists for the givenhead_sha, skipping all snapshot work.Test plan
action/src/capture/iteration-capture.test.tscallscaptureIterationtwice with the sameheadShaand asserts:getIterationCount()returns 1artifact_snapshotsrows are written on the second callexpected 2 to be 1). Second commit adds the fix and the full suite passes.npm testandnpm run typecheckpass inaction/.🔍 Review in CodjiFlo