Skip to content

fix: dedup iterations on workflow re-runs#489

Open
pedropaulovc wants to merge 2 commits into
mainfrom
fix/issue-486-iteration-dedup
Open

fix: dedup iterations on workflow re-runs#489
pedropaulovc wants to merge 2 commits into
mainfrom
fix/issue-486-iteration-dedup

Conversation

@pedropaulovc
Copy link
Copy Markdown
Collaborator

@pedropaulovc pedropaulovc commented Apr 17, 2026

Closes #486

Summary

  • captureIteration previously inserted a new iteration row on every run using revision = getIterationCount() + 1, with no head_sha dedup. Re-running the workflow on the same commit (via "Re-run all jobs" or workflow_dispatch with the same PR_NUMBER) created phantom duplicate iterations sharing an identical head_sha.
  • Added IterationDatabase.getIterationByHeadSha() helper.
  • captureIteration now short-circuits and returns the existing iteration id when one already exists for the given head_sha, skipping all snapshot work.

Test plan

  • New unit test action/src/capture/iteration-capture.test.ts calls captureIteration twice with the same headSha and asserts:
    • getIterationCount() returns 1
    • both calls return the same iteration id
    • no additional artifact_snapshots rows are written on the second call
  • TDD flow: first commit adds the test and it fails (expected 2 to be 1). Second commit adds the fix and the full suite passes.
  • npm test and npm run typecheck pass in action/.

🔍 Review in CodjiFlo

pedropaulovc and others added 2 commits April 17, 2026 12:28
Reproduces #486: running captureIteration twice with the same headSha
currently produces two iteration rows (revision=1 and revision=2).

Verified failing:
  AssertionError: expected 2 to be 1
    at src/capture/iteration-capture.test.ts:96

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
captureIteration previously computed revision = getIterationCount() + 1
on every run and inserted a new row with no head_sha uniqueness check.
Re-running the workflow on the same commit (via "Re-run all jobs" or
workflow_dispatch with the same PR_NUMBER) therefore produced duplicate
phantom iterations sharing an identical head_sha.

Now captureIteration short-circuits when an iteration with the given
head_sha already exists, returning the existing id and skipping the
snapshot work. Added getIterationByHeadSha() helper to IterationDatabase.

Closes #486

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 19:29
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 17, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
codjiflo Ready Ready Preview Apr 17, 2026 7:29pm

@pedropaulovc pedropaulovc enabled auto-merge April 17, 2026 19:29
@github-actions
Copy link
Copy Markdown

CodjiFlo Iteration Tracking

Iterations captured: 1
Last updated: 2026-04-17T19:29:26.000Z
Artifact: 6503418008
Run ID: 24583006295


What is this?

This comment is automatically updated by the CodjiFlo GitHub Action to enable force-push resilient code review with iteration tracking.

The artifact referenced above contains iteration data that the CodjiFlo frontend uses to:

  • Track code changes across force-pushes
  • Enable comment persistence across code modifications
  • Allow comparison between any two iterations

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Prevents “phantom” duplicate PR iterations from being captured when the GitHub Action workflow is re-run on the same commit by deduplicating on head_sha.

Changes:

  • Added IterationDatabase.getIterationByHeadSha() to look up existing iterations by head_sha.
  • Updated captureIteration to short-circuit and return an existing iteration id when a matching head_sha is found.
  • Added a Vitest unit test to ensure re-running capture on the same head_sha does not create additional iteration/snapshot rows.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File Description
action/src/db/database.ts Adds DB helper to fetch an iteration by head_sha to enable dedup behavior.
action/src/capture/iteration-capture.ts Implements early-return dedup logic during iteration capture and logs a skip event.
action/src/capture/iteration-capture.test.ts Adds test coverage ensuring repeat capture with same head_sha does not create duplicates.

Comment thread action/src/db/database.ts

getIterationByHeadSha(headSha: string): IterationRow | undefined {
return this.db.prepare<[string], IterationRow>(`
SELECT * FROM iterations WHERE head_sha = ? LIMIT 1
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getIterationByHeadSha uses LIMIT 1 without an ORDER BY, so if the DB already contains multiple rows for the same head_sha (the exact scenario this PR is addressing), the row returned is nondeterministic. This can cause re-runs to return a different iteration id/revision than expected. Consider ordering explicitly (e.g., by revision ASC to prefer the original capture, or revision DESC to prefer the latest) to make behavior deterministic.

Suggested change
SELECT * FROM iterations WHERE head_sha = ? LIMIT 1
SELECT * FROM iterations
WHERE head_sha = ?
ORDER BY revision ASC
LIMIT 1

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +56
// Dedup: if an iteration with this head_sha already exists, short-circuit.
// This prevents phantom duplicate iterations when the workflow is re-run
// on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the
// same PR_NUMBER). See issue #486.
const existing = db.getIterationByHeadSha(ctx.headSha);
if (existing) {
logger.info({
event: 'iteration_already_captured',
'iteration.id': existing.id,
'iteration.revision': existing.revision,
'iteration.head_sha': existing.head_sha,
}, 'Iteration for this head_sha already exists; skipping capture');
return existing.id;
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new early-return changes captureIteration semantics: it may return an existing iteration id that is not the latest revision in the DB. The main action currently uses db.getLatestIteration() after calling captureIteration (see action/src/index.ts:118), which means on a workflow re-run of an older commit (or if duplicates already exist) span tracker computation/outputs can target the wrong iteration. Consider returning the IterationRow (or { iteration, didCreate }) from captureIteration and have callers use that specific iteration for snapshot indices and outputs; also consider skipping SpanTracker recomputation entirely when didCreate is false.

Suggested change
// Dedup: if an iteration with this head_sha already exists, short-circuit.
// This prevents phantom duplicate iterations when the workflow is re-run
// on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the
// same PR_NUMBER). See issue #486.
const existing = db.getIterationByHeadSha(ctx.headSha);
if (existing) {
logger.info({
event: 'iteration_already_captured',
'iteration.id': existing.id,
'iteration.revision': existing.revision,
'iteration.head_sha': existing.head_sha,
}, 'Iteration for this head_sha already exists; skipping capture');
return existing.id;
// Dedup only when the matching head_sha is already the latest iteration.
// Re-runs of older commits must still capture a new latest revision so
// downstream code that reads "latest iteration" stays aligned with this run.
const existing = db.getIterationByHeadSha(ctx.headSha);
if (existing) {
const latest = db.getLatestIteration();
if (latest && latest.id === existing.id) {
logger.info({
event: 'iteration_already_captured',
'iteration.id': existing.id,
'iteration.revision': existing.revision,
'iteration.head_sha': existing.head_sha,
}, 'Iteration for this head_sha already exists as the latest revision; skipping capture');
return existing.id;
}
logger.info({
event: 'iteration_recapture_required',
'iteration.id': existing.id,
'iteration.revision': existing.revision,
'iteration.head_sha': existing.head_sha,
'latest_iteration.id': latest?.id,
'latest_iteration.revision': latest?.revision,
'latest_iteration.head_sha': latest?.head_sha,
}, 'Iteration for this head_sha already exists but is not the latest revision; capturing a new iteration');

Copilot uses AI. Check for mistakes.
Comment on lines +44 to +48
// Dedup: if an iteration with this head_sha already exists, short-circuit.
// This prevents phantom duplicate iterations when the workflow is re-run
// on the same commit (e.g. "Re-run all jobs" or workflow_dispatch with the
// same PR_NUMBER). See issue #486.
const existing = db.getIterationByHeadSha(ctx.headSha);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline references to the linked issue number look inconsistent with the PR metadata (Closes #136486 in the PR description, but the code comment says See issue #486). Please align the issue reference(s) to avoid confusing future readers/searches.

Copilot uses AI. Check for mistakes.
Comment on lines +89 to +104
const firstSnapshotCount = (db as unknown as {
db: { prepare: (sql: string) => { get: () => { count: number } } };
}).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count;

const secondId = await captureIteration(db, ctx);

// Only one iteration row should exist
expect(db.getIterationCount()).toBe(1);
// Second call should short-circuit and return the same iteration id
expect(secondId).toBe(firstId);

// No new snapshots should have been written by the second call
const secondSnapshotCount = (db as unknown as {
db: { prepare: (sql: string) => { get: () => { count: number } } };
}).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count;
expect(secondSnapshotCount).toBe(firstSnapshotCount);
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test reaches into IterationDatabase's private db instance via as unknown as { db: ... } to count artifact_snapshots rows. This is brittle (breaks if the implementation changes) and bypasses type-safety. Prefer adding a small, typed helper on IterationDatabase (e.g., getArtifactSnapshotRowCount() / getTableRowCount('artifact_snapshots')) or asserting behavior via existing public APIs.

Suggested change
const firstSnapshotCount = (db as unknown as {
db: { prepare: (sql: string) => { get: () => { count: number } } };
}).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count;
const secondId = await captureIteration(db, ctx);
// Only one iteration row should exist
expect(db.getIterationCount()).toBe(1);
// Second call should short-circuit and return the same iteration id
expect(secondId).toBe(firstId);
// No new snapshots should have been written by the second call
const secondSnapshotCount = (db as unknown as {
db: { prepare: (sql: string) => { get: () => { count: number } } };
}).db.prepare('SELECT COUNT(*) as count FROM artifact_snapshots').get().count;
expect(secondSnapshotCount).toBe(firstSnapshotCount);
const secondId = await captureIteration(db, ctx);
// Only one iteration row should exist
expect(db.getIterationCount()).toBe(1);
// Second call should short-circuit and return the same iteration id
expect(secondId).toBe(firstId);

Copilot uses AI. Check for mistakes.
Comment on lines +58 to +60
describe('captureIteration dedup on same head_sha (issue #486)', () => {
let db: IterationDatabase;
let dbPath: string;
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test file comment/describe mentions issue #486, which doesn’t match the PR metadata (Closes #136486). Please make the issue reference consistent so the test history links back to the correct tracker item.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Re-running workflow on same commit creates duplicate phantom iterations

2 participants