fix(#507,#509): dashboard merge-snapshot collision + no-batch poll debounce #596

Merged
HenryLach merged 1 commit into main from fix/507-509-dashboard-pair
May 25, 2026

@HenryLach
Owner

Closes #507. Closes #509.

Two paired dashboard fixes, landed together because both are small-surface-area observability issues in the same UI layer.

#509 — merge agent telemetry missing for some waves

Root cause

writeMergeSnapshot keyed the on-disk filename by mergeNumber alone, and mergeNumber is derived from lane.laneNumber (see merge.ts:1833 and merge.ts:855-864). Lane numbers reset every wave, so wave N+1's lane-1 merge silently overwrote wave N's lane-1 terminal snapshot before the dashboard could read it.

This exactly matches the symptom in the issue: dashboard renders telemetry for some waves and —— for others, with the pattern correlating to which waves' lane numbers were reused.

The fix

Per-wave filename namespacing in runtimeMergeSnapshotPath:

- .pi/runtime/{batchId}/lanes/merge-{mergeNumber}.json
+ .pi/runtime/{batchId}/lanes/merge-w{waveIndex}-{mergeNumber}.json
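The collision and the namespacing fix can be illustrated with a minimal sketch. The function bodies and parameter lists below are assumptions made for illustration (only the two path patterns come from the diff above), and `legacyMergeSnapshotPath` and the batch id are hypothetical names:

```typescript
// Hypothetical stand-ins for the path builder before and after the fix.
// The real runtimeMergeSnapshotPath may take different parameters.
function legacyMergeSnapshotPath(batchId: string, mergeNumber: number): string {
  return `.pi/runtime/${batchId}/lanes/merge-${mergeNumber}.json`;
}

function runtimeMergeSnapshotPath(
  batchId: string,
  waveIndex: number,
  mergeNumber: number,
): string {
  return `.pi/runtime/${batchId}/lanes/merge-w${waveIndex}-${mergeNumber}.json`;
}

// Lane 1 merges in wave 0 and again in wave 1 of the same batch.
const batchId = "batch-example"; // illustrative id, not a real batch
const legacyPaths = [0, 1].map(() => legacyMergeSnapshotPath(batchId, 1));
const fixedPaths = [0, 1].map((waveIndex) =>
  runtimeMergeSnapshotPath(batchId, waveIndex, 1),
);

console.log(new Set(legacyPaths).size); // 1: both waves target the same file
console.log(new Set(fixedPaths).size); // 2: one file per wave
```

Because the legacy pattern has no wave component, the second write in the sketch would land on the first wave's file, which is the overwrite described under Root cause.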

writeMergeSnapshot / readMergeSnapshot signatures gain waveIndex. All four call sites in merge.ts already had waveIndex in scope (parameter of spawnMergeAgentV2) and pass it through as waveIndex ?? 0, matching the existing nullish-coalesce used to populate the snapshot's own waveIndex field.

loadRuntimeMergeSnapshots in dashboard/server.cjs correspondingly switches its intermediate map key from mergeNumber alone to w{waveIndex}-{mergeNumber}, because keying by mergeNumber alone reproduced the same collision at read time (the snapshot files would all exist on disk under the new pattern, but the loader's snapshots[data.mergeNumber] = data would still squash multiple wave entries down to one). Filename filter remains permissive (merge-*.json) so legacy snapshots from pre-fix batches still load and key as plain {mergeNumber} for back-compat.
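As a rough sketch of the read-side keying, assuming snapshots have already been parsed from disk: the `MergeSnapshot` shape, the `status` field, and the helper names `snapshotKey` and `indexSnapshots` are hypothetical, and the actual loader in dashboard/server.cjs is plain JavaScript and may be organized differently.

```typescript
interface MergeSnapshot {
  mergeNumber: number;
  waveIndex?: number | null; // absent in pre-fix snapshots
  status: string; // illustrative payload field
}

// Key by wave and merge number; fall back to the plain merge number for
// legacy snapshots that carry no waveIndex.
function snapshotKey(data: MergeSnapshot): string {
  return data.waveIndex == null
    ? String(data.mergeNumber)
    : `w${data.waveIndex}-${data.mergeNumber}`;
}

function indexSnapshots(all: MergeSnapshot[]): Record<string, MergeSnapshot> {
  const snapshots: Record<string, MergeSnapshot> = {};
  for (const data of all) {
    snapshots[snapshotKey(data)] = data;
  }
  return snapshots;
}

const indexed = indexSnapshots([
  { mergeNumber: 1, waveIndex: 0, status: "done" },
  { mergeNumber: 1, waveIndex: 1, status: "done" },
  { mergeNumber: 2, status: "done" }, // legacy file without waveIndex
]);

// Three entries survive: two waves of merge 1 plus the legacy merge 2.
console.log(Object.keys(indexed).length);
```

Keying on `mergeNumber` alone would have left two entries here, since both wave snapshots for merge 1 would share the key `1`.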

#507 — dashboard briefly flips to history view at batch startup

Root cause

A single missed SSE poll during batch-state.json write at startup triggered the no-batch handler in app.js, which closes the viewer and renders the history panel. The next poll picked up the new batch and flipped back to live view, but the user perceived a 'flash of history' between two live batches.

The fix

3-consecutive-miss debounce on the no-batch transition in app.js. With the server's 2s POLL_INTERVAL, the threshold corresponds to ~6s of confirmed batch-absence — well past the sub-second batch-state.json write window while still cleaning up promptly when a batch genuinely ends. The miss counter resets the moment a batch reappears.
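The debounce logic might look roughly like the following sketch. The class and constant names are hypothetical, the initial "live" view is an assumption, and the real handler in app.js is browser JavaScript wired to SSE events rather than a standalone class. With three misses required and a 2s poll interval, the switch to history happens after 3 × 2s = 6s of consecutive misses.

```typescript
const NO_BATCH_MISS_THRESHOLD = 3; // threshold stated in the PR description

type View = "live" | "history";

// Hypothetical tracker; the real app.js handler may be structured differently.
class NoBatchDebounce {
  private misses = 0;
  private view: View = "live"; // assumed starting view for this sketch

  // Call once per poll result; returns the view that should be rendered.
  onPoll(hasBatch: boolean): View {
    if (hasBatch) {
      // Any sighting of a batch resets the counter immediately.
      this.misses = 0;
      this.view = "live";
    } else {
      this.misses += 1;
      if (this.misses >= NO_BATCH_MISS_THRESHOLD) {
        this.view = "history";
      }
    }
    return this.view;
  }
}

const tracker = new NoBatchDebounce();
const polls = [true, false, true, false, false, false];
const views = polls.map((hasBatch) => tracker.onPoll(hasBatch));
console.log(views);
```

In this run the isolated miss at the second poll leaves the view on "live", and only the three consecutive misses at the end switch it to "history".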

Tests

Adds 4 regression tests to process-registry.test.ts (7.4 through 7.7) covering the #509 invariants:

| # | Asserts |
| --- | --- |
| 7.4 | writeMergeSnapshot produces a filename containing the waveIndex |
| 7.5 | Same mergeNumber across two waves writes to distinct files |
| 7.6 | Wave-1 write does not overwrite wave-0 with same mergeNumber (the exact failure mode from the issue body) |
| 7.7 | readMergeSnapshot returns null for absent (wave, mergeNumber) tuples (no accidental cross-wave fallback) |
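The four invariants can be sketched against an in-memory stand-in for the lanes directory. This is not the code in process-registry.test.ts: the signatures of `writeMergeSnapshot` and `readMergeSnapshot` below are simplified assumptions, and `files`, `snapshotFile`, and `check` are hypothetical helpers.

```typescript
interface MergeSnapshot {
  mergeNumber: number;
  waveIndex: number;
  status: string; // illustrative payload field
}

// In-memory stand-in for the lanes directory: filename -> JSON text.
const files = new Map<string, string>();

function snapshotFile(waveIndex: number, mergeNumber: number): string {
  return `merge-w${waveIndex}-${mergeNumber}.json`;
}

function writeMergeSnapshot(waveIndex: number, snapshot: MergeSnapshot): string {
  const name = snapshotFile(waveIndex, snapshot.mergeNumber);
  files.set(name, JSON.stringify(snapshot));
  return name;
}

function readMergeSnapshot(waveIndex: number, mergeNumber: number): MergeSnapshot | null {
  const raw = files.get(snapshotFile(waveIndex, mergeNumber));
  return raw === undefined ? null : (JSON.parse(raw) as MergeSnapshot);
}

function check(condition: boolean, label: string): void {
  if (!condition) throw new Error(`failed: ${label}`);
}

const wave0File = writeMergeSnapshot(0, { mergeNumber: 1, waveIndex: 0, status: "wave-0 terminal" });
const wave1File = writeMergeSnapshot(1, { mergeNumber: 1, waveIndex: 1, status: "wave-1 terminal" });

check(wave0File.includes("w0"), "7.4 filename contains the waveIndex");
check(wave0File !== wave1File, "7.5 distinct files per wave");
check(readMergeSnapshot(0, 1)?.status === "wave-0 terminal", "7.6 wave-0 snapshot survives");
check(readMergeSnapshot(2, 1) === null, "7.7 absent tuple reads as null");
```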

#507 is a pure presentation-layer debounce with no Node-side tests; the debounce kinetics depend on browser SSE timing and are best validated by running a back-to-back batch sequence locally against the dashboard.

Validation

| Gate | Result |
| --- | --- |
| npm run typecheck | ✅ pass |
| npm run lint | ✅ 286 warnings / 671 infos (identical to main) |
| npm run format:check | ✅ pass |
| process-registry.test.ts | ✅ 42 / 42 pass (4 new) |
| Full test suite | ✅ 3,708 / 3,709 pass, 1 skipped (zero new failures) |
| taskplane help / doctor | ✅ pass |

Back-compat notes

  • Existing batches with pre-fix merge-N.json files still load correctly via the permissive filename filter and the legacy key-fallback in the dashboard loader (snapshots[mergeNumber] if data.waveIndex == null). No migration needed.
  • The three functions whose signatures changed (writeMergeSnapshot, readMergeSnapshot, runtimeMergeSnapshotPath) are internal, so there is no public-API contract drift.

Recommended local verification

After merge, the easiest end-to-end check is a polyrepo batch where the same lane numbers appear across multiple waves. Pre-fix, the dashboard's merge column will show —— for at least one wave; post-fix, every wave gets its own telemetry row. For #507, start two live batches back-to-back and confirm the dashboard never flashes the history panel between them.

…bounce

Two paired dashboard fixes, landed together because both are
small-surface-area observability issues in the same UI layer.

#509 — merge agent telemetry missing for some waves

Root cause: writeMergeSnapshot keyed the on-disk filename by mergeNumber
alone (mergeNumber == lane.laneNumber). Lane numbers reset every wave,
so wave N+1's lane-1 merge silently overwrote wave N's lane-1 terminal
snapshot before the dashboard could observe it. The user-visible
symptom was '—' in the merge telemetry column for any wave whose lane
numbers were reused by a subsequent wave.

The fix is per-wave filename namespacing in runtimeMergeSnapshotPath:

  Before: .pi/runtime/{batchId}/lanes/merge-{mergeNumber}.json
  After:  .pi/runtime/{batchId}/lanes/merge-w{waveIndex}-{mergeNumber}.json

writeMergeSnapshot / readMergeSnapshot signatures gain waveIndex.
All four call sites in merge.ts already had waveIndex in scope (it's
a parameter of spawnMergeAgentV2) and pass it through as 'waveIndex ?? 0',
matching the existing nullish-coalesce pattern used to populate the
snapshot's own waveIndex field on lines 894-895.

Dashboard server's loadRuntimeMergeSnapshots in dashboard/server.cjs
correspondingly switches its intermediate map key from mergeNumber alone
to 'w{waveIndex}-{mergeNumber}', because keying by mergeNumber alone
reproduced the same collision at read time (multiple wave snapshots
read off disk but only the last-iterated kept in memory). Filename
filter remains permissive ('merge-*.json') so legacy snapshots from
pre-fix batches still load and key as plain '{mergeNumber}' for
back-compat.

#507 — dashboard briefly flips to history view at batch startup

Root cause: a single missed SSE poll during batch-state.json write at
startup triggered the no-batch handler in app.js, which closes the
viewer and renders the history panel. The next poll then picked up
the new batch and flipped back to live view, but the user perceived
a 'flash of history' between two live batches.

The fix is a 3-consecutive-miss debounce on the no-batch transition
in dashboard/public/app.js. With the server's 2s POLL_INTERVAL, the
threshold corresponds to ~6s of confirmed batch-absence, well past
the sub-second batch-state.json write window while still cleaning up
promptly when a batch genuinely ends. The miss counter resets the
moment a batch reappears, so back-to-back batches no longer flash
the history panel.

Tests

Adds 4 regression tests to process-registry.test.ts (7.4 through 7.7)
covering the #509 invariants:

  7.4: writeMergeSnapshot produces a filename containing the waveIndex
  7.5: same mergeNumber across two waves writes to distinct files
  7.6: wave-1 write does not overwrite wave-0 with same mergeNumber
       (exact failure mode from the issue body)
  7.7: readMergeSnapshot returns null for absent (wave,mergeNumber)
       tuples (no accidental cross-wave fallback)

#507 is a pure presentation-layer debounce in app.js with no Node-side
tests; the debounce kinetics depend on browser SSE timing and are
best validated by running a back-to-back batch sequence locally
against the dashboard.

Validation

  npm run typecheck            pass
  npm run lint                 286 warnings / 671 infos (identical to main)
  npm run format:check         pass
  process-registry.test.ts     42/42 pass (4 new)
  Full test suite              3708/3709 pass, 1 skipped (zero new failures)
  taskplane help / doctor      pass

Closes #507
Closes #509
@HenryLach HenryLach enabled auto-merge May 25, 2026 18:54
@HenryLach HenryLach merged commit 482e3af into main May 25, 2026
1 check passed
@HenryLach HenryLach deleted the fix/507-509-dashboard-pair branch May 25, 2026 18:55
Development

Successfully merging this pull request may close these issues.

Dashboard: merge agent telemetry missing for some waves (snapshot not captured)
Dashboard: briefly flips to history view during batch startup
