fix(#507,#509): dashboard merge-snapshot collision + no-batch poll debounce#596
Merged
Conversation
…bounce Two paired dashboard fixes, landed together because both are small-surface-area observability issues in the same UI layer. #509 \u2014 merge agent telemetry missing for some waves Root cause: writeMergeSnapshot keyed the on-disk filename by mergeNumber alone (mergeNumber == lane.laneNumber). Lane numbers reset every wave, so wave N+1's lane-1 merge silently overwrote wave N's lane-1 terminal snapshot before the dashboard could observe it. The user-visible symptom was '\u2014' in the merge telemetry column for any wave whose lane numbers were reused by a subsequent wave. The fix is per-wave filename namespacing in runtimeMergeSnapshotPath: Before: .pi/runtime/{batchId}/lanes/merge-{mergeNumber}.json After: .pi/runtime/{batchId}/lanes/merge-w{waveIndex}-{mergeNumber}.json writeMergeSnapshot / readMergeSnapshot signatures gain waveIndex. All four call sites in merge.ts already had waveIndex in scope (it's a parameter of spawnMergeAgentV2) and pass it through as 'waveIndex ?? 0', matching the existing nullish-coalesce pattern used to populate the snapshot's own waveIndex field on lines 894-895. Dashboard server's loadRuntimeMergeSnapshots in dashboard/server.cjs correspondingly switches its intermediate map key from mergeNumber alone to 'w{waveIndex}-{mergeNumber}', because keying by mergeNumber alone reproduced the same collision at read time (multiple wave snapshots read off disk but only the last-iterated kept in memory). Filename filter remains permissive ('merge-*.json') so legacy snapshots from pre-fix batches still load and key as plain '{mergeNumber}' for back-compat. #507 \u2014 dashboard briefly flips to history view at batch startup Root cause: a single missed SSE poll during batch-state.json write at startup triggered the no-batch handler in app.js, which closes the viewer and renders the history panel. The next poll then picked up the new batch and flipped back to live view, but the user perceived a 'flash of history' between two live batches. The fix is a 3-consecutive-miss debounce on the no-batch transition in dashboard/public/app.js. With the server's 2s POLL_INTERVAL, the threshold corresponds to ~6s of confirmed batch-absence \u2014 well past the sub-second batch-state.json write window while still cleaning up promptly when a batch genuinely ends. The miss counter resets the moment a batch reappears, so back-to-back batches no longer flash the history panel. Tests Adds 4 regression tests to process-registry.test.ts (7.4 through 7.7) covering the #509 invariants: 7.4 \u2014 writeMergeSnapshot produces a filename containing the waveIndex 7.5 \u2014 same mergeNumber across two waves writes to distinct files 7.6 \u2014 wave-1 write does not overwrite wave-0 with same mergeNumber (exact failure mode from the issue body) 7.7 \u2014 readMergeSnapshot returns null for absent (wave,mergeNumber) tuples (no accidental cross-wave fallback) #507 is a pure presentation-layer debounce in app.js with no Node-side tests \u2014 the debounce kinetics depend on browser SSE timing and are best validated by running a back-to-back batch sequence locally against the dashboard. Validation npm run typecheck pass npm run lint 286 warnings / 671 infos (identical to main) npm run format:check pass process-registry.test.ts 42/42 pass (4 new) Full test suite 3708/3709 pass, 1 skipped (zero new failures) taskplane help / doctor pass Closes #507 Closes #509
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #507. Closes #509.
Two paired dashboard fixes, landed together because both are small-surface-area observability issues in the same UI layer.
#509 — merge agent telemetry missing for some waves
Root cause
writeMergeSnapshotkeyed the on-disk filename bymergeNumberalone, andmergeNumberis derived fromlane.laneNumber(seemerge.ts:1833andmerge.ts:855-864). Lane numbers reset every wave, so wave N+1's lane-1 merge silently overwrote wave N's lane-1 terminal snapshot before the dashboard could read it.This exactly matches the symptom in the issue: dashboard renders telemetry for some waves and
——for others, with the pattern correlating to which waves' lane numbers were reused.The fix
Per-wave filename namespacing in
runtimeMergeSnapshotPath:writeMergeSnapshot/readMergeSnapshotsignatures gainwaveIndex. All four call sites inmerge.tsalready hadwaveIndexin scope (parameter ofspawnMergeAgentV2) and pass it through aswaveIndex ?? 0, matching the existing nullish-coalesce used to populate the snapshot's ownwaveIndexfield.loadRuntimeMergeSnapshotsindashboard/server.cjscorrespondingly switches its intermediate map key frommergeNumberalone tow{waveIndex}-{mergeNumber}, because keying bymergeNumberalone reproduced the same collision at read time (the snapshot files would all exist on disk under the new pattern, but the loader'ssnapshots[data.mergeNumber] = datawould still squash multiple wave entries down to one). Filename filter remains permissive (merge-*.json) so legacy snapshots from pre-fix batches still load and key as plain{mergeNumber}for back-compat.#507 — dashboard briefly flips to history view at batch startup
Root cause
A single missed SSE poll during
batch-state.jsonwrite at startup triggered the no-batch handler inapp.js, which closes the viewer and renders the history panel. The next poll picked up the new batch and flipped back to live view, but the user perceived a 'flash of history' between two live batches.The fix
3-consecutive-miss debounce on the no-batch transition in
app.js. With the server's 2sPOLL_INTERVAL, the threshold corresponds to ~6s of confirmed batch-absence — well past the sub-secondbatch-state.jsonwrite window while still cleaning up promptly when a batch genuinely ends. The miss counter resets the moment a batch reappears.Tests
Adds 4 regression tests to
process-registry.test.ts(7.4 through 7.7) covering the #509 invariants:writeMergeSnapshotproduces a filename containing thewaveIndexmergeNumberacross two waves writes to distinct filesmergeNumber(the exact failure mode from the issue body)readMergeSnapshotreturnsnullfor absent (wave, mergeNumber) tuples — no accidental cross-wave fallback#507 is a pure presentation-layer debounce with no Node-side tests; the debounce kinetics depend on browser SSE timing and are best validated by running a back-to-back batch sequence locally against the dashboard.
Validation
npm run typechecknpm run lintmainnpm run format:checkprocess-registry.test.tstaskplane help/doctorBack-compat notes
merge-N.jsonfiles still load correctly via the permissive filename filter and the legacy key-fallback in the dashboard loader (snapshots[mergeNumber]ifdata.waveIndex == null). No migration needed.writeMergeSnapshot,readMergeSnapshot,runtimeMergeSnapshotPath) are internal — no public-API contract drift.Recommended local verification
After merge, the easiest end-to-end check is a polyrepo batch where the same lane numbers appear across multiple waves. Pre-fix, the dashboard's merge column will show
——for at least one wave; post-fix, every wave gets its own telemetry row. For #507, start two live batches back-to-back and confirm the dashboard never flashes the history panel between them.