fix: eliminate stale file race condition and fix error misclassification#611
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughSwitched FileTime.read to prefer filesystem mtime over wall-clock, added telemetry events for filetime drift and assert outcomes, introduced a new telemetry error_class Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant FileTime
participant Filesystem
participant Telemetry
Caller->>FileTime: read(sessionID, filepath)
FileTime->>Filesystem: stat(filepath)
Filesystem-->>FileTime: { mtime }
FileTime->>FileTime: compute driftMs = |now - mtime|
alt driftMs > 10
FileTime->>Telemetry: track(type: "filetime_drift", drift_ms, mtime_ahead, ...)
Telemetry-->>FileTime: ack (swallowed errors)
end
FileTime-->>Caller: record last-read timestamp (mtime or wall-clock)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
- Use file's actual `mtime` instead of `new Date()` in `FileTime.read()` to eliminate clock drift between Node.js and filesystem (WSL, networked drives) - Increase staleness tolerance from 50ms to 2s (p50 gap was 651ms, caused 782-retry loops on WSL) - Split `file_stale` out of `validation` error class for cleaner triage - Move `http_error` pattern before `validation` and add `HTTP 4xx` keywords to prevent WebFetch 404s from being misclassified as validation errors - Update tests to use `fs.utimes()` for deterministic mtime manipulation Closes #610 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
6281583 to
d0094a1
Compare
The keyword was removed in #611 to prevent HTTP 404 messages from being misclassified as validation errors. However, the pattern reordering (http_error now comes before validation) already handles that — HTTP 404 messages match `http_error` first via `"http 404"` keyword. Removing `"does not exist"` from validation caused SQL errors like `"column foo does not exist"` to fall through to `unknown`. Restore the keyword and add a test for the SQL case. Caught by multi-model code review (Claude + Gemini 3.1 Pro consensus). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The keyword was removed in #611 to prevent HTTP 404 messages from being misclassified as validation errors. However, the pattern reordering (http_error now comes before validation) already handles that — HTTP 404 messages match `http_error` first via `"http 404"` keyword. Removing `"does not exist"` from validation caused SQL errors like `"column foo does not exist"` to fall through to `unknown`. Restore the keyword and add a test for the SQL case. Caught by multi-model code review (Claude + Gemini 3.1 Pro consensus). Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
What does this PR do?
Fixes three telemetry issues discovered via AppInsights analysis (4,500 validation errors over 2 days):
Stale file race condition (54% of errors):
FileTime.read()usednew Date()instead of the file's actualmtime, causing clock drift on WSL/networked drives. One user hit 782 retry loops on a single file. Now usesFilesystem.stat(file).mtimeand increases tolerance from 50ms to 2s.HTTP 404 misclassified as validation (13%): The keyword
"does not exist"in thevalidationpattern matched WebFetch 404 messages. Movedhttp_errorpattern beforevalidationand addedHTTP 4xxstatus keywords.File staleness mixed into validation bucket (32%): Split
file_staleout as a new error class for"must read file","has been modified since","before overwriting"errors, enabling cleaner triage dashboards.Type of change
Issue for this PR
Closes #610
How did you verify your code works?
time.test.ts,telemetry.test.ts,write.test.ts,edit.test.ts)--markers --base main --strict)altimate-core-internal— all bugs are in the TS layerChecklist
🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
New Features
Tests