fix: telemetry improvements from deep AppInsights analysis #587
anandgupta42 merged 1 commit into main
Based on 10-day telemetry analysis of altimate-code-os:

Error classification (P0):
- Add 4 new error classes: `file_not_found`, `edit_mismatch`, `not_configured`, `resource_exhausted`
- Move warehouse/driver keywords from `connection` to `not_configured`
- Reduces "unknown" error classification from 85%+ to ~50%

Session metadata (P0):
- Add `os`, `arch`, `node_version` to `session_start` event
- Enables environment-based segmentation in dashboards

Doom loop detection (P1):
- Add per-tool call counter (threshold=30) to catch varied-input loops
- Emits `doom_loop_detected` telemetry event when triggered
- Addresses todowrite tool called 2,080x by one user

Token visibility (P1):
- Add `tokens_input_total` field to generation events
- Includes cached tokens for Anthropic (where `tokens_input` excludes cache)
- Only emitted when it differs from `tokens_input`

Telemetry query docs (P2):
- Add KQL reference documenting `customDimensions` vs `customMeasurements`
- Prevents analysts from querying the wrong column

Cleanup:
- Rename `telemetry-moat-signals.test.ts` → `telemetry-signals.test.ts`
- Remove "moat" terminology from test comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
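For readers who want a concrete picture of the error classification change, here is a minimal sketch of a keyword-based classifier. It is not the PR's code: the function name `classifyError`, the `RULES` table, and every keyword in it are assumptions made for illustration. Only the class names and the idea that warehouse/driver keywords map to `not_configured` rather than `connection` come from the description above.

```typescript
// Hypothetical sketch. The class names come from the PR description;
// the keyword lists and rule order are illustrative assumptions.
type ErrorClass =
  | "file_not_found"
  | "edit_mismatch"
  | "not_configured"
  | "resource_exhausted"
  | "connection"
  | "unknown";

type Rule = { errorClass: ErrorClass; keywords: string[] };

// Rules are checked in order, so `not_configured` is listed before
// `connection` to make warehouse/driver messages land there first.
const RULES: Rule[] = [
  { errorClass: "file_not_found", keywords: ["enoent", "no such file", "file not found"] },
  { errorClass: "edit_mismatch", keywords: ["oldstring not found", "no match found for edit"] },
  { errorClass: "not_configured", keywords: ["not configured", "driver not installed", "no warehouse"] },
  { errorClass: "resource_exhausted", keywords: ["out of memory", "rate limit", "quota exceeded"] },
  { errorClass: "connection", keywords: ["econnrefused", "etimedout", "connection reset"] },
];

function classifyError(message: string): ErrorClass {
  const text = message.toLowerCase();
  for (const rule of RULES) {
    if (rule.keywords.some((keyword) => text.includes(keyword))) {
      return rule.errorClass;
    }
  }
  // Anything that matches no rule still counts toward "unknown".
  return "unknown";
}
```

Under these assumed keywords, a message such as "Snowflake driver not installed" would be classified as `not_configured`, and a message matching no rule falls through to `unknown`.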
Walkthrough

This PR extends telemetry event schemas with environment metadata (OS, architecture, Node version), improves error classification with four new error categories, introduces per-tool call counting to detect doom loops, and adds total input token accounting that includes cached tokens.
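The per-tool call counting mentioned above could look roughly like the following sketch. The names `createToolCallCounter` and `DoomLoopEvent` are hypothetical, and emitting the event only once, at the moment the count reaches the threshold, is an assumption; the PR text only states the threshold of 30 and the `doom_loop_detected` event name.

```typescript
// Hypothetical sketch of a per-tool call counter for one session.
// Only the threshold (30) and the event name come from the PR text.
const DOOM_LOOP_THRESHOLD = 30;

type DoomLoopEvent = {
  type: "doom_loop_detected";
  tool: string;
  count: number;
};

function createToolCallCounter(
  emit: (event: DoomLoopEvent) => void,
  threshold: number = DOOM_LOOP_THRESHOLD,
): (tool: string) => number {
  const counts = new Map<string, number>();

  // Counts calls by tool name only, ignoring the inputs, so a loop that
  // varies its arguments on every call is still caught.
  return function record(tool: string): number {
    const count = (counts.get(tool) ?? 0) + 1;
    counts.set(tool, count);
    // Assumption: emit once, exactly when the threshold is reached.
    if (count === threshold) {
      emit({ type: "doom_loop_detected", tool, count });
    }
    return count;
  };
}
```

Keying the counter on the tool name alone is what distinguishes this from a check for repeated identical calls: it trades some false positives on legitimately busy tools for the ability to flag loops whose inputs change each time.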
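As an illustration of the token visibility change, the sketch below computes `tokens_input_total` and attaches it only when it differs from `tokens_input`. The `Usage` shape, its field names, and the function `generationTokenFields` are assumptions for this example; the actual provider usage object and event builder in the codebase may differ.

```typescript
// Hypothetical sketch. `Usage` and its field names are assumptions;
// only `tokens_input` and `tokens_input_total` come from the PR text.
type Usage = {
  inputTokens: number; // for Anthropic, this excludes cached tokens
  cacheReadTokens?: number;
  cacheWriteTokens?: number;
};

type GenerationTokenFields = {
  tokens_input: number;
  tokens_input_total?: number;
};

function generationTokenFields(usage: Usage): GenerationTokenFields {
  const total =
    usage.inputTokens +
    (usage.cacheReadTokens ?? 0) +
    (usage.cacheWriteTokens ?? 0);

  const fields: GenerationTokenFields = { tokens_input: usage.inputTokens };
  // Only emit the total when caching makes it differ from tokens_input.
  if (total !== usage.inputTokens) {
    fields.tokens_input_total = total;
  }
  return fields;
}
```

Omitting the field when the two values match keeps events small for providers that already report cached tokens inside the input count.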
What does this PR do?
Fixes telemetry gaps identified through deep analysis of altimate-code-os Azure AppInsights data (10-day window, 3,678 events, 8 machines):
- Add 4 new error classes (`file_not_found`, `edit_mismatch`, `not_configured`, `resource_exhausted`) to reduce "unknown" from 85%+ to ~50%
- Add `os`, `arch`, `node_version` to the `session_start` event for environment segmentation
- Add a `tokens_input_total` field for Anthropic, where `tokens_input` excludes cached tokens
- Add a KQL reference documenting `customDimensions` vs `customMeasurements`

Type of change
Issue for this PR
Closes #586
How did you verify your code works?
`az monitor app-insights query`
Checklist
🤖 Generated with Claude Code