fix: telemetry improvements from deep AppInsights analysis by anandgupta42 · Pull Request #587 · AltimateAI/altimate-code

anandgupta42 · 2026-03-30T17:58:43Z

What does this PR do?

Fixes telemetry gaps identified through deep analysis of altimate-code-os Azure AppInsights data (10-day window, 3,678 events, 8 machines):

Error classification — adds 4 new error classes (file_not_found, edit_mismatch, not_configured, resource_exhausted) to reduce "unknown" from 85%+ to ~50%
Session metadata — adds os, arch, node_version to session_start event for environment segmentation
Doom loop protection — adds per-tool call counter (threshold=30) to catch varied-input loops like todowrite 2,080x
Token visibility — adds tokens_input_total field for Anthropic where tokens_input excludes cached tokens
Query documentation — adds KQL reference documenting customDimensions vs customMeasurements
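
The error-classification item above can be pictured as an ordered keyword table. This is a minimal sketch, not the code in `packages/opencode/src/altimate/telemetry/index.ts`: the keyword lists and the `classifyError` helper are assumptions made up for illustration. Only the four new class names, `connection`, and the `unknown` fallback come from this PR.

```typescript
// Hypothetical sketch: keyword lists and helper name are illustrative only.
type ErrorClass =
  | "file_not_found"
  | "edit_mismatch"
  | "not_configured"
  | "resource_exhausted"
  | "connection"
  | "unknown"

// Order matters: the first group with a matching keyword wins.
const ERROR_PATTERNS: ReadonlyArray<{ errorClass: ErrorClass; keywords: string[] }> = [
  { errorClass: "file_not_found", keywords: ["enoent", "no such file", "file not found"] },
  { errorClass: "edit_mismatch", keywords: ["string not found", "no match found"] },
  { errorClass: "not_configured", keywords: ["no warehouse configured", "driver not installed"] },
  { errorClass: "resource_exhausted", keywords: ["out of memory", "quota exceeded", "enospc"] },
  { errorClass: "connection", keywords: ["econnrefused", "connection reset", "timed out"] },
]

function classifyError(message: string): ErrorClass {
  const text = message.toLowerCase()
  for (const { errorClass, keywords } of ERROR_PATTERNS) {
    if (keywords.some((keyword) => text.includes(keyword))) return errorClass
  }
  // Anything that matches no keyword group stays in the "unknown" bucket.
  return "unknown"
}
```

Putting the `not_configured` group ahead of `connection` is one way to keep warehouse and driver messages out of the `connection` bucket, which is the reclassification this PR describes.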

Type of change

Bug fix (non-breaking change which fixes an issue)

Issue for this PR

Closes #586

How did you verify your code works?

189 telemetry tests pass (0 failures, 620 assertions)
TypeScript typecheck passes (5/5 packages via turbo)
Upstream marker guard passes
Multi-model code review (5 models: Claude, Gemini 3.1 Pro, Kimi K2.5, MiniMax M2.5, GLM-5) — all actionable findings addressed
Verified token data in live AppInsights via az monitor app-insights query

Checklist

My code follows the project's coding standards
I have added tests that prove my fix is effective
New and existing unit tests pass locally
I have reviewed my own code
I have run the linters and fixed any issues

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added detection for repetitive tool execution with user alerts when repeat threshold is reached
- Enhanced token usage tracking to include cached input tokens in total counts
Improvements
- Expanded system environment telemetry collection with OS, architecture, and runtime information
- Refined error classification for better categorization

Based on 10-day telemetry analysis of altimate-code-os:

Error classification (P0):
- Add 4 new error classes: `file_not_found`, `edit_mismatch`, `not_configured`, `resource_exhausted`
- Move warehouse/driver keywords from `connection` to `not_configured`
- Reduces "unknown" error classification from 85%+ to ~50%

Session metadata (P0):
- Add `os`, `arch`, `node_version` to `session_start` event
- Enables environment-based segmentation in dashboards

Doom loop detection (P1):
- Add per-tool call counter (threshold=30) to catch varied-input loops
- Emits `doom_loop_detected` telemetry event when triggered
- Addresses todowrite tool called 2,080x by one user

Token visibility (P1):
- Add `tokens_input_total` field to generation events
- Includes cached tokens for Anthropic (where `tokens_input` excludes cache)
- Only emitted when it differs from `tokens_input`

Telemetry query docs (P2):
- Add KQL reference documenting `customDimensions` vs `customMeasurements`
- Prevents analysts from querying the wrong column

Cleanup:
- Rename `telemetry-moat-signals.test.ts` → `telemetry-signals.test.ts`
- Remove "moat" terminology from test comments

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
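
The token-visibility item in the commit message can be sketched as follows. The `Usage` shape and the `generationTokens` helper are hypothetical. Only the field names `tokens_input` and `tokens_input_total`, the "input plus cache read/write" sum, and the "emit only when different" rule come from this PR.

```typescript
// Hypothetical shapes: everything except tokens_input and
// tokens_input_total is an assumption made for this sketch.
interface Usage {
  input: number // input tokens with cached tokens already excluded
  cacheRead: number
  cacheWrite: number
}

interface GenerationTokens {
  tokens_input: number
  tokens_input_total?: number
}

function generationTokens(usage: Usage): GenerationTokens {
  const inputTotal = usage.input + usage.cacheRead + usage.cacheWrite
  const event: GenerationTokens = { tokens_input: usage.input }
  // Emit the total only when caching makes it differ from tokens_input.
  if (inputTotal !== usage.input) event.tokens_input_total = inputTotal
  return event
}
```

With no cache activity the two numbers coincide, so the optional field is omitted and the event payload stays unchanged for those generations.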

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.


coderabbitai · 2026-03-30T17:58:57Z

📝 Walkthrough

Walkthrough

This PR extends telemetry event schemas with environment metadata (OS, architecture, Node version), improves error classification with four new error categories, introduces per-tool call counting to detect doom loops, and adds total input token accounting that includes cached tokens.
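
The environment metadata mentioned in the walkthrough amounts to three string fields on the `session_start` event. A minimal sketch, assuming a hypothetical `buildSessionStart` helper and a hypothetical `session_id` field; only `os`, `arch`, and `node_version` are named in this PR.

```typescript
// Hypothetical sketch: the event shape beyond os/arch/node_version is assumed.
interface RuntimeInfo {
  platform: string
  arch: string
  version: string
}

interface SessionStartEvent {
  type: "session_start"
  session_id: string
  os: string
  arch: string
  node_version: string
}

function buildSessionStart(sessionId: string, runtime: RuntimeInfo): SessionStartEvent {
  return {
    type: "session_start",
    session_id: sessionId,
    os: runtime.platform,
    arch: runtime.arch,
    node_version: runtime.version,
  }
}
```

In Node.js the global `process` object can be passed as `runtime`, since `process.platform`, `process.arch`, and `process.version` are all strings. Taking the runtime as a parameter keeps the sketch testable without depending on the host machine.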

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Telemetry Schema & Error Classification `packages/opencode/src/altimate/telemetry/index.ts` | Extended `Telemetry.Event` schema: added `os`, `arch`, `node_version` to `session_start`; added optional `tokens_input_total` to `generation` events; expanded `core_failure` `error_class` union with `file_not_found`, `edit_mismatch`, `not_configured`, `resource_exhausted`. Rewrote `ERROR_PATTERNS` to split error classifications and add dedicated keyword groups for each new error class. Added KQL documentation comment block. |
| Token Tracking `packages/opencode/src/session/index.ts` | Added `inputTotal` field to `Session.getUsage()` return value, computed as sum of adjusted input tokens plus cache read/write input tokens. |
| Doom Loop Detection `packages/opencode/src/session/processor.ts` | Introduced per-tool repeat call tracking via `toolCallCounts` map. When a tool's call count reaches threshold (30), emits `doom_loop_detected` telemetry event, issues `PermissionNext.ask` with `permission: "doom_loop"`, and resets counter. Extended generation telemetry to conditionally include `tokens_input_total` when different from `tokens_input`. |
| Session Initialization `packages/opencode/src/session/prompt.ts` | Added environment metadata (`os: process.platform`, `arch: process.arch`, `node_version: process.version`) to `session_start` telemetry event in `SessionPrompt.loop` at step 1. |
| Test Updates `packages/opencode/test/altimate/telemetry-signals.test.ts`, `packages/opencode/test/telemetry/telemetry.test.ts` | Updated test payloads to include `os`, `arch`, `node_version` in session creation and event tracking. Reorganized error classification tests: moved patterns from "connection" group into dedicated `not_configured`, `file_not_found`, `edit_mismatch`, `resource_exhausted` test cases. Removed "moat" terminology from test descriptions. |
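
The doom loop row above describes a per-tool counter that fires at a threshold and then resets. A minimal sketch of that idea, assuming a hypothetical `createToolCallTracker` helper: the real change in `processor.ts` emits a `doom_loop_detected` event and calls `PermissionNext.ask`, which this sketch replaces with a plain callback.

```typescript
// Hypothetical sketch: helper and callback names are illustrative. Only the
// threshold of 30, the toolCallCounts map, and the reset come from the PR.
const DOOM_LOOP_THRESHOLD = 30

function createToolCallTracker(
  onDoomLoop: (tool: string, count: number) => void,
): (tool: string) => void {
  const toolCallCounts = new Map<string, number>()

  // Counts every call to a tool regardless of its input, so loops that vary
  // their arguments between calls are still caught.
  return function record(tool: string): void {
    const count = (toolCallCounts.get(tool) ?? 0) + 1
    if (count >= DOOM_LOOP_THRESHOLD) {
      toolCallCounts.set(tool, 0) // reset so another 30 calls trigger again
      onDoomLoop(tool, count)
      return
    }
    toolCallCounts.set(tool, count)
  }
}
```

Counting by tool name alone is what distinguishes this from a check on identical inputs: a tool called repeatedly with different arguments each time, as in the `todowrite` case cited in this PR, would pass an identical-input check but still reach the per-tool threshold.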

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

#445 — Extends Telemetry.Event schema with new event variants and fields in the telemetry subsystem.
#336 — Modifies packages/opencode/src/session/processor.ts to adjust token accounting logic and per-generation telemetry.
#245 — Updates core_failure error classification patterns and error_class union in telemetry schema.

Suggested labels

contributor

Suggested reviewers

mdesmet
kulvirgit

Poem

🐰 Telemetry now tracks where we hop and bound,
With OS, arch, and Node all neatly bound,
Doom loops detected before they spiral deep,
And token counts tallied—cached tokens don't sleep! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly identifies the main change (telemetry improvements) and references the analysis source (AppInsights), directly aligning with the primary purpose of the PR. |
| Linked Issues check | ✅ Passed | All coding objectives from `#586` are met: error classification extended with 4 new classes (`#586`), session metadata added (os/arch/node_version `#586`), doom loop protection implemented with counter/threshold (`#586`), tokens_input_total field added (`#586`), and moat terminology removed (`#586`). |
| Out of Scope Changes check | ✅ Passed | All changes directly support the linked issue objectives: telemetry schema/classification updates, session metadata collection, doom loop detection, token tracking improvements, and test file renaming, all within scope of fixing identified telemetry gaps. |
| Description check | ✅ Passed | The PR description comprehensively covers the required template sections with detailed explanations of changes, verification methods, and complete checklist. |

claude bot reviewed Mar 30, 2026

github-actions bot added the contributor label Mar 30, 2026

anandgupta42 merged commit 75b077f into main Mar 30, 2026
16 checks passed

anandgupta42 merged 1 commit into main from fix/telemetry-analysis-improvements
