feat(measure): make tool-call warning thresholds env-var configurable #33

Closed
CodyLee117 wants to merge 1 commit into alexgreensh:main from CodyLee117:feat-tool-call-thresholds-envvar

Conversation

@CodyLee117

Summary

Adds TOKEN_OPTIMIZER_TOOL_CALL_WARN / TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL env-var overrides for the cumulative-tool-call warning thresholds at skills/token-optimizer/scripts/measure.py (_TOOL_CALL_WARN_THRESHOLDS, around line 11419).

Defaults are unchanged from upstream (25 WARNING / 40 CRITICAL). No behavioral change for any user who doesn't set the env vars — this is opt-in only.

Rationale

The 25 / 40 thresholds (introduced in v5.6.7, commit 1185b3c6) derive from the COLM 2025 / codeongrass.com practitioner analysis on instruction-adherence degradation. That research predates Claude Opus 4.7's 1M-context tier — on those longer-context runs, sessions routinely cross 40 cumulative tool calls long before adherence actually starts degrading, so the CRITICAL warning fires too aggressively for that workload.

Concretely: a single Opus 4.7 1M session implementing a multi-slice feature (schema records → state integration → action handlers → tests → docs → commit + push) can land 80-120 tool calls while staying highly coherent. The current code wedges those sessions into a persistent CRITICAL state from call ~40 onward, with no in-product way to dismiss or tune the warning.

Making the constants configurable lets:

  • Long-context users tune higher (50/80, or 100/200 for deep research) without source edits or local patches.
  • Shorter-context / non-Anthropic users keep the existing defaults, which still match the practitioner research for those workloads.
  • Future research updates can ship as new default values without forcing every long-context user to re-patch.

Internal consistency

The pattern mirrors how the same file already exposes other knobs as env vars at lines 11425-11437:

```python
_CHECKPOINT_MAX_FILES = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_FILES", "10"))
_CHECKPOINT_TTL_SECONDS = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_TTL", "300"))
_CHECKPOINT_RETENTION_DAYS = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_RETENTION_DAYS", "7"))
_RELEVANCE_THRESHOLD = float(os.environ.get("TOKEN_OPTIMIZER_RELEVANCE_THRESHOLD", "0.3"))
_EDIT_BATCH_WRITE_THRESHOLD = int(os.environ.get("TOKEN_OPTIMIZER_EDIT_BATCH_WRITE_THRESHOLD", "4"))
```

The override naming (TOKEN_OPTIMIZER_TOOL_CALL_WARN / _CRITICAL) matches the existing TOKEN_OPTIMIZER_* prefix convention.

Change

 # Tool call thresholds: instruction adherence degrades after ~15 tool calls
 # (COLM 2025, codeongrass.com practitioner analysis). Cumulative, not reset on compact.
+# Configurable via TOKEN_OPTIMIZER_TOOL_CALL_WARN / _CRITICAL env vars to suit
+# longer-context models (e.g. Claude Opus 4.7 1M). Defaults are unchanged from the
+# original literal — opt-in override only.
+_TOOL_CALL_WARN     = int(os.environ.get("TOKEN_OPTIMIZER_TOOL_CALL_WARN", "25"))
+_TOOL_CALL_CRITICAL = int(os.environ.get("TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL", "40"))
 _TOOL_CALL_WARN_THRESHOLDS = [
-    (40, "CRITICAL", "40+ tool calls, instruction adherence severely degraded"),
-    (25, "WARNING", "25+ tool calls, consider a fresh session"),
+    (_TOOL_CALL_CRITICAL, "CRITICAL", f"{_TOOL_CALL_CRITICAL}+ tool calls, instruction adherence severely degraded"),
+    (_TOOL_CALL_WARN,     "WARNING",  f"{_TOOL_CALL_WARN}+ tool calls, consider a fresh session"),
 ]

The displayed count text is f-string-derived so it stays in sync with whatever threshold the user configures (e.g. "150+ tool calls" rather than the prior hardcoded "40+" when a user has set TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL=150).
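As a rough illustration of how the f-string-derived messages track the configured values, here is a minimal, self-contained sketch. The helper names `build_thresholds` and `classify` are hypothetical and do not appear in measure.py; the real module builds the list once at import time, and how it consumes the list is not shown in this PR. The sketch assumes the list is checked highest threshold first.

```python
import os


def build_thresholds(env=None):
    """Hypothetical stand-in for the module-level list in measure.py."""
    env = os.environ if env is None else env
    warn = int(env.get("TOKEN_OPTIMIZER_TOOL_CALL_WARN", "25"))
    critical = int(env.get("TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL", "40"))
    # Highest threshold first, so the most severe matching level wins.
    return [
        (critical, "CRITICAL", f"{critical}+ tool calls, instruction adherence severely degraded"),
        (warn, "WARNING", f"{warn}+ tool calls, consider a fresh session"),
    ]


def classify(count, thresholds):
    """Return (level, message) for the first threshold reached, else None."""
    for limit, level, message in thresholds:
        if count >= limit:
            return level, message
    return None


if __name__ == "__main__":
    custom = build_thresholds({"TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL": "150"})
    print(classify(150, custom))
```

One thing this sketch does not guard against is a WARN value set higher than CRITICAL; in that case the ordering assumption above no longer holds and the WARNING entry would never be reached.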

Test plan

  • AST parse-check on patched measure.py (python3 -c "import ast; ast.parse(...)"): passes.
  • Defaults preserved: with neither env var set, _TOOL_CALL_WARN_THRESHOLDS evaluates to [(40, "CRITICAL", "40+ tool calls, …"), (25, "WARNING", "25+ tool calls, …")], identical to upstream.
  • Overrides honored: with TOKEN_OPTIMIZER_TOOL_CALL_WARN=50 / TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL=80 in the session env, the warning text correctly reports "50+" / "80+" and triggers at those counts.
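The default and override checks above could be automated along these lines. Because the thresholds are read when the module is imported, the sketch runs a small stand-in snippet in a fresh interpreter with a controlled environment rather than importing measure.py itself. The `SNIPPET` text and the `thresholds_under` helper are illustrative assumptions, not part of the PR.

```python
import os
import subprocess
import sys

# Stand-in for the two env-var reads added by this PR (not the real measure.py).
SNIPPET = """
import os
_TOOL_CALL_WARN = int(os.environ.get("TOKEN_OPTIMIZER_TOOL_CALL_WARN", "25"))
_TOOL_CALL_CRITICAL = int(os.environ.get("TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL", "40"))
print(_TOOL_CALL_WARN, _TOOL_CALL_CRITICAL)
"""


def thresholds_under(overrides):
    """Run SNIPPET in a child interpreter and return (warn, critical)."""
    # Start from the current environment, minus any thresholds already set.
    env = {
        key: value
        for key, value in os.environ.items()
        if not key.startswith("TOKEN_OPTIMIZER_TOOL_CALL_")
    }
    env.update(overrides)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    warn, critical = result.stdout.split()
    return int(warn), int(critical)


if __name__ == "__main__":
    print(thresholds_under({}))
    print(thresholds_under({
        "TOKEN_OPTIMIZER_TOOL_CALL_WARN": "50",
        "TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL": "80",
    }))
```

Using a child process avoids having to reload the module between cases, at the cost of a slightly slower check.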

Diff is +7/-2 lines, single file. Happy to add a README entry under a centralized "Environment variables" section if you'd like — let me know your preference and I'll fold it in.

🤖 Generated with Claude Code

Adds TOKEN_OPTIMIZER_TOOL_CALL_WARN / TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL
overrides for the cumulative-tool-call warning thresholds at measure.py
(_TOOL_CALL_WARN_THRESHOLDS). Defaults match upstream (25 WARNING / 40
CRITICAL) — no behavioral change for users who don't set the env vars.

Rationale: the 25/40 thresholds derive from the COLM 2025 / codeongrass
practitioner analysis on instruction-adherence degradation, which
predates Claude Opus 4.7 1M-context sessions. On those longer-context
runs, sessions routinely cross 40 cumulative tool calls before
adherence actually degrades, so the CRITICAL nag fires too aggressively
for that workload. Making the values configurable lets long-context
users tune higher (e.g. 50 / 80, or 100 / 200 for deep research
sessions) without source edits, while shorter-context users keep the
existing defaults.

The pattern mirrors how the file already exposes other knobs as env
vars (_CHECKPOINT_MAX_FILES, _CHECKPOINT_TTL_SECONDS,
_RELEVANCE_THRESHOLD, _EDIT_BATCH_*, etc.) at lines 11425-11437 of the
same file, so the override mechanism is internally consistent.

Diff is +7/-2 lines: a comment, two env-var reads, and an
f-string-ified version of the existing message text so the displayed
count stays in sync with whatever threshold the user configures (e.g.
'150+ tool calls' when CRITICAL is set to 150, rather than a message
that still says '40+').
@github-actions
Contributor

github-actions Bot commented May 15, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@CodyLee117
Author

I have read the CLA Document and I hereby sign the CLA

github-actions Bot added a commit that referenced this pull request May 15, 2026
alexgreensh added a commit that referenced this pull request May 17, 2026
….6.10

Codex hooks now install to ~/.codex/hooks.json (global) by default instead
of per-project .codex/hooks.json. One install covers all projects. --project
flag remains available for per-project overrides.

Surgical runtime guards in run_ensure_health prevent Claude Code settings
writes (~/.claude/settings.json) when running under Codex, while preserving
Codex-safe operations (dashboard, daemon, checkpoints).

Also includes: env-var configurable tool-call thresholds (PR #33 by
CodyLee117), codex_doctor global hooks check with timeout unit validation,
install.sh sparse checkout fix for .codex-plugin/, and version alignment
across all four manifests.
@alexgreensh
Owner

Applied in v5.6.10 (commit db64beb) with credit in the release notes. Clean change, followed existing patterns perfectly. Thanks @CodyLee117 for the contribution! The safe env-var parsing helpers we added alongside your change also protect the existing TOKEN_OPTIMIZER_* configs from ValueError crashes on malformed input.
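The helpers mentioned above are not shown in this thread. As a hedged sketch of what safe parsing might look like, the hypothetical `env_int` below falls back to the default when the variable is unset or malformed, instead of letting `int()` raise ValueError at import time. The name, signature, and fallback policy are assumptions, not the code that shipped in v5.6.10.

```python
import os


def env_int(name, default, env=None):
    """Read an integer env var, returning `default` if unset or malformed.

    Hypothetical helper for illustration; not the project's actual API.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        # Malformed input such as "forty" or "": keep the default.
        return default


if __name__ == "__main__":
    fake_env = {"TOKEN_OPTIMIZER_TOOL_CALL_WARN": "fifty"}
    print(env_int("TOKEN_OPTIMIZER_TOOL_CALL_WARN", 25, fake_env))
    print(env_int("TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL", 40, fake_env))
```

Silently falling back is one possible policy; a real helper might also log the rejected value or reject negative numbers.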

@github-actions github-actions Bot locked and limited conversation to collaborators May 17, 2026