feat(measure): make tool-call warning thresholds env-var configurable #33
Closed
CodyLee117 wants to merge 1 commit into
Adds TOKEN_OPTIMIZER_TOOL_CALL_WARN / TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL overrides for the cumulative-tool-call warning thresholds in measure.py (_TOOL_CALL_WARN_THRESHOLDS). Defaults match upstream (25 WARNING / 40 CRITICAL), so there is no behavioral change for users who don't set the env vars.

Rationale: the 25/40 thresholds derive from the COLM 2025 / codeongrass practitioner analysis on instruction-adherence degradation, which predates Claude Opus 4.7 1M-context sessions. On those longer-context runs, sessions routinely cross 40 cumulative tool calls before adherence actually degrades, so the CRITICAL nag fires too aggressively for that workload. Making the values configurable lets long-context users tune higher (e.g. 50 / 80, or 100 / 200 for deep research sessions) without source edits, while shorter-context users keep the existing defaults.

The pattern mirrors how the file already exposes other knobs as env vars (_CHECKPOINT_MAX_FILES, _CHECKPOINT_TTL_SECONDS, _RELEVANCE_THRESHOLD, _EDIT_BATCH_*, etc.) at lines 11425-11437 of the same file, so the override mechanism is internally consistent.

Diff is +7/-2 lines: a comment, two env-var reads, and an f-string-ified version of the existing message text so the displayed count stays in sync with whatever threshold the user configures (e.g. '150+ tool calls', rather than CRITICAL firing at 150 while the message still says '40+').
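For readers who want to picture the change, here is a minimal sketch of the two env-var reads and the f-string messages. It is not the actual diff: the helper name `build_thresholds` and its `env` parameter are hypothetical, and the message text after the count is elided in this PR, so it is left out here. Only the env-var names, the 25 / 40 defaults, and the `(count, level, message)` tuple shape come from the description.

```python
import os


def build_thresholds(env=os.environ):
    """Return (count, level, message) tuples, highest threshold first."""
    # Defaults match upstream: 25 WARNING / 40 CRITICAL.
    warn = int(env.get("TOKEN_OPTIMIZER_TOOL_CALL_WARN", "25"))
    critical = int(env.get("TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL", "40"))
    # Messages are derived from the configured values, so the displayed
    # count cannot drift from the count that triggers the warning.
    # The remainder of the real message text is not shown in this PR.
    return [
        (critical, "CRITICAL", f"{critical}+ tool calls"),
        (warn, "WARNING", f"{warn}+ tool calls"),
    ]


# No overrides: the upstream defaults apply.
print(build_thresholds({}))
# A long-context user raising the CRITICAL threshold only.
print(build_thresholds({"TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL": "150"}))
```

Passing the environment as a mapping is a choice made here for testability; the file itself appears to read `os.environ` at module level.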
Contributor
All contributors have signed the CLA ✍️ ✅
Author
I have read the CLA Document and I hereby sign the CLA
alexgreensh added a commit that referenced this pull request on May 17, 2026
….6.10 Codex hooks now install to ~/.codex/hooks.json (global) by default instead of per-project .codex/hooks.json. One install covers all projects. --project flag remains available for per-project overrides. Surgical runtime guards in run_ensure_health prevent Claude Code settings writes (~/.claude/settings.json) when running under Codex, while preserving Codex-safe operations (dashboard, daemon, checkpoints). Also includes: env-var configurable tool-call thresholds (PR #33 by CodyLee117), codex_doctor global hooks check with timeout unit validation, install.sh sparse checkout fix for .codex-plugin/, and version alignment across all four manifests.
Owner
Applied in v5.6.10 (commit db64beb) with credit in the release notes. Clean change, followed existing patterns perfectly. Thanks @CodyLee117 for the contribution! The safe env-var parsing helpers we added alongside your change also protect the existing TOKEN_OPTIMIZER_* configs from ValueError crashes on malformed input.
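The helpers added in v5.6.10 are not shown in this thread. As a rough illustration of the idea only, a safe integer read might look like the sketch below; the name `env_int` and its signature are hypothetical, not the project's API.

```python
import os


def env_int(name, default, env=os.environ):
    """Read an integer env var, falling back to `default` on bad input."""
    raw = env.get(name)
    if raw is None:
        return default
    try:
        return int(raw.strip())
    except ValueError:
        # Malformed value such as "abc", "4.5", or "": keep the default
        # instead of crashing at import time.
        return default


print(env_int("TOKEN_OPTIMIZER_TOOL_CALL_WARN", 25, {"TOKEN_OPTIMIZER_TOOL_CALL_WARN": "abc"}))
```

A bare `int(os.environ.get(...))` raises `ValueError` on the same input, which is the crash the comment refers to.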
Summary
Adds `TOKEN_OPTIMIZER_TOOL_CALL_WARN` / `TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL` env-var overrides for the cumulative-tool-call warning thresholds at `skills/token-optimizer/scripts/measure.py` (`_TOOL_CALL_WARN_THRESHOLDS`, around line 11419).

Defaults are unchanged from upstream (25 WARNING / 40 CRITICAL). No behavioral change for any user who doesn't set the env vars: this is opt-in only.
Rationale
The 25 / 40 thresholds (introduced in v5.6.7, commit `1185b3c6`) derive from the COLM 2025 / codeongrass.com practitioner analysis on instruction-adherence degradation. That research predates Claude Opus 4.7's 1M-context tier. On those longer-context runs, sessions routinely cross 40 cumulative tool calls long before adherence actually starts degrading, so the `CRITICAL` warning fires too aggressively for that workload.

Concretely: a single Opus 4.7 1M session implementing a multi-slice feature (schema records → state integration → action handlers → tests → docs → commit + push) can land 80-120 tool calls while staying highly coherent. The current code wedges those sessions into a persistent CRITICAL state from call ~40 onward, with no in-product way to dismiss or tune.
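To make the "persistent CRITICAL" behavior concrete, the sketch below assumes the thresholds are checked highest-first with a `>=` comparison. That lookup logic is an assumption, and `warning_level`, `DEFAULTS`, and `TUNED` are hypothetical names; the PR only shows the threshold list itself.

```python
def warning_level(call_count, thresholds):
    """Return the level of the first threshold met, or None."""
    # thresholds are (count, level) pairs sorted highest count first,
    # so the most severe matching level wins.
    for limit, level in thresholds:
        if call_count >= limit:
            return level
    return None


DEFAULTS = [(40, "CRITICAL"), (25, "WARNING")]   # upstream 25 / 40
TUNED = [(200, "CRITICAL"), (100, "WARNING")]    # example override 100 / 200

for calls in (30, 40, 80, 120):
    print(calls, warning_level(calls, DEFAULTS), warning_level(calls, TUNED))
```

Under these assumptions every count from 40 upward maps to CRITICAL with the defaults, while the raised thresholds stay silent until 100 calls.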
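Making the constants configurable lets long-context users tune the thresholds higher (e.g. 50 / 80, or 100 / 200 for deep research sessions) without source edits, while shorter-context users keep the existing defaults.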
Internal consistency
The pattern mirrors how the same file already exposes other knobs as env vars at lines 11425-11437:
```python
_CHECKPOINT_MAX_FILES = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_FILES", "10"))
_CHECKPOINT_TTL_SECONDS = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_TTL", "300"))
_CHECKPOINT_RETENTION_DAYS = int(os.environ.get("TOKEN_OPTIMIZER_CHECKPOINT_RETENTION_DAYS", "7"))
_RELEVANCE_THRESHOLD = float(os.environ.get("TOKEN_OPTIMIZER_RELEVANCE_THRESHOLD", "0.3"))
_EDIT_BATCH_WRITE_THRESHOLD = int(os.environ.get("TOKEN_OPTIMIZER_EDIT_BATCH_WRITE_THRESHOLD", "4"))
```
The override naming (`TOKEN_OPTIMIZER_TOOL_CALL_WARN` / `_CRITICAL`) matches the existing `TOKEN_OPTIMIZER_*` prefix convention.
Change
The displayed count text is f-string-derived so it stays in sync with whatever threshold the user configures (e.g. "150+ tool calls" rather than the prior hardcoded "40+" when a user has set `TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL=150`).
Test plan
- Syntax check of `measure.py` (`python3 -c "import ast; ast.parse(...)"`): passes.
- With no env vars set, `_TOOL_CALL_WARN_THRESHOLDS` evaluates to `[(40, "CRITICAL", "40+ tool calls, …"), (25, "WARNING", "25+ tool calls, …")]`, identical to upstream.
- With `TOKEN_OPTIMIZER_TOOL_CALL_WARN=50` / `TOKEN_OPTIMIZER_TOOL_CALL_CRITICAL=80` in the session env, the warning text correctly reports "50+" / "80+" and triggers at those counts.

Diff is +7/-2 lines, single file. Happy to add a README entry under a centralized "Environment variables" section if you'd like. Let me know your preference and I'll fold it in.
🤖 Generated with Claude Code