feat(mcp): per-symbol adaptive codegraph_explore sizing (#569)

colbymchenry · claude · web-flow · commit b026e64b413b · 2026-05-29T23:06:12.000-05:00
Sizes codegraph_explore to the answer, not the file count: shows the mechanism +
the exact methods you named in full (even buried in a large file) while collapsing
redundant interchangeable implementations to signatures. Adds uniqueness-aware
spare, per-symbol focused rendering of family files, all-tier test-file exclusion,
and named-method cluster survival in non-sibling god-files.
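
A minimal sketch of how the uniqueness-aware spare and per-symbol focused rendering could fit together, assuming a flat symbol index. The `SymbolDef` shape, the `MAX_DEFS_FOR_UNIQUE` threshold, and the function names are hypothetical illustrations, not the code in `src/mcp/tools.ts`, and the on-spine rule is left out for brevity.

```typescript
// Hypothetical shapes for illustration only; not CodeGraph's real types.
interface SymbolDef {
  name: string;             // method name, e.g. "encode"
  owner: string;            // declaring class
  signature: string;        // one-line signature
  body: string;             // full body text
  isBaseSupertype: boolean; // true when `owner` is the family's base class
}

// Assumed threshold: a name is (near-)unique with at most this many definitions.
const MAX_DEFS_FOR_UNIQUE = 2;

// Count how many definitions share each method name across the index.
function countDefs(index: SymbolDef[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const s of index) {
    counts.set(s.name, (counts.get(s.name) ?? 0) + 1);
  }
  return counts;
}

// A polymorphic name (many defs) is not unique, so naming it spares nothing by itself.
function isNearUnique(name: string, counts: Map<string, number>): boolean {
  const n = counts.get(name) ?? 0;
  return n > 0 && n <= MAX_DEFS_FOR_UNIQUE;
}

// Focused view: a named symbol keeps its body only if the name is near-unique
// or the symbol lives on the base supertype; everything else is signature-only.
function renderFocused(
  file: SymbolDef[],
  named: Set<string>,
  counts: Map<string, number>,
): string {
  return file
    .map((s) => {
      const keepBody =
        named.has(s.name) && (isNearUnique(s.name, counts) || s.isBaseSupertype);
      return keepBody ? `${s.signature} ${s.body}` : `${s.signature} { ... }`;
    })
    .join('\n');
}
```

Under these assumptions, naming a method implemented by a base class and three subclasses keeps only the base body, while the subclass overrides collapse to signatures.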

Validated A/B (Opus 4.8, 7-repo sweep): avg 25% cheaper / 57% fewer tokens / 23%
faster / 62% fewer tool calls. Django 9->23% cheaper (0 reads), OkHttp 4->11%
cheaper; gains across small/medium/large, inert repos unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
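
The named-method cluster survival mentioned in the commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the shipped implementation: the `Cluster` shape, the `selectClusters` name, and the use of a plain size budget are hypothetical. Only the importance value of 9 and the `min(per-file, remaining-total)` cap come from the design note in this commit, and the step that injects named defs the gather missed is omitted.

```typescript
// Hypothetical cluster record; sizes are in whatever unit the budget uses.
interface Cluster {
  label: string;       // e.g. the method the cluster is built around
  importance: number;  // higher means more relevant
  size: number;        // cost of emitting this cluster
  sourceOrder: number; // position within the file
}

// Importance assigned to agent-named clusters, per the design note.
const NAMED_IMPORTANCE = 9;

function selectClusters(
  clusters: Cluster[],
  namedLabels: Set<string>,
  perFileBudget: number,
  remainingTotal: number,
): Cluster[] {
  // Cap selection at whichever budget is tighter.
  const cap = Math.min(perFileBudget, remainingTotal);

  // Boost named clusters, then rank by importance (source order breaks ties).
  const ranked = clusters
    .map((c) =>
      namedLabels.has(c.label)
        ? { ...c, importance: Math.max(c.importance, NAMED_IMPORTANCE) }
        : c,
    )
    .sort((a, b) => b.importance - a.importance || a.sourceOrder - b.sourceOrder);

  // Greedily take clusters that still fit under the cap.
  const chosen: Cluster[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.size <= cap) {
      chosen.push(c);
      used += c.size;
    }
  }

  // Emit in source order so the rendered file reads top to bottom.
  return chosen.sort((a, b) => a.sourceOrder - b.sourceOrder);
}
```

Ranking before trimming is the point of the sketch: a named cluster late in the file is picked first, so a tight remaining budget drops a lower-importance cluster instead of whichever one happens to come last in source order.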
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -13,7 +13,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
 - `codegraph init` now builds the initial index by default — you no longer need the `-i`/`--index` flag (it's still accepted, so existing commands and scripts keep working). (#483)
 - Go: Gin middleware chains now connect end-to-end in `codegraph_trace` and `codegraph_explore` — following a request reaches the middleware and route handlers registered via `.Use()` / `.GET()` instead of dead-ending where the framework dispatches the chain dynamically.
-- `codegraph_explore` is now leaner on interface-heavy flows: when a query spans many interchangeable implementations of one interface (an HTTP interceptor chain, say), it shows the rest as signatures instead of every full body, while keeping the dispatch mechanism and any specific method you asked about in full. Fewer tokens for the same answer, so questions like these stop costing more than plain grep/read — in testing, the two slowest-to-pay-off repos (a Java and a Python framework) went from slightly costlier than native search to clearly cheaper. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
+- `codegraph_explore` now sizes its response to the *answer* instead of the file count: it shows the mechanism and the exact methods you asked about in full — even when they're buried deep in a large file — while collapsing the redundant interchangeable implementations of an interface (an HTTP interceptor chain, a query-compiler family) down to signatures. Fewer tokens for a more complete answer, so on the flows that used to occasionally cost more than plain grep/read it's now clearly cheaper — and the win holds across small, medium, and large codebases. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
 
 ### Fixes
 
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@
 
 ### Supercharge Claude Code, Cursor, Codex, OpenCode, Hermes Agent, Gemini, Antigravity, and Kiro with Semantic Code Intelligence
 
-**~22% cheaper · ~50% fewer tool calls · 100% local**
+**~25% cheaper · ~62% fewer tool calls · 100% local**
 
 ### [Documentation & Website →](https://colbymchenry.github.io/codegraph/)
 
@@ -83,101 +83,101 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil
 
 ### Benchmark Results
 
-Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with adaptive `codegraph_explore` sizing._
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with per-symbol adaptive `codegraph_explore` sizing._
 
-> **Average: 22% cheaper · 47% fewer tokens · 20% faster · 50% fewer tool calls**
+> **Average: 25% cheaper · 57% fewer tokens · 23% faster · 62% fewer tool calls**
 
 | Codebase | Language | Cost | Tokens | Time | Tool calls |
 |----------|----------|------|--------|------|------------|
-| **VS Code** | TypeScript · ~10k files | 13% cheaper | 63% fewer | 11% faster | 82% fewer |
-| **Excalidraw** | TypeScript · ~640 | 40% cheaper | 71% fewer | 51% faster | 82% fewer |
-| **Django** | Python · ~3k | 9% cheaper | 35% fewer | 7% faster | 38% fewer |
-| **Tokio** | Rust · ~790 | 31% cheaper | 59% fewer | 29% faster | 61% fewer |
-| **OkHttp** | Java · ~645 | 4% cheaper | 16% fewer | 11% faster | 40% fewer |
-| **Gin** | Go · ~110 | 28% cheaper | 40% fewer | 25% faster | 35% fewer |
-| **Alamofire** | Swift · ~110 | 32% cheaper | 43% fewer | 6% faster | 13% fewer |
+| **VS Code** | TypeScript · ~10k files | 33% cheaper | 70% fewer | 27% faster | 80% fewer |
+| **Excalidraw** | TypeScript · ~640 | 27% cheaper | 61% fewer | 26% faster | 70% fewer |
+| **Django** | Python · ~3k | 23% cheaper | 70% fewer | 28% faster | 77% fewer |
+| **Tokio** | Rust · ~790 | 35% cheaper | 70% fewer | 37% faster | 79% fewer |
+| **OkHttp** | Java · ~645 | 11% cheaper | 48% fewer | 26% faster | 70% fewer |
+| **Gin** | Go · ~110 | 15% cheaper | 35% fewer | 9% faster | 47% fewer |
+| **Alamofire** | Swift · ~110 | 28% cheaper | 46% fewer | 7% faster | 13% fewer |
 
-CodeGraph cuts **tool calls and total tokens on every repo** and answers large repos with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. **Every repo is now cheaper, not just faster** — the two former cost outliers (Django and OkHttp, where the answer spans many interchangeable implementations of one interface) flipped from *costlier* than native search to cheaper once adaptive `codegraph_explore` sizing stopped shipping every sibling's full body. The margin is still narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays positive across the board; the largest wins remain fewer tool calls and faster answers.
+CodeGraph cuts **cost, tokens, tool calls, and time on every repo** — across small, medium, and large codebases — and answers most of them with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. `codegraph_explore` shows the answer in full — the mechanism plus the exact methods you asked about, even when they're buried in a multi-thousand-line file — while collapsing redundant interchangeable implementations to signatures, so the response is sized to the *answer* rather than the file count. The cost margin is narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays solidly positive across the board.
 
 <details>
 <summary><strong>Per-repo breakdown — WITH vs WITHOUT (median of 4)</strong></summary>
 
 **VS Code** · ~10k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 58s | 2m 13s | 11% faster |
-| File Reads | 0 | 8 | −8 |
-| Grep/Bash | 0 | 9 | −9 |
-| Tool calls | 3 | 17 | 82% fewer |
-| Total tokens | 607k | 1.65M | 63% fewer |
-| Cost | $0.66 | $0.76 | 13% cheaper |
+| Time | 1m 37s | 2m 13s | 27% faster |
+| File Reads | 0 | 9 | −9 |
+| Grep/Bash | 0 | 11 | −11 |
+| Tool calls | 4 | 21 | 80% fewer |
+| Total tokens | 545k | 1.79M | 70% fewer |
+| Cost | $0.55 | $0.83 | 33% cheaper |
 
 **Excalidraw** · ~640 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 23s | 2m 48s | 51% faster |
-| File Reads | 0 | 11 | −11 |
-| Grep/Bash | 0 | 9 | −9 |
-| Tool calls | 4 | 20 | 82% fewer |
-| Total tokens | 596k | 2.06M | 71% fewer |
-| Cost | $0.53 | $0.89 | 40% cheaper |
+| Time | 1m 34s | 2m 6s | 26% faster |
+| File Reads | 0 | 7 | −7 |
+| Grep/Bash | 0 | 8 | −8 |
+| Tool calls | 5 | 15 | 70% fewer |
+| Total tokens | 651k | 1.69M | 61% fewer |
+| Cost | $0.57 | $0.78 | 27% cheaper |
 
 **Django** · ~3k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 43s | 1m 51s | 7% faster |
-| File Reads | 5 | 10 | −5 |
-| Grep/Bash | 0 | 4 | −4 |
-| Tool calls | 8 | 13 | 38% fewer |
-| Total tokens | 752k | 1.16M | 35% fewer |
-| Cost | $0.56 | $0.62 | 9% cheaper |
+| Time | 1m 25s | 1m 58s | 28% faster |
+| File Reads | 0 | 9 | −9 |
+| Grep/Bash | 0 | 5 | −5 |
+| Tool calls | 3 | 13 | 77% fewer |
+| Total tokens | 419k | 1.41M | 70% fewer |
+| Cost | $0.48 | $0.62 | 23% cheaper |
 
 **Tokio** · ~790 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 3s | 2m 53s | 29% faster |
-| File Reads | 3 | 9 | −6 |
-| Grep/Bash | 0 | 7 | −7 |
-| Tool calls | 7 | 17 | 61% fewer |
-| Total tokens | 869k | 2.14M | 59% fewer |
-| Cost | $0.63 | $0.92 | 31% cheaper |
+| Time | 1m 28s | 2m 20s | 37% faster |
+| File Reads | 0 | 8 | −8 |
+| Grep/Bash | 0 | 6 | −6 |
+| Tool calls | 3 | 14 | 79% fewer |
+| Total tokens | 522k | 1.73M | 70% fewer |
+| Cost | $0.53 | $0.82 | 35% cheaper |
 
 **OkHttp** · ~645 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 18s | 1m 27s | 11% faster |
-| File Reads | 2 | 4 | −2 |
-| Grep/Bash | 0 | 4 | −4 |
-| Tool calls | 5 | 8 | 40% fewer |
-| Total tokens | 739k | 883k | 16% fewer |
-| Cost | $0.54 | $0.56 | 4% cheaper |
+| Time | 1m 6s | 1m 29s | 26% faster |
+| File Reads | 1 | 4 | −3 |
+| Grep/Bash | 0 | 6 | −6 |
+| Tool calls | 3 | 10 | 70% fewer |
+| Total tokens | 572k | 1.10M | 48% fewer |
+| Cost | $0.48 | $0.55 | 11% cheaper |
 
 **Gin** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 8s | 1m 30s | 25% faster |
-| File Reads | 0 | 3 | −3 |
-| Grep/Bash | 0 | 5 | −5 |
-| Tool calls | 6 | 9 | 35% fewer |
-| Total tokens | 532k | 887k | 40% fewer |
-| Cost | $0.36 | $0.50 | 28% cheaper |
+| Time | 1m 28s | 1m 37s | 9% faster |
+| File Reads | 0 | 6 | −6 |
+| Grep/Bash | 0 | 2 | −2 |
+| Tool calls | 5 | 9 | 47% fewer |
+| Total tokens | 552k | 847k | 35% fewer |
+| Cost | $0.48 | $0.57 | 15% cheaper |
 
 **Alamofire** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 19s | 2m 28s | 6% faster |
-| File Reads | 5 | 9 | −4 |
-| Grep/Bash | 1 | 4 | −3 |
+| Time | 2m 11s | 2m 21s | 7% faster |
+| File Reads | 3 | 9 | −6 |
+| Grep/Bash | 2 | 4 | −2 |
 | Tool calls | 11 | 12 | 13% fewer |
-| Total tokens | 1.22M | 2.14M | 43% fewer |
-| Cost | $0.71 | $1.04 | 32% cheaper |
+| Total tokens | 1.13M | 2.10M | 46% fewer |
+| Cost | $0.69 | $0.95 | 28% cheaper |
 
 </details>
 
 <details>
 <summary><strong>Full benchmark details</strong></summary>
 
-**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with per-symbol adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
 
 **Queries:**
 | Codebase | Query |
diff --git a/__tests__/adaptive-explore-sizing.test.ts b/__tests__/adaptive-explore-sizing.test.ts
@@ -230,7 +230,10 @@ export class JsonCodec extends Codec {
   encode(input: string): string { return '{' + input + '}'; }
 }
 export class XmlCodec extends Codec {
-  encode(input: string): string { return '<' + input + '>'; }
+  encode(input: string): string {
+    const detail = 'XML_BODY_MARKER';
+    return '<' + input + detail + '>';
+  }
 }
 export class YamlCodec extends Codec {
   encode(input: string): string { return '- ' + input; }
@@ -355,19 +358,34 @@ export class YamlCodec extends Codec {
     expect(bridge).not.toContain('BRIDGE_BODY_MARKER');
   });
 
-  it('skeletonizes a base+subclasses family file even when named (compiler.py: family override beats the named spare)', async () => {
+  it('collapses a base+subclasses family file to a FOCUSED view — base method body kept, non-named subclasses signature-only (compiler.py)', async () => {
     const result = await handler.execute('codegraph_explore', { query: SPARE_QUERY, maxFiles: 15 });
     const text = result.content?.[0]?.text ?? '';
 
     // codec.ts defines the base Codec (>=3 subclasses extend it) and co-locates the
-    // subclasses — a redundant, Read-anyway "family" file (Django's compiler.py). Even
-    // though the agent named `encode`, it STILL skeletonizes: a full one would eat the
-    // explore budget and starve the sibling files. Contrast auth-interceptor.ts above,
-    // which is named AND not a family file → spared. This is the override that keeps
-    // Django from regressing (sparing the family file cost more and Read more).
+    // subclasses — a "family" file (Django's compiler.py). The family-override fires
+    // (it is NOT spared into a full clustered render despite the named `encode`), so
+    // it COLLAPSES — but per-symbol: the named base method `Codec.encode` keeps its
+    // body (so the agent doesn't Read it back — Django's SQLCompiler.execute_sql),
+    // while a non-named subclass (XmlCodec) collapses to a signature. That packs the
+    // mechanism into budget without the redundant subclass bodies.
     const codec = sectionFor(text, 'codec.ts');
     expect(codec, 'codec.ts should be present').not.toBe('');
-    expect(codec, 'a named base+subclasses family file still skeletonizes (budget)').toContain(SKELETON_MARK);
-    expect(codec, 'the elided base body marker must NOT survive').not.toContain('CODEC_BASE_MARKER');
+    expect(codec, 'a named family file collapses to a focused (not full) view').toContain('· focused');
+    expect(codec, 'the named base method body is kept (no Read-back)').toContain('CODEC_BASE_MARKER');
+    expect(codec, 'a non-named subclass body is elided to a signature').not.toContain('XML_BODY_MARKER');
+  });
+
+  it('naming a SHARED/polymorphic method does not spare the siblings (uniqueness-aware)', async () => {
+    // `intercept` is implemented by every interceptor (5 defs) — a polymorphic name,
+    // not a unique one. Naming it must NOT keep all five full (that floods the budget
+    // — Django's `as_sql`×110). The off-spine siblings still collapse, and since none
+    // defines the supertype, `intercept` doesn't even earn a body — pure skeleton.
+    const result = await handler.execute('codegraph_explore', { query: `${QUERY} intercept`, maxFiles: 12 });
+    const text = result.content?.[0]?.text ?? '';
+
+    const bridge = sectionFor(text, 'bridge-interceptor.ts');
+    expect(bridge, 'a sibling named only via a shared method is not spared').toContain(SKELETON_MARK);
+    expect(bridge, 'a shared method does not earn a body in a non-supertype leaf').not.toContain('BRIDGE_BODY_MARKER');
   });
 });
diff --git a/docs/design/adaptive-explore-sizing.md b/docs/design/adaptive-explore-sizing.md
@@ -29,6 +29,33 @@ source.
 > its budget; it is now an *override* of the named-callable spare. The
 > single-condition history below is kept for context.
 
+> **Further refinement (2026-05-29) — per-symbol focused view + named-cluster
+> survival.** Whole-file skeleton/spare was still too coarse on a real Django
+> A/B: the agent Read back `compiler.py` (collapsed → its `execute_sql`/`as_sql`
+> bodies elided) and `query.py` (a non-sibling god-file whose `_fetch_all` cluster
+> got trimmed). Four changes took both repos from ~9–10% to **~14–17% cheaper**
+> with **median 0 reads**:
+> 1. **Uniqueness-aware spare** — only a (near-)UNIQUE named callable spares a
+>    file. `as_sql` has **110 defs** across every Compiler/Expression subclass;
+>    naming it must not keep every backend variant full (it was flooding Django's
+>    budget). `getResponseWithInterceptorChain` (1 def) still spares RealCall.
+> 2. **Per-symbol focused view** — a collapsed family file shows the **full body**
+>    of on-spine / unique-named / canonical-base-supertype methods and only
+>    **signatures** for the rest. So `SQLCompiler.execute_sql`/`as_sql` survive
+>    while the 80 other symbols + redundant subclasses collapse → no Read-back.
+> 3. **Test-file exclusion on all tiers** — a test file (`custom_lookups/tests.py`)
+>    was eating 2.3 KB of Django's 28 KB budget; tests rarely answer an
+>    architecture question. (Previously only the <500-file tiers excluded them.)
+> 4. **Named-cluster survival in non-sibling files** — inject agent-named method
+>    defs into a file's clusters even when the gather missed them, rank them at
+>    importance 9, and cap cluster selection at `min(per-file, remaining-total)`
+>    so high-importance named clusters survive instead of being source-order
+>    trimmed (Django's `_fetch_all`, L2237, the last of four big files emitted).
+> Controls held: OkHttp 14% cheaper / 0 RealCall read-backs; Excalidraw 31%
+> cheaper / 0 reads (god-file clustering unaffected — its big file is emitted
+> first, so the budget cap never binds it). OkHttp's interceptors stay a pure
+> signature skeleton (no named callable in them, don't define a supertype).
+
 ---
 
 ## TL;DR
diff --git a/src/mcp/tools.ts b/src/mcp/tools.ts