
Commit b026e64

colbymchenry and claude authored
feat(mcp): per-symbol adaptive codegraph_explore sizing (#569)
Sizes `codegraph_explore` to the answer, not the file count: it shows the mechanism and the exact methods you named in full (even when they are buried in a large file) while collapsing redundant interchangeable implementations to signatures.

Adds a uniqueness-aware spare, per-symbol focused rendering of family files, test-file exclusion on all tiers, and named-method cluster survival in non-sibling god-files.

Validated A/B (Opus 4.8, 7-repo sweep): on average 25% cheaper, 57% fewer tokens, 23% faster, and 62% fewer tool calls. Django went from 9% to 23% cheaper (0 reads) and OkHttp from 4% to 11% cheaper; gains hold across small, medium, and large repos, and inert repos are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
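A minimal sketch of the per-symbol focused rendering described above, under stated assumptions. It encodes one reading of the rule in the design note and the new tests: a symbol keeps its full body if it is on the traced spine, or if the agent named it and it is either unique or the base-class definition; everything else collapses to its signature. `SymbolInfo`, `keepsBody`, `renderFocused`, the `{ … }` elision text, and the marker-laden bodies in the usage are hypothetical; only the `· focused` label mirrors the string the new test checks for.

```typescript
// Hypothetical symbol shape; CodeGraph's real model is not part of this commit.
interface SymbolInfo {
  name: string;              // bare callable name, e.g. "encode"
  signature: string;         // one-line signature shown when collapsed
  body: string;              // full source text of the definition
  onSpine: boolean;          // on the dispatch path being explained (the "mechanism")
  definesSupertype: boolean; // the base-class definition that subclasses override
}

// Decide whether one symbol in a collapsed family file keeps its full body.
function keepsBody(s: SymbolInfo, named: Set<string>, defCount: Map<string, number>): boolean {
  if (s.onSpine) return true;
  if (!named.has(s.name)) return false;
  const unique = (defCount.get(s.name) ?? 0) <= 1;
  // A named but shared (polymorphic) name earns a body only in the base definition.
  return unique || s.definesSupertype;
}

// Render a family file symbol by symbol instead of all-full or all-skeleton.
function renderFocused(file: string, symbols: SymbolInfo[], named: Set<string>): string {
  // Simplification: count definitions within this file only.
  const defCount = new Map<string, number>();
  for (const s of symbols) defCount.set(s.name, (defCount.get(s.name) ?? 0) + 1);

  const out: string[] = [`${file} · focused`];
  for (const s of symbols) {
    out.push(keepsBody(s, named, defCount) ? s.body : `${s.signature} { … }`);
  }
  return out.join("\n");
}

// Usage, modelled on the codec.ts fixture in the tests: the agent named `encode`.
const view = renderFocused(
  "codec.ts",
  [
    {
      name: "encode",
      signature: "encode(input: string): string",
      body: "encode(input: string): string { /* CODEC_BASE_MARKER */ return input; }",
      onSpine: false,
      definesSupertype: true,
    },
    {
      name: "encode",
      signature: "encode(input: string): string",
      body: "encode(input: string): string { const detail = 'XML_BODY_MARKER'; return '<' + input + detail + '>'; }",
      onSpine: false,
      definesSupertype: false,
    },
  ],
  new Set(["encode"]),
);
console.log(view);
```

Counting definitions per file is a simplification made for the sketch; the design note reasons about repo-wide counts, such as `as_sql` having 110 definitions.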
1 parent f1b14f0 commit b026e64

5 files changed

Lines changed: 223 additions & 108 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ and adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

 - `codegraph init` now builds the initial index by default — you no longer need the `-i`/`--index` flag (it's still accepted, so existing commands and scripts keep working). (#483)
 - Go: Gin middleware chains now connect end-to-end in `codegraph_trace` and `codegraph_explore` — following a request reaches the middleware and route handlers registered via `.Use()` / `.GET()` instead of dead-ending where the framework dispatches the chain dynamically.
-- `codegraph_explore` is now leaner on interface-heavy flows: when a query spans many interchangeable implementations of one interface (an HTTP interceptor chain, say), it shows the rest as signatures instead of every full body, while keeping the dispatch mechanism and any specific method you asked about in full. Fewer tokens for the same answer, so questions like these stop costing more than plain grep/read — in testing, the two slowest-to-pay-off repos (a Java and a Python framework) went from slightly costlier than native search to clearly cheaper. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.
+- `codegraph_explore` now sizes its response to the *answer* instead of the file count: it shows the mechanism and the exact methods you asked about in full — even when they're buried deep in a large file — while collapsing the redundant interchangeable implementations of an interface (an HTTP interceptor chain, a query-compiler family) down to signatures. Fewer tokens for a more complete answer, so on the flows that used to occasionally cost more than plain grep/read it's now clearly cheaper — and the win holds across small, medium, and large codebases. Distinct, non-interchangeable code is shown in full as before. Disable with `CODEGRAPH_ADAPTIVE_EXPLORE=0`.

 ### Fixes
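The changelog documents `CODEGRAPH_ADAPTIVE_EXPLORE=0` as the off switch. A minimal sketch of such a gate, assuming that only the exact value `0` disables adaptive sizing and that an unset variable leaves it on; the commit states neither detail beyond `=0`, and `adaptiveExploreEnabled` is a hypothetical helper, not CodeGraph's API.

```typescript
// Hypothetical gate. Assumption: only "0" (ignoring surrounding whitespace)
// turns adaptive sizing off; unset or any other value leaves it on.
function adaptiveExploreEnabled(env: Record<string, string | undefined>): boolean {
  const raw = env["CODEGRAPH_ADAPTIVE_EXPLORE"];
  return raw === undefined || raw.trim() !== "0";
}

// In Node a caller would pass process.env; plain objects keep the sketch dependency-free.
console.log(adaptiveExploreEnabled({}));                                  // true
console.log(adaptiveExploreEnabled({ CODEGRAPH_ADAPTIVE_EXPLORE: "0" })); // false
```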

README.md

Lines changed: 53 additions & 53 deletions
@@ -4,7 +4,7 @@

 ### Supercharge Claude Code, Cursor, Codex, OpenCode, Hermes Agent, Gemini, Antigravity, and Kiro with Semantic Code Intelligence

-**~22% cheaper · ~50% fewer tool calls · 100% local**
+**~25% cheaper · ~62% fewer tool calls · 100% local**

 ### [Documentation & Website →](https://colbymchenry.github.io/codegraph/)

@@ -83,101 +83,101 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil

 ### Benchmark Results

-Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with adaptive `codegraph_explore` sizing._
+Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with per-symbol adaptive `codegraph_explore` sizing._

-> **Average: 22% cheaper · 47% fewer tokens · 20% faster · 50% fewer tool calls**
+> **Average: 25% cheaper · 57% fewer tokens · 23% faster · 62% fewer tool calls**

 | Codebase | Language | Cost | Tokens | Time | Tool calls |
 |----------|----------|------|--------|------|------------|
-| **VS Code** | TypeScript · ~10k files | 13% cheaper | 63% fewer | 11% faster | 82% fewer |
-| **Excalidraw** | TypeScript · ~640 | 40% cheaper | 71% fewer | 51% faster | 82% fewer |
-| **Django** | Python · ~3k | 9% cheaper | 35% fewer | 7% faster | 38% fewer |
-| **Tokio** | Rust · ~790 | 31% cheaper | 59% fewer | 29% faster | 61% fewer |
-| **OkHttp** | Java · ~645 | 4% cheaper | 16% fewer | 11% faster | 40% fewer |
-| **Gin** | Go · ~110 | 28% cheaper | 40% fewer | 25% faster | 35% fewer |
-| **Alamofire** | Swift · ~110 | 32% cheaper | 43% fewer | 6% faster | 13% fewer |
+| **VS Code** | TypeScript · ~10k files | 33% cheaper | 70% fewer | 27% faster | 80% fewer |
+| **Excalidraw** | TypeScript · ~640 | 27% cheaper | 61% fewer | 26% faster | 70% fewer |
+| **Django** | Python · ~3k | 23% cheaper | 70% fewer | 28% faster | 77% fewer |
+| **Tokio** | Rust · ~790 | 35% cheaper | 70% fewer | 37% faster | 79% fewer |
+| **OkHttp** | Java · ~645 | 11% cheaper | 48% fewer | 26% faster | 70% fewer |
+| **Gin** | Go · ~110 | 15% cheaper | 35% fewer | 9% faster | 47% fewer |
+| **Alamofire** | Swift · ~110 | 28% cheaper | 46% fewer | 7% faster | 13% fewer |

-CodeGraph cuts **tool calls and total tokens on every repo** and answers large repos with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. **Every repo is now cheaper, not just faster** — the two former cost outliers (Django and OkHttp, where the answer spans many interchangeable implementations of one interface) flipped from *costlier* than native search to cheaper once adaptive `codegraph_explore` sizing stopped shipping every sibling's full body. The margin is still narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays positive across the board; the largest wins remain fewer tool calls and faster answers.
+CodeGraph cuts **cost, tokens, tool calls, and time on every repo** — across small, medium, and large codebases — and answers most of them with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. `codegraph_explore` shows the answer in full — the mechanism plus the exact methods you asked about, even when they're buried in a multi-thousand-line file — while collapsing redundant interchangeable implementations to signatures, so the response is sized to the *answer* rather than the file count. The cost margin is narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays solidly positive across the board.

 <details>
 <summary><strong>Per-repo breakdown — WITH vs WITHOUT (median of 4)</strong></summary>

 **VS Code** · ~10k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 58s | 2m 13s | 11% faster |
-| File Reads | 0 | 8 | −8 |
-| Grep/Bash | 0 | 9 | −9 |
-| Tool calls | 3 | 17 | 82% fewer |
-| Total tokens | 607k | 1.65M | 63% fewer |
-| Cost | $0.66 | $0.76 | 13% cheaper |
+| Time | 1m 37s | 2m 13s | 27% faster |
+| File Reads | 0 | 9 | −9 |
+| Grep/Bash | 0 | 11 | −11 |
+| Tool calls | 4 | 21 | 80% fewer |
+| Total tokens | 545k | 1.79M | 70% fewer |
+| Cost | $0.55 | $0.83 | 33% cheaper |

 **Excalidraw** · ~640 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 23s | 2m 48s | 51% faster |
-| File Reads | 0 | 11 | −11 |
-| Grep/Bash | 0 | 9 | −9 |
-| Tool calls | 4 | 20 | 82% fewer |
-| Total tokens | 596k | 2.06M | 71% fewer |
-| Cost | $0.53 | $0.89 | 40% cheaper |
+| Time | 1m 34s | 2m 6s | 26% faster |
+| File Reads | 0 | 7 | −7 |
+| Grep/Bash | 0 | 8 | −8 |
+| Tool calls | 5 | 15 | 70% fewer |
+| Total tokens | 651k | 1.69M | 61% fewer |
+| Cost | $0.57 | $0.78 | 27% cheaper |

 **Django** · ~3k files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 43s | 1m 51s | 7% faster |
-| File Reads | 5 | 10 | −5 |
-| Grep/Bash | 0 | 4 | −4 |
-| Tool calls | 8 | 13 | 38% fewer |
-| Total tokens | 752k | 1.16M | 35% fewer |
-| Cost | $0.56 | $0.62 | 9% cheaper |
+| Time | 1m 25s | 1m 58s | 28% faster |
+| File Reads | 0 | 9 | −9 |
+| Grep/Bash | 0 | 5 | −5 |
+| Tool calls | 3 | 13 | 77% fewer |
+| Total tokens | 419k | 1.41M | 70% fewer |
+| Cost | $0.48 | $0.62 | 23% cheaper |

 **Tokio** · ~790 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 3s | 2m 53s | 29% faster |
-| File Reads | 3 | 9 | −6 |
-| Grep/Bash | 0 | 7 | −7 |
-| Tool calls | 7 | 17 | 61% fewer |
-| Total tokens | 869k | 2.14M | 59% fewer |
-| Cost | $0.63 | $0.92 | 31% cheaper |
+| Time | 1m 28s | 2m 20s | 37% faster |
+| File Reads | 0 | 8 | −8 |
+| Grep/Bash | 0 | 6 | −6 |
+| Tool calls | 3 | 14 | 79% fewer |
+| Total tokens | 522k | 1.73M | 70% fewer |
+| Cost | $0.53 | $0.82 | 35% cheaper |

 **OkHttp** · ~645 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 18s | 1m 27s | 11% faster |
-| File Reads | 2 | 4 | −2 |
-| Grep/Bash | 0 | 4 | −4 |
-| Tool calls | 5 | 8 | 40% fewer |
-| Total tokens | 739k | 883k | 16% fewer |
-| Cost | $0.54 | $0.56 | 4% cheaper |
+| Time | 1m 6s | 1m 29s | 26% faster |
+| File Reads | 1 | 4 | −3 |
+| Grep/Bash | 0 | 6 | −6 |
+| Tool calls | 3 | 10 | 70% fewer |
+| Total tokens | 572k | 1.10M | 48% fewer |
+| Cost | $0.48 | $0.55 | 11% cheaper |

 **Gin** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 1m 8s | 1m 30s | 25% faster |
-| File Reads | 0 | 3 | −3 |
-| Grep/Bash | 0 | 5 | −5 |
-| Tool calls | 6 | 9 | 35% fewer |
-| Total tokens | 532k | 887k | 40% fewer |
-| Cost | $0.36 | $0.50 | 28% cheaper |
+| Time | 1m 28s | 1m 37s | 9% faster |
+| File Reads | 0 | 6 | −6 |
+| Grep/Bash | 0 | 2 | −2 |
+| Tool calls | 5 | 9 | 47% fewer |
+| Total tokens | 552k | 847k | 35% fewer |
+| Cost | $0.48 | $0.57 | 15% cheaper |

 **Alamofire** · ~110 files
 | Metric | WITH cg | WITHOUT cg | Δ |
 |---|---|---|---|
-| Time | 2m 19s | 2m 28s | 6% faster |
-| File Reads | 5 | 9 | −4 |
-| Grep/Bash | 1 | 4 | −3 |
+| Time | 2m 11s | 2m 21s | 7% faster |
+| File Reads | 3 | 9 | −6 |
+| Grep/Bash | 2 | 4 | −2 |
 | Tool calls | 11 | 12 | 13% fewer |
-| Total tokens | 1.22M | 2.14M | 43% fewer |
-| Cost | $0.71 | $1.04 | 32% cheaper |
+| Total tokens | 1.13M | 2.10M | 46% fewer |
+| Cost | $0.69 | $0.95 | 28% cheaper |

 </details>

 <details>
 <summary><strong>Full benchmark details</strong></summary>

-**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).
+**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with per-symbol adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch).

 **Queries:**
 | Codebase | Query |
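The README reports each arm as the median of 4 runs, plus a per-repo savings percentage, and its Average row matches a plain mean of the seven per-repo cells (for cost, (33 + 27 + 23 + 35 + 11 + 15 + 28) / 7 = 172 / 7 ≈ 24.6, shown as 25%). A sketch of that arithmetic follows. The savings formula 1 − median(WITH) / median(WITHOUT) is an assumption: recomputing from the rounded medians in the tables does not reproduce every cell (OkHttp's $0.48 vs $0.55 gives about 13%, not the listed 11%), so the published Δ may have been derived from unrounded values. The function names and the per-run costs in the usage are hypothetical.

```typescript
// Median of a sample; for 4 runs per arm it averages the two middle values.
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of an empty sample");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 0 ? (s[mid - 1] + s[mid]) / 2 : s[mid];
}

// Percent saved by the WITH arm relative to the WITHOUT arm (assumed definition).
function savingsPct(withRuns: number[], withoutRuns: number[]): number {
  return Math.round((1 - median(withRuns) / median(withoutRuns)) * 100);
}

// The Average row: arithmetic mean of the per-repo percentages, rounded.
function averagePct(cells: number[]): number {
  return Math.round(cells.reduce((sum, x) => sum + x, 0) / cells.length);
}

// Cost cells from the summary table, VS Code through Alamofire.
console.log(averagePct([33, 27, 23, 35, 11, 15, 28])); // 25
// Made-up per-run costs in USD, four runs per arm.
console.log(savingsPct([0.5, 0.55, 0.56, 0.6], [0.8, 0.83, 0.84, 0.9]));
```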

__tests__/adaptive-explore-sizing.test.ts

Lines changed: 27 additions & 9 deletions
@@ -230,7 +230,10 @@ export class JsonCodec extends Codec {
   encode(input: string): string { return '{' + input + '}'; }
 }
 export class XmlCodec extends Codec {
-  encode(input: string): string { return '<' + input + '>'; }
+  encode(input: string): string {
+    const detail = 'XML_BODY_MARKER';
+    return '<' + input + detail + '>';
+  }
 }
 export class YamlCodec extends Codec {
   encode(input: string): string { return '- ' + input; }
@@ -355,19 +358,34 @@ export class YamlCodec extends Codec {
     expect(bridge).not.toContain('BRIDGE_BODY_MARKER');
   });

-  it('skeletonizes a base+subclasses family file even when named (compiler.py: family override beats the named spare)', async () => {
+  it('collapses a base+subclasses family file to a FOCUSED view — base method body kept, non-named subclasses signature-only (compiler.py)', async () => {
     const result = await handler.execute('codegraph_explore', { query: SPARE_QUERY, maxFiles: 15 });
     const text = result.content?.[0]?.text ?? '';

     // codec.ts defines the base Codec (>=3 subclasses extend it) and co-locates the
-    // subclasses — a redundant, Read-anyway "family" file (Django's compiler.py). Even
-    // though the agent named `encode`, it STILL skeletonizes: a full one would eat the
-    // explore budget and starve the sibling files. Contrast auth-interceptor.ts above,
-    // which is named AND not a family file → spared. This is the override that keeps
-    // Django from regressing (sparing the family file cost more and Read more).
+    // subclasses — a "family" file (Django's compiler.py). The family-override fires
+    // (it is NOT spared into a full clustered render despite the named `encode`), so
+    // it COLLAPSES — but per-symbol: the named base method `Codec.encode` keeps its
+    // body (so the agent doesn't Read it back — Django's SQLCompiler.execute_sql),
+    // while a non-named subclass (XmlCodec) collapses to a signature. That packs the
+    // mechanism into budget without the redundant subclass bodies.
     const codec = sectionFor(text, 'codec.ts');
     expect(codec, 'codec.ts should be present').not.toBe('');
-    expect(codec, 'a named base+subclasses family file still skeletonizes (budget)').toContain(SKELETON_MARK);
-    expect(codec, 'the elided base body marker must NOT survive').not.toContain('CODEC_BASE_MARKER');
+    expect(codec, 'a named family file collapses to a focused (not full) view').toContain('· focused');
+    expect(codec, 'the named base method body is kept (no Read-back)').toContain('CODEC_BASE_MARKER');
+    expect(codec, 'a non-named subclass body is elided to a signature').not.toContain('XML_BODY_MARKER');
+  });
+
+  it('naming a SHARED/polymorphic method does not spare the siblings (uniqueness-aware)', async () => {
+    // `intercept` is implemented by every interceptor (5 defs) — a polymorphic name,
+    // not a unique one. Naming it must NOT keep all five full (that floods the budget
+    // — Django's `as_sql`×110). The off-spine siblings still collapse, and since none
+    // defines the supertype, `intercept` doesn't even earn a body — pure skeleton.
+    const result = await handler.execute('codegraph_explore', { query: `${QUERY} intercept`, maxFiles: 12 });
+    const text = result.content?.[0]?.text ?? '';
+
+    const bridge = sectionFor(text, 'bridge-interceptor.ts');
+    expect(bridge, 'a sibling named only via a shared method is not spared').toContain(SKELETON_MARK);
+    expect(bridge, 'a shared method does not earn a body in a non-supertype leaf').not.toContain('BRIDGE_BODY_MARKER');
   });
 });
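The second new test pins down the uniqueness-aware spare: naming a callable keeps a file in full only when that name has (nearly) one definition. Naming `intercept` (5 definitions) spares no interceptor, while a one-definition name such as `getResponseWithInterceptorChain` still spares RealCall, and a base+subclasses family file is never spared into a full render (the family override). The sketch below is hypothetical throughout: `isSpared`, the `Def` shape, the file names in the usage, and the `NEAR_UNIQUE_MAX_DEFS` cutoff are illustrative, since the commit does not say how "near-unique" is defined.

```typescript
// Hypothetical cutoff: how many definitions a name may have and still
// count as "(near-)unique". The commit does not state the real value.
const NEAR_UNIQUE_MAX_DEFS = 2;

interface Def {
  name: string; // callable name, e.g. "intercept"
  file: string; // file that defines it
}

// Count how many definitions each callable name has across the index.
function countDefs(defs: Def[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of defs) counts.set(d.name, (counts.get(d.name) ?? 0) + 1);
  return counts;
}

// A file is spared (kept in full) only if the agent named a near-unique callable
// defined in it, and it is not a base+subclasses family file (the family override).
function isSpared(file: string, named: string[], defs: Def[], familyFiles: Set<string>): boolean {
  if (familyFiles.has(file)) return false;
  const counts = countDefs(defs);
  return defs.some(
    (d) => d.file === file && named.includes(d.name) && (counts.get(d.name) ?? 0) <= NEAR_UNIQUE_MAX_DEFS,
  );
}

// Usage on a fixture-like index: `intercept` has 5 definitions, the chain builder has 1.
const interceptors = ["bridge", "cache", "retry", "connect", "call-server"].map((n) => `${n}-interceptor.ts`);
const defs: Def[] = [
  ...interceptors.map((file) => ({ name: "intercept", file })),
  { name: "getResponseWithInterceptorChain", file: "real-call.ts" },
];
const named = ["getResponseWithInterceptorChain", "intercept"];
console.log(isSpared("real-call.ts", named, defs, new Set<string>()));          // true
console.log(isSpared("bridge-interceptor.ts", named, defs, new Set<string>())); // false
```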

docs/design/adaptive-explore-sizing.md

Lines changed: 27 additions & 0 deletions
@@ -29,6 +29,33 @@ source.
 > its budget; it is now an *override* of the named-callable spare. The
 > single-condition history below is kept for context.

+> **Further refinement (2026-05-29) — per-symbol focused view + named-cluster
+> survival.** Whole-file skeleton/spare was still too coarse on a real Django
+> A/B: the agent Read back `compiler.py` (collapsed → its `execute_sql`/`as_sql`
+> bodies elided) and `query.py` (a non-sibling god-file whose `_fetch_all` cluster
+> got trimmed). Four changes took both repos from ~9–10% to **~14–17% cheaper**
+> with **median 0 reads**:
+> 1. **Uniqueness-aware spare** — only a (near-)UNIQUE named callable spares a
+> file. `as_sql` has **110 defs** across every Compiler/Expression subclass;
+> naming it must not keep every backend variant full (it was flooding Django's
+> budget). `getResponseWithInterceptorChain` (1 def) still spares RealCall.
+> 2. **Per-symbol focused view** — a collapsed family file shows the **full body**
+> of on-spine / unique-named / canonical-base-supertype methods and only
+> **signatures** for the rest. So `SQLCompiler.execute_sql`/`as_sql` survive
+> while the 80 other symbols + redundant subclasses collapse → no Read-back.
+> 3. **Test-file exclusion on all tiers** — a test file (`custom_lookups/tests.py`)
+> was eating 2.3 KB of Django's 28 KB budget; tests rarely answer an
+> architecture question. (Previously only the <500-file tiers excluded them.)
+> 4. **Named-cluster survival in non-sibling files** — inject agent-named method
+> defs into a file's clusters even when the gather missed them, rank them at
+> importance 9, and cap cluster selection at `min(per-file, remaining-total)`
+> so high-importance named clusters survive instead of being source-order
+> trimmed (Django's `_fetch_all`, L2237, the last of four big files emitted).
+> Controls held: OkHttp 14% cheaper / 0 RealCall read-backs; Excalidraw 31%
+> cheaper / 0 reads (god-file clustering unaffected — its big file is emitted
+> first, so the budget cap never binds it). OkHttp's interceptors stay a pure
+> signature skeleton (no named callable in them, don't define a supertype).
+
 ---

 ## TL;DR
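Item 4 of the design note combines three moves: inject agent-named method definitions that the gather missed as clusters of their own, rank named clusters at importance 9, and cap selection at `min(per-file, remaining-total)`. The sketch below assumes byte-sized clusters, greedy selection by importance and then source line, and skipping (rather than stopping at) a cluster that does not fit; `Cluster`, `Def`, `withNamedClusters`, `selectClusters`, and the numbers in the usage are hypothetical. It illustrates why a named cluster deep in a god-file (Django's `_fetch_all` at L2237) can survive instead of being trimmed in source order.

```typescript
interface Def { name: string; startLine: number; bytes: number }
interface Cluster { symbols: string[]; startLine: number; bytes: number; importance: number }

// From the design note: agent-named clusters rank at importance 9.
const NAMED_IMPORTANCE = 9;

// Add a single-symbol cluster for each named def no existing cluster covers,
// and lift any cluster containing a named symbol to NAMED_IMPORTANCE.
function withNamedClusters(clusters: Cluster[], fileDefs: Def[], named: Set<string>): Cluster[] {
  const covered = new Set(clusters.flatMap((c) => c.symbols));
  const injected: Cluster[] = fileDefs
    .filter((d) => named.has(d.name) && !covered.has(d.name))
    .map((d) => ({ symbols: [d.name], startLine: d.startLine, bytes: d.bytes, importance: NAMED_IMPORTANCE }));
  return [...clusters, ...injected].map((c) =>
    c.symbols.some((s) => named.has(s)) ? { ...c, importance: Math.max(c.importance, NAMED_IMPORTANCE) } : c,
  );
}

// Greedy pick under min(perFileCap, remainingTotal), highest importance first,
// then emitted back in source order.
function selectClusters(clusters: Cluster[], perFileCap: number, remainingTotal: number): Cluster[] {
  const cap = Math.min(perFileCap, remainingTotal);
  const ranked = [...clusters].sort((a, b) => b.importance - a.importance || a.startLine - b.startLine);
  const picked: Cluster[] = [];
  let used = 0;
  for (const c of ranked) {
    if (used + c.bytes > cap) continue; // skip what doesn't fit; a smaller cluster may still fit
    picked.push(c);
    used += c.bytes;
  }
  return picked.sort((a, b) => a.startLine - b.startLine);
}

// Usage: the gathered clusters miss `_fetch_all`, and only 7000 bytes of the total budget remain.
const gathered: Cluster[] = [
  { symbols: ["__init__"], startLine: 10, bytes: 4000, importance: 3 },
  { symbols: ["filter"], startLine: 500, bytes: 4000, importance: 3 },
];
const fileDefs: Def[] = [{ name: "_fetch_all", startLine: 2237, bytes: 3000 }];
const chosen = selectClusters(withNamedClusters(gathered, fileDefs, new Set(["_fetch_all"])), 10000, 7000);
console.log(chosen.map((c) => c.symbols[0]).join(", ")); // __init__, _fetch_all
```

Without the injection and re-ranking, the same 7000-byte cap keeps only `__init__` and the named method never appears, which is the source-order trimming the note describes.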
