|
4 | 4 |
|
5 | 5 | ### Supercharge Claude Code, Cursor, Codex, OpenCode, Hermes Agent, Gemini, Antigravity, and Kiro with Semantic Code Intelligence |
6 | 6 |
|
7 | | -**~22% cheaper · ~50% fewer tool calls · 100% local** |
| 7 | +**~25% cheaper · ~62% fewer tool calls · 100% local** |
8 | 8 |
|
9 | 9 | ### [Documentation & Website →](https://colbymchenry.github.io/codegraph/) |
10 | 10 |
|
@@ -83,101 +83,101 @@ When Claude Code explores a codebase, it spawns **Explore agents** that scan fil |
83 | 83 |
|
84 | 84 | ### Benchmark Results |
85 | 85 |
|
86 | | -Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with adaptive `codegraph_explore` sizing._ |
| 86 | +Tested across **7 real-world open-source codebases** spanning 7 languages, comparing an agent (Claude Code, headless) answering one architecture question **with** and **without** CodeGraph. Each cell is the savings at the **median of 4 runs per arm**. _Re-validated on Opus 4.8 (2026-05-29), on the build with per-symbol adaptive `codegraph_explore` sizing._ |
87 | 87 |
|
88 | | -> **Average: 22% cheaper · 47% fewer tokens · 20% faster · 50% fewer tool calls** |
| 88 | +> **Average: 25% cheaper · 57% fewer tokens · 23% faster · 62% fewer tool calls** |
89 | 89 |
|
90 | 90 | | Codebase | Language | Cost | Tokens | Time | Tool calls | |
91 | 91 | |----------|----------|------|--------|------|------------| |
92 | | -| **VS Code** | TypeScript · ~10k files | 13% cheaper | 63% fewer | 11% faster | 82% fewer | |
93 | | -| **Excalidraw** | TypeScript · ~640 | 40% cheaper | 71% fewer | 51% faster | 82% fewer | |
94 | | -| **Django** | Python · ~3k | 9% cheaper | 35% fewer | 7% faster | 38% fewer | |
95 | | -| **Tokio** | Rust · ~790 | 31% cheaper | 59% fewer | 29% faster | 61% fewer | |
96 | | -| **OkHttp** | Java · ~645 | 4% cheaper | 16% fewer | 11% faster | 40% fewer | |
97 | | -| **Gin** | Go · ~110 | 28% cheaper | 40% fewer | 25% faster | 35% fewer | |
98 | | -| **Alamofire** | Swift · ~110 | 32% cheaper | 43% fewer | 6% faster | 13% fewer | |
| 92 | +| **VS Code** | TypeScript · ~10k files | 33% cheaper | 70% fewer | 27% faster | 80% fewer | |
| 93 | +| **Excalidraw** | TypeScript · ~640 | 27% cheaper | 61% fewer | 26% faster | 70% fewer | |
| 94 | +| **Django** | Python · ~3k | 23% cheaper | 70% fewer | 28% faster | 77% fewer | |
| 95 | +| **Tokio** | Rust · ~790 | 35% cheaper | 70% fewer | 37% faster | 79% fewer | |
| 96 | +| **OkHttp** | Java · ~645 | 11% cheaper | 48% fewer | 26% faster | 70% fewer | |
| 97 | +| **Gin** | Go · ~110 | 15% cheaper | 35% fewer | 9% faster | 47% fewer | |
| 98 | +| **Alamofire** | Swift · ~110 | 28% cheaper | 46% fewer | 7% faster | 13% fewer | |
99 | 99 |
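The headline averages can be reproduced from the per-repo table. The sketch below assumes the headline is the unweighted arithmetic mean of the rounded per-repo percentages in the updated table; the names `SAVINGS`, `METRICS`, and `headline_averages` are illustrative and are not part of CodeGraph.

```python
from statistics import mean

# Per-repo savings (%) copied from the updated table above,
# in the order: cost, tokens, time, tool calls.
SAVINGS = {
    "VS Code":    (33, 70, 27, 80),
    "Excalidraw": (27, 61, 26, 70),
    "Django":     (23, 70, 28, 77),
    "Tokio":      (35, 70, 37, 79),
    "OkHttp":     (11, 48, 26, 70),
    "Gin":        (15, 35, 9, 47),
    "Alamofire":  (28, 46, 7, 13),
}
METRICS = ("cost", "tokens", "time", "tool_calls")


def headline_averages(savings):
    """Average each metric column across repos, rounded to a whole percent."""
    columns = zip(*savings.values())  # transpose rows into per-metric columns
    return {metric: round(mean(col)) for metric, col in zip(METRICS, columns)}


print(headline_averages(SAVINGS))
# {'cost': 25, 'tokens': 57, 'time': 23, 'tool_calls': 62}
```

Every repo counts equally here, so a ~110-file repo moves the average as much as a ~10k-file one.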
|
100 | | -CodeGraph cuts **tool calls and total tokens on every repo** and answers large repos with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. **Every repo is now cheaper, not just faster** — the two former cost outliers (Django and OkHttp, where the answer spans many interchangeable implementations of one interface) flipped from *costlier* than native search to cheaper once adaptive `codegraph_explore` sizing stopped shipping every sibling's full body. The margin is still narrowest on the smallest repos, where a modern model's native search is already cheap, but it stays positive across the board; the largest wins remain fewer tool calls and faster answers. |
| 100 | +CodeGraph cuts **cost, tokens, tool calls, and time on every repo**, across small, medium, and large codebases, and answers five of the seven with **zero file reads**, while the no-CodeGraph agent spends its budget on grep/find/Read discovery. `codegraph_explore` shows the answer in full (the mechanism plus the exact methods you asked about, even when they're buried in a multi-thousand-line file) while collapsing redundant interchangeable implementations to signatures, so the response is sized to the *answer* rather than the file count. The cost margin is narrowest on OkHttp (11%) and Gin (15%), where a modern model's native search is already cheap (the no-CodeGraph arm costs only $0.55 and $0.57), but it stays positive on every repo. |
101 | 101 |
|
102 | 102 | <details> |
103 | 103 | <summary><strong>Per-repo breakdown — WITH vs WITHOUT (median of 4)</strong></summary> |
104 | 104 |
|
105 | 105 | **VS Code** · ~10k files |
106 | 106 | | Metric | WITH cg | WITHOUT cg | Δ | |
107 | 107 | |---|---|---|---| |
108 | | -| Time | 1m 58s | 2m 13s | 11% faster | |
109 | | -| File Reads | 0 | 8 | −8 | |
110 | | -| Grep/Bash | 0 | 9 | −9 | |
111 | | -| Tool calls | 3 | 17 | 82% fewer | |
112 | | -| Total tokens | 607k | 1.65M | 63% fewer | |
113 | | -| Cost | $0.66 | $0.76 | 13% cheaper | |
| 108 | +| Time | 1m 37s | 2m 13s | 27% faster | |
| 109 | +| File Reads | 0 | 9 | −9 | |
| 110 | +| Grep/Bash | 0 | 11 | −11 | |
| 111 | +| Tool calls | 4 | 21 | 80% fewer | |
| 112 | +| Total tokens | 545k | 1.79M | 70% fewer | |
| 113 | +| Cost | $0.55 | $0.83 | 33% cheaper | |
114 | 114 |
|
115 | 115 | **Excalidraw** · ~640 files |
116 | 116 | | Metric | WITH cg | WITHOUT cg | Δ | |
117 | 117 | |---|---|---|---| |
118 | | -| Time | 1m 23s | 2m 48s | 51% faster | |
119 | | -| File Reads | 0 | 11 | −11 | |
120 | | -| Grep/Bash | 0 | 9 | −9 | |
121 | | -| Tool calls | 4 | 20 | 82% fewer | |
122 | | -| Total tokens | 596k | 2.06M | 71% fewer | |
123 | | -| Cost | $0.53 | $0.89 | 40% cheaper | |
| 118 | +| Time | 1m 34s | 2m 6s | 26% faster | |
| 119 | +| File Reads | 0 | 7 | −7 | |
| 120 | +| Grep/Bash | 0 | 8 | −8 | |
| 121 | +| Tool calls | 5 | 15 | 70% fewer | |
| 122 | +| Total tokens | 651k | 1.69M | 61% fewer | |
| 123 | +| Cost | $0.57 | $0.78 | 27% cheaper | |
124 | 124 |
|
125 | 125 | **Django** · ~3k files |
126 | 126 | | Metric | WITH cg | WITHOUT cg | Δ | |
127 | 127 | |---|---|---|---| |
128 | | -| Time | 1m 43s | 1m 51s | 7% faster | |
129 | | -| File Reads | 5 | 10 | −5 | |
130 | | -| Grep/Bash | 0 | 4 | −4 | |
131 | | -| Tool calls | 8 | 13 | 38% fewer | |
132 | | -| Total tokens | 752k | 1.16M | 35% fewer | |
133 | | -| Cost | $0.56 | $0.62 | 9% cheaper | |
| 128 | +| Time | 1m 25s | 1m 58s | 28% faster | |
| 129 | +| File Reads | 0 | 9 | −9 | |
| 130 | +| Grep/Bash | 0 | 5 | −5 | |
| 131 | +| Tool calls | 3 | 13 | 77% fewer | |
| 132 | +| Total tokens | 419k | 1.41M | 70% fewer | |
| 133 | +| Cost | $0.48 | $0.62 | 23% cheaper | |
134 | 134 |
|
135 | 135 | **Tokio** · ~790 files |
136 | 136 | | Metric | WITH cg | WITHOUT cg | Δ | |
137 | 137 | |---|---|---|---| |
138 | | -| Time | 2m 3s | 2m 53s | 29% faster | |
139 | | -| File Reads | 3 | 9 | −6 | |
140 | | -| Grep/Bash | 0 | 7 | −7 | |
141 | | -| Tool calls | 7 | 17 | 61% fewer | |
142 | | -| Total tokens | 869k | 2.14M | 59% fewer | |
143 | | -| Cost | $0.63 | $0.92 | 31% cheaper | |
| 138 | +| Time | 1m 28s | 2m 20s | 37% faster | |
| 139 | +| File Reads | 0 | 8 | −8 | |
| 140 | +| Grep/Bash | 0 | 6 | −6 | |
| 141 | +| Tool calls | 3 | 14 | 79% fewer | |
| 142 | +| Total tokens | 522k | 1.73M | 70% fewer | |
| 143 | +| Cost | $0.53 | $0.82 | 35% cheaper | |
144 | 144 |
|
145 | 145 | **OkHttp** · ~645 files |
146 | 146 | | Metric | WITH cg | WITHOUT cg | Δ | |
147 | 147 | |---|---|---|---| |
148 | | -| Time | 1m 18s | 1m 27s | 11% faster | |
149 | | -| File Reads | 2 | 4 | −2 | |
150 | | -| Grep/Bash | 0 | 4 | −4 | |
151 | | -| Tool calls | 5 | 8 | 40% fewer | |
152 | | -| Total tokens | 739k | 883k | 16% fewer | |
153 | | -| Cost | $0.54 | $0.56 | 4% cheaper | |
| 148 | +| Time | 1m 6s | 1m 29s | 26% faster | |
| 149 | +| File Reads | 1 | 4 | −3 | |
| 150 | +| Grep/Bash | 0 | 6 | −6 | |
| 151 | +| Tool calls | 3 | 10 | 70% fewer | |
| 152 | +| Total tokens | 572k | 1.10M | 48% fewer | |
| 153 | +| Cost | $0.48 | $0.55 | 11% cheaper | |
154 | 154 |
|
155 | 155 | **Gin** · ~110 files |
156 | 156 | | Metric | WITH cg | WITHOUT cg | Δ | |
157 | 157 | |---|---|---|---| |
158 | | -| Time | 1m 8s | 1m 30s | 25% faster | |
159 | | -| File Reads | 0 | 3 | −3 | |
160 | | -| Grep/Bash | 0 | 5 | −5 | |
161 | | -| Tool calls | 6 | 9 | 35% fewer | |
162 | | -| Total tokens | 532k | 887k | 40% fewer | |
163 | | -| Cost | $0.36 | $0.50 | 28% cheaper | |
| 158 | +| Time | 1m 28s | 1m 37s | 9% faster | |
| 159 | +| File Reads | 0 | 6 | −6 | |
| 160 | +| Grep/Bash | 0 | 2 | −2 | |
| 161 | +| Tool calls | 5 | 9 | 47% fewer | |
| 162 | +| Total tokens | 552k | 847k | 35% fewer | |
| 163 | +| Cost | $0.48 | $0.57 | 15% cheaper | |
164 | 164 |
|
165 | 165 | **Alamofire** · ~110 files |
166 | 166 | | Metric | WITH cg | WITHOUT cg | Δ | |
167 | 167 | |---|---|---|---| |
168 | | -| Time | 2m 19s | 2m 28s | 6% faster | |
169 | | -| File Reads | 5 | 9 | −4 | |
170 | | -| Grep/Bash | 1 | 4 | −3 | |
| 168 | +| Time | 2m 11s | 2m 21s | 7% faster | |
| 169 | +| File Reads | 3 | 9 | −6 | |
| 170 | +| Grep/Bash | 2 | 4 | −2 | |
171 | 171 | | Tool calls | 11 | 12 | 13% fewer | |
172 | | -| Total tokens | 1.22M | 2.14M | 43% fewer | |
173 | | -| Cost | $0.71 | $1.04 | 32% cheaper | |
| 172 | +| Total tokens | 1.13M | 2.10M | 46% fewer | |
| 173 | +| Cost | $0.69 | $0.95 | 28% cheaper | |
174 | 174 |
|
175 | 175 | </details> |
176 | 176 |
|
177 | 177 | <details> |
178 | 178 | <summary><strong>Full benchmark details</strong></summary> |
179 | 179 |
|
180 | | -**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation — not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain — e.g. Django's without-arm hit $2.71/14m one batch). |
| 180 | +**Methodology.** Each arm is `claude -p` (Claude Opus 4.8) run headlessly against the repo with `--strict-mcp-config`: **WITH** = CodeGraph's MCP server enabled, **WITHOUT** = an empty MCP config. Built-in Read/Grep/Bash stay available to both. Same question per repo, **4 runs per arm, median reported**. Cost = the run's `total_cost_usd`; Tokens = total tokens processed (input incl. cached + output); Time = wall-clock; Tool calls = every tool invocation, including those inside any sub-agents the model spawns. Repos were cloned at `--depth 1` and indexed by the same CodeGraph build that served them. Re-validated 2026-05-29 on the build with per-symbol adaptive `codegraph_explore` sizing. These numbers are lower than the prior Opus 4.7 validation. That is not a CodeGraph regression but a stronger native baseline: Opus 4.8 greps/reads efficiently on the main thread instead of fanning out into large Explore-subagent sweeps, so the no-CodeGraph arm is leaner than it used to be. Per-repo numbers move run-to-run with how hard the without-arm thrashes (the median-of-4 smooths it, but tails remain; e.g. Django's without-arm hit $2.71 and 14 minutes in one batch). |
181 | 181 |
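As a rough illustration of "savings at the median", the sketch below takes the median of each arm's runs and reports the relative reduction. The per-run costs are made-up placeholders, not benchmark data, and `savings_at_median` is a hypothetical helper rather than part of the actual benchmark harness.

```python
from statistics import median


def savings_at_median(with_runs, without_runs):
    """Percent reduction of the WITH-arm median relative to the WITHOUT-arm median."""
    with_med = median(with_runs)        # with 4 runs: mean of the two middle values
    without_med = median(without_runs)
    return 100 * (1 - with_med / without_med)


# Hypothetical per-run costs in USD (4 runs per arm); NOT real benchmark data.
with_cg = [0.50, 0.40, 0.60, 0.50]
without_cg = [1.00, 0.90, 1.10, 1.00]

print(f"{savings_at_median(with_cg, without_cg):.0f}% cheaper")
# 50% cheaper
```

Using the median rather than the mean keeps a single thrashing run, such as the Django outlier mentioned above, from dominating an arm's reported value.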
|
182 | 182 | **Queries:** |
183 | 183 | | Codebase | Query | |
|