Deterministic tool-schema compiler that reduces LLM tool-definition overhead by 50--72% while improving accuracy.
1,200 LOC TypeScript. Zero dependencies. Sub-millisecond on small catalogs (0.9ms for 5 tools). 23KB ESM bundle.
Claude Opus 4.7 -- matches-or-beats baseline with 57-63% token savings:
| Tool Count | Baseline | TSCG Balanced | Δ Accuracy | Savings |
|---|---|---|---|---|
| 16 | 70.0% | 77.5% | +7.5pp | 56.9% |
| 43 | 77.5% | 80.0% | +2.5pp | 63.0% |
| 50 | 72.5% | 80.0% | +7.5pp | 62.8% |
Claude Sonnet 4 -- consistent 57-63% compression with robust accuracy:
| Tool Count | Baseline | TSCG Balanced | Δ Accuracy | Savings |
|---|---|---|---|---|
| 16 | 77.5% | 80.0% | +2.5pp | 56.9% |
| 43 | 85.0% | 80.0% | -5.0pp | 63.0% |
| 50 | 77.5% | 77.5% | ±0.0pp | 62.8% |
480-call extended proxy benchmark (n=40 per cell, 2 seeds, 2 models x 3 tool counts):
| Model | Tools | Baseline | TSCG Proxy | Δ Accuracy | Token Savings |
|---|---|---|---|---|---|
| Opus 4.7 | 16 | 70.0% | 75.0% | +5.0pp | 53.1% |
| Opus 4.7 | 43 | 75.0% | 75.0% | ±0.0pp | 55.8% |
| Opus 4.7 | 50 | 77.5% | 77.5% | ±0.0pp | 55.5% |
| Sonnet 4 | 16 | 80.0% | 77.5% | -2.5pp | 53.1% |
| Sonnet 4 | 43 | 85.0% | 82.5% | -2.5pp | 55.8% |
| Sonnet 4 | 50 | 77.5% | 77.5% | ±0.0pp | 55.5% |
Opus 4.7 matches or beats the baseline in every condition; Sonnet 4 stays within the expected confidence interval (largest drop: -2.5pp). Both models achieve 53-56% token savings.
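To put the per-cell deltas in context, the sketch below estimates how wide a 95% confidence interval is at n=40. It assumes a normal-approximation interval for a proportion; the source does not state which interval method was used, and `ciHalfWidthPp` is an illustrative helper, not part of any TSCG package.

```typescript
// Half-width of a confidence interval for an accuracy value, in percentage points.
// Assumption: normal approximation to the binomial; z = 1.96 corresponds to ~95%.
function ciHalfWidthPp(accuracy: number, n: number, z = 1.96): number {
  if (n <= 0 || accuracy < 0 || accuracy > 1) {
    throw new RangeError('accuracy must be in [0, 1] and n must be positive');
  }
  const standardError = Math.sqrt((accuracy * (1 - accuracy)) / n);
  return z * standardError * 100;
}

const callsPerCell = 40; // n=40 per cell, as stated for the proxy benchmark
const ppPerCall = 100 / callsPerCell; // accuracy change caused by a single call
const halfWidthAt80 = ciHalfWidthPp(0.8, callsPerCell);

console.log(`One call shifts accuracy by ${ppPerCall}pp`);
console.log(`95% half-width at 80% accuracy: ${halfWidthAt80.toFixed(1)}pp`);
```

Under these assumptions a single call moves accuracy by 2.5pp and the interval half-width at 80% accuracy is about 12.4pp, so a -2.5pp delta corresponds to one call out of 40.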
Tool-Optimizer E2E validation (@tscg/tool-optimizer withTSCG() wrapper, 30 calls, Sonnet 4 @ 16 tools): withTSCG 86.7% vs baseline 80.0% (+6.7pp), 36.6% character savings.
TSCG compression response is model-specific. Three distinct archetypes observed:
- Opus 4.7 -- Operator-HUNGRY -- every operator contributes; balanced (all-8) is optimal
- Sonnet 4 -- Operator-ROBUST -- config-agnostic; 6 of 7 configs near-identical accuracy
- GPT-5.2 -- Operator-SENSITIVE -- CFL helps, CFO hurts; custom config optimal
TSCG's internal benchmark (TAB -- Tool-Agentic Bench, ~19,000 calls) is independently corroborated by four external benchmarks, including industry-standard evaluation suites:
| Benchmark | Type | Result | Significance |
|---|---|---|---|
| BFCL (Berkeley Function Calling Leaderboard) | Industry standard | 108--181% ARR across 3 frontier models | Sonnet 4: 85.7%→93.2% (+7.5pp), GPT-4o: 31.7%→57.4% (+25.7pp), GPT-5.2: 61.9%→89.4% (+27.5pp) |
| ToolBench (Qin et al.) | Academic benchmark | +5.0pp (75.0%→80.0%) | Real-world tool catalog, 20 tools |
| API-Bank (Li et al.) | Academic benchmark | -5.0pp (80.0%→75.0%) | Honest negative result -- not all benchmarks improve |
| Real MCP Server (@modelcontextprotocol/server-filesystem) | Production endpoint | 100% syntactic validity | 30 tasks on live MCP server, server-acceptance 90--97% |
TAB → Real MCP Transfer (0.1pp): The internal TAB benchmark is not merely a self-constructed evaluation; its deltas match real-world MCP behavior to within 0.1 percentage points. For Sonnet 4 on a 43-tool MCP catalog, the synthetic TAB delta is -1.6pp and the real MCP delta is -1.7pp. This close agreement supports using TAB as a proxy for production MCP deployments.
Mean across the 3 external catalog benchmarks: +2.5pp (80.2%→82.7%).
See paper for full methodology and per-benchmark analysis.
Every LLM agent framework sends full JSON Schema definitions for every registered tool on every API call. Claude Code injects ~50,000 tokens of tool definitions per subprocess. At production scale (100K calls/day), the schema overhead alone costs >$30,000/month.
Worse, small models (4B--14B) cannot reliably parse JSON-format tool schemas at scale: they reach only 0--49% accuracy with more than 15 tools. This locks agentic capabilities behind expensive frontier APIs.
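The schema-overhead cost argument can be explored with a small estimator. This is a sketch under stated assumptions: every input value below (tokens per call, savings fraction, price) is a placeholder chosen for illustration, not the set of assumptions behind the figure quoted above, and `monthlySavingsUsd` is a hypothetical helper.

```typescript
interface CostInputs {
  schemaTokensPerCall: number; // tool-definition tokens sent with each call
  savingsFraction: number; // e.g. 0.6 for 60% compression
  callsPerDay: number;
  usdPerMillionInputTokens: number; // placeholder price; check your provider
  daysPerMonth?: number; // defaults to 30
}

// Estimated monthly input-token spend avoided by compressing tool schemas.
function monthlySavingsUsd(inputs: CostInputs): number {
  const days = inputs.daysPerMonth ?? 30;
  const tokensSavedPerCall = inputs.schemaTokensPerCall * inputs.savingsFraction;
  const tokensSavedPerMonth = tokensSavedPerCall * inputs.callsPerDay * days;
  return (tokensSavedPerMonth / 1_000_000) * inputs.usdPerMillionInputTokens;
}

// Placeholder scenario: all four numbers are assumptions for illustration.
const exampleSavings = monthlySavingsUsd({
  schemaTokensPerCall: 6000,
  savingsFraction: 0.6,
  callsPerDay: 100_000,
  usdPerMillionInputTokens: 3,
});

console.log(`Estimated savings: $${exampleSavings.toFixed(0)} per month`);
```

Because the result scales linearly with each input, swapping in your own token counts and pricing gives a quick first-order estimate.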
BFCL (Berkeley Function Calling Leaderboard) validation -- the industry standard for tool-calling evaluation:
| Model | Without TSCG | With TSCG | Improvement | Token Savings |
|---|---|---|---|---|
| Claude Sonnet 4 | 85.7% | 93.2% | +7.5pp | 46.8% |
| GPT-4o | 31.7% | 57.4% | +25.7pp (181% ARR) | 2.6% |
| GPT-5.2 | 61.9% | 89.4% | +27.5pp (144% ARR) | 8.3% |
Every model improves. TSCG achieves 108--181% Accuracy Retention Rate -- it doesn't just retain accuracy, it increases it.
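The ARR figures can be reproduced from the table above. The sketch assumes ARR is defined as accuracy with TSCG divided by baseline accuracy, which matches the percentages in the table; `accuracyRetentionRate` is an illustrative helper, not a TSCG API.

```typescript
// Accuracy Retention Rate in percent.
// Assumption: ARR = (accuracy with TSCG / accuracy without TSCG) * 100.
function accuracyRetentionRate(baselinePct: number, compressedPct: number): number {
  if (baselinePct <= 0) {
    throw new RangeError('baseline accuracy must be positive');
  }
  return (compressedPct / baselinePct) * 100;
}

// BFCL rows from the table above: [model, without TSCG, with TSCG].
const bfclRows: Array<[string, number, number]> = [
  ['Claude Sonnet 4', 85.7, 93.2],
  ['GPT-4o', 31.7, 57.4],
  ['GPT-5.2', 61.9, 89.4],
];

const arrByModel = bfclRows.map(([model, baseline, withTscg]) => ({
  model,
  arr: accuracyRetentionRate(baseline, withTscg),
}));

for (const { model, arr } of arrByModel) {
  console.log(`${model}: ${arr.toFixed(1)}% ARR`);
}
```

Any value above 100% means accuracy rose rather than merely being retained, which is the case for all three rows.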
| Model | JSON Baseline (20 tools) | With TSCG | Recovery |
|---|---|---|---|
| Phi-4 14B | 0% | 84.4% | +84.4pp |
| Mistral 7B | 35% | 80.1% | +45.1pp |
| Gemma 3 4B | 49.9% | 67.0% | +17.1pp |
Seven small models (4B--14B) that achieve 0--49% accuracy on JSON tools recover to 65--90% with TSCG. The root cause: JSON format, not model capacity (R^2 = 0.88 against JSON baselines, collapses to 0.03 against text -- 97% of variance is format sensitivity).
From ~19,000 API calls across 12 models (4B--32B + 3 frontier APIs), 5 scenarios:
| Finding | Detail |
|---|---|
| Token savings | 50--72% on tool schemas |
| BFCL validation | 108--181% Accuracy Retention Rate |
| Formal guarantee | >=51% savings on any well-formed schema (Theorem 3.1) |
| Predictive model | R^2 = 0.88 predicts TSCG benefit from single baseline measurement |
| Speed | 50 tools in 2.4ms (Node.js v24, commodity hardware) |
| Cost at scale | >$30,000/month savings at 100K calls/day |
Independent reproduction on @tscg/core from npm:
| Metric | Measured |
|---|---|
| 5 realistic tools (Claude target) | 59.5% token savings |
| 50 tools | 66.6% savings in 2.4ms |
| Compression time (5 tools) | 0.9ms |
| Unit tests | 108 passing (core 47 + proxy 61) |
| Bundle | 34.7KB (11.7KB gzipped) |
| Dependencies | 0 |
TSCG applies 8 formally-defined transforms grounded in how causal transformers process tokens:
| Principle | Full Name | What It Does |
|---|---|---|
| TAS | Tokenizer-Aligned Syntax | Optimizes for BPE boundaries |
| CFL | Constraint-First Layout | Exploits the attention sink at position 0 |
| CFO | Causal-Flow Ordering | Orders operations into causal chains |
| SDM | Semantic Density Maximization | Removes 104+ filler patterns |
| DRO | Delimiter-Role Optimization | Converts verbose phrases to compact delimiters |
| CCP | Closure-Context Preservation | Appends closure block for recency bias |
| CAS | Causal Access Scoring | Scores and reorders by parameter fragility |
| SAD-F | Selective Anchor Duplication | Budget-constrained anchor duplication |
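To give a feel for what a compact tool signature looks like, the sketch below renders an OpenAI-style function definition into a one-line form resembling the compressed output in this README's usage example. It is an illustration only: `renderCompact`, the type abbreviations, and the `?` marker for optional parameters are assumptions, not the @tscg/core implementation, and no description shortening is attempted.

```typescript
interface ParamSchema {
  type: string;
  description?: string;
  enum?: string[];
}

interface FunctionTool {
  type: 'function';
  function: {
    name: string;
    description: string;
    parameters: {
      type: 'object';
      properties: Record<string, ParamSchema>;
      required?: string[];
    };
  };
}

// Hypothetical abbreviations for JSON Schema primitive types.
const TYPE_ABBREVIATIONS: Record<string, string> = {
  string: 'str',
  number: 'num',
  integer: 'int',
  boolean: 'bool',
};

// Render one tool as name(param:type ...)|description.
function renderCompact(tool: FunctionTool): string {
  const { name, description, parameters } = tool.function;
  const required = new Set(parameters.required ?? []);
  const params = Object.entries(parameters.properties).map(([key, schema]) => {
    const optionalMark = required.has(key) ? '' : '?';
    const shortType = TYPE_ABBREVIATIONS[schema.type] ?? schema.type;
    const choices = schema.enum ? `[${schema.enum.join('|')}]` : '';
    return `${key}${optionalMark}:${shortType}${choices}`;
  });
  return `${name}(${params.join(' ')})|${description}`;
}

const weatherTool: FunctionTool = {
  type: 'function',
  function: {
    name: 'get_weather',
    description: 'Get the current weather for a location',
    parameters: {
      type: 'object',
      properties: {
        location: { type: 'string', description: 'City name or coordinates' },
        units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
      },
      required: ['location'],
    },
  },
};

const compactSignature = renderCompact(weatherTool);
console.log(compactSignature);
```

The sketch keeps the description verbatim and drops parameter descriptions entirely; the transforms listed in the table go further than this structural rewrite.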
All three @tscg/* packages use umbrella versioning -- same version number, released together.
npm install @tscg/core # Core compression engine
npm install @tscg/mcp-proxy # Transparent MCP middleware
npm install @tscg/tool-optimizer # LangChain / Vercel AI SDK integrations

import { compress } from '@tscg/core';

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name or coordinates' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const result = compress(tools, { model: 'claude-sonnet' });
console.log(result.compressed);
// => "get_weather(location:str units?:str[celsius|fahrenheit])|Get current weather"
console.log(`Saved ${result.metrics.tokens.savingsPercent}% tokens`);
// => "Saved 62.3% tokens"

const result = compress(tools, { model: 'claude-sonnet', profile: 'balanced' });
result.compressed // string — compressed tool definitions
result.metrics.tokens.original // number — original token count
result.metrics.tokens.compressed // number — compressed token count
result.metrics.tokens.savingsPercent // number — e.g. 62.3
result.metrics.compressionTimeMs // number — e.g. 0.9
result.appliedPrinciples // string[] — e.g. ['SDM', 'CAS', 'DRO', 'TAS']
result.metrics.perTool // { name, originalTokens, compressedTokens, savingsPercent }[]

compress(tools, {
  model: 'claude-sonnet', // Target model: 'claude-sonnet' | 'gpt-4o' | 'gpt-4' | ...
  profile: 'balanced', // Profile: 'conservative' | 'balanced' | 'aggressive' | 'auto'
});

Compress only .description fields while preserving the full JSON Schema structure -- compatible with native tool-calling APIs (OpenAI, Anthropic, Google):
import { compressDescriptions } from '@tscg/core';
const result = compressDescriptions(tools, { model: 'claude-sonnet' });
console.log(result.tools); // Tools with compressed descriptions
console.log(result.metrics.descriptions.savingsPercent); // ~25-40% description savings

The auto profile selects compression principles based on catalog size. At >=30 tools, CFL/CFO are automatically disabled (they become harmful at scale per our 100-tool benchmark findings):

compress(tools, { model: 'claude-sonnet', profile: 'auto' });

| Package | Description | Install |
|---|---|---|
| @tscg/core | Core compression engine (8 operators) | npm i @tscg/core |
| @tscg/mcp-proxy | MCP stdio proxy -- transparent TSCG compression for any MCP server | npm i @tscg/mcp-proxy |
| @tscg/tool-optimizer | LangChain, MCP, Vercel AI SDK integrations | npm i @tscg/tool-optimizer |
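The catalog-size rule behind the auto profile (CFL and CFO disabled at 30 or more tools) can be sketched as a simple selector. This is a guess at the shape of the logic, not the library's code: `selectPrinciplesForCatalog` is a hypothetical name, and enabling all eight principles below the threshold is an assumption, since the README only states what is disabled at scale.

```typescript
type Principle = 'TAS' | 'CFL' | 'CFO' | 'SDM' | 'DRO' | 'CCP' | 'CAS' | 'SAD-F';

const ALL_PRINCIPLES: readonly Principle[] = [
  'TAS', 'CFL', 'CFO', 'SDM', 'DRO', 'CCP', 'CAS', 'SAD-F',
];

// Threshold and disabled set as described for the auto profile.
const LARGE_CATALOG_THRESHOLD = 30;
const DISABLED_AT_SCALE: ReadonlySet<Principle> = new Set<Principle>(['CFL', 'CFO']);

// Assumption: below the threshold, every principle stays enabled.
function selectPrinciplesForCatalog(toolCount: number): Principle[] {
  if (!Number.isInteger(toolCount) || toolCount < 0) {
    throw new RangeError('toolCount must be a non-negative integer');
  }
  if (toolCount < LARGE_CATALOG_THRESHOLD) {
    return [...ALL_PRINCIPLES];
  }
  return ALL_PRINCIPLES.filter((principle) => !DISABLED_AT_SCALE.has(principle));
}

const smallCatalog = selectPrinciplesForCatalog(16);
const largeCatalog = selectPrinciplesForCatalog(50);

console.log(`16 tools: ${smallCatalog.join(', ')}`);
console.log(`50 tools: ${largeCatalog.join(', ')}`);
```

Keeping the threshold and the disabled set as named constants makes it easy to adjust the rule per model if a different cutoff turns out to work better.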
# Compress tool schemas
npx tsx cli/tscg.ts compress --input tools.json --model claude-sonnet --profile balanced
# Run benchmarks
npx tsx cli/tscg.ts benchmark --model claude-sonnet
# Show compression info
npx tsx cli/tscg.ts info

@tscg/mcp-proxy sits between Claude Code (or any MCP client) and your MCP tool servers, transparently compressing tool schemas:
# Opus 4.7 -- 57-63% savings, +2.5 to +7.5pp accuracy
npx @tscg/mcp-proxy --target=claude-opus-4-7 --server=<your-mcp-command>
# Sonnet 4 -- 57-63% savings, robust accuracy
npx @tscg/mcp-proxy --target=claude-sonnet-4 --server=<your-mcp-command>

Setting --target automatically enables the full compression pipeline validated by our 720-call benchmark. No other flags required.

Legacy mode (backward compatible with v1.0.x):

npx @tscg/mcp-proxy --server=<your-mcp-command>

LangChain:

import { withTSCG } from '@tscg/tool-optimizer/langchain';
const optimizedAgent = withTSCG(agent);

Vercel AI SDK:

import { tscgMiddleware } from '@tscg/tool-optimizer/vercel';

| Property | TSCG | LLMLingua-2 | DSPy / SAMMO |
|---|---|---|---|
| Accuracy effect | Improves (108--181% ARR) | Degrades (-5 to -20%) | Degrades |
| Speed | 2.4ms / 50 tools | ~42s (GPU) | Minutes |
| Dependencies | None | GPU + ML framework | API calls |
| Deterministic | Yes | No | No |
| Formal guarantees | >=51% savings | None | None |
| Bundle size | 34.7KB | Requires PyTorch | Full stack |
| Works offline | Yes | GPU required | API required |
- Claude Code / Cursor / Windsurf users: ~35K fewer tokens per subprocess
- Local LLM users (Ollama): 7B models become functional tool-use agents with 50+ tools
- Production API deployments: >$30,000/month savings at 100K calls/day
- Multi-agent orchestration: Savings multiply per sub-agent in the chain
- Edge / Mobile / Privacy: EU AI Act-compliant local deployment becomes viable
packages/
  core/ # @tscg/core -- compression engine (8 operators, 47 tests)
  mcp-proxy/ # @tscg/mcp-proxy -- stdio proxy for MCP servers (61 tests)
  tool-optimizer/ # @tscg/tool-optimizer -- LangChain, Vercel AI SDK integrations
paper/ # LaTeX source (arXiv version)
cli/ # Unified CLI (compress, benchmark, analyze, info)
benchmark/ # TAB benchmark harness, analysis code, raw data
integrations/ # Framework integration examples
docs/ # Technical documentation
git clone https://github.com/SKZL-AI/tscg.git
cd tscg
npm install
npm run build
npm test # 459 tests
npm run typecheck # Type checking

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments
Furkan Sakizli (ORCID: 0009-0009-5975-5014). 2026.
TSCG-paper.pdf -- arXiv preprint (full version, 12 models, ~19,000 API calls, 4-class taxonomy)
LaTeX source is available in paper/.
@article{sakizli2026tscg,
  title={TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments},
  author={Sakizli, Furkan},
  year={2026},
  note={arXiv preprint},
  orcid={0009-0009-5975-5014}
}

See CONTRIBUTING.md for development setup, code style, and PR guidelines.