TSCG -- Tool-Schema Compression Grammar

Deterministic tool-schema compiler that reduces LLM tool-definition overhead by 50--72% while improving accuracy.

1,200 LOC TypeScript. Zero dependencies. Sub-millisecond. 23KB ESM bundle.

Latest Findings (April 2026)

720-Call E2E Benchmark on Claude Models

Claude Opus 4.7 -- matches-or-beats baseline with 57-63% token savings:

Tool Count	Baseline	TSCG Balanced	Δ Accuracy	Savings
16	70.0%	77.5%	+7.5pp	56.9%
43	77.5%	80.0%	+2.5pp	63.0%
50	72.5%	80.0%	+7.5pp	62.8%

Claude Sonnet 4 -- consistent 57-63% compression with robust accuracy:

Tool Count	Baseline	TSCG Balanced	Δ Accuracy	Savings
16	77.5%	80.0%	+2.5pp	56.9%
43	85.0%	80.0%	-5.0pp	63.0%
50	77.5%	77.5%	±0.0pp	62.8%

480-Call MCP Proxy Benchmark (v1.4.1)

480-call extended proxy benchmark (n=40 per cell, 2 seeds, 2 models x 3 tool counts):

Model	Tools	Baseline	TSCG Proxy	Δ Accuracy	Token Savings
Opus 4.7	16	70.0%	75.0%	+5.0pp	53.1%
Opus 4.7	43	75.0%	75.0%	±0.0pp	55.8%
Opus 4.7	50	77.5%	77.5%	±0.0pp	55.5%
Sonnet 4	16	80.0%	77.5%	-2.5pp	53.1%
Sonnet 4	43	85.0%	82.5%	-2.5pp	55.8%
Sonnet 4	50	77.5%	77.5%	±0.0pp	55.5%

Opus 4.7 matches-or-beats baseline in all conditions; Sonnet 4 within expected CI (max -2.5pp). Both achieve 53-56% token savings.

Tool-Optimizer E2E validation (@tscg/tool-optimizer withTSCG() wrapper, 30 calls, Sonnet 4 @ 16 tools): withTSCG 86.7% vs baseline 80.0% (+6.7pp), 36.6% character savings.

Three Frontier-Model Operator Archetypes

TSCG compression response is model-specific. Three distinct archetypes observed:

Opus 4.7 -- Operator-HUNGRY -- every operator contributes; balanced (all-8) is optimal
Sonnet 4 -- Operator-ROBUST -- config-agnostic; 6 of 7 configs near-identical accuracy
GPT-5.2 -- Operator-SENSITIVE -- CFL helps, CFO hurts; custom config optimal

External Validation -- 4 Independent Benchmarks

TSCG's internal benchmark (TAB -- Tool-Agentic Bench, ~19,000 calls) is independently corroborated by four external benchmarks, including industry-standard evaluation suites:

Benchmark	Type	Result	Significance
BFCL (Berkeley Function Calling Leaderboard)	Industry standard	108--181% ARR across 3 frontier models	Sonnet 4: 85.7%→93.2% (+7.5pp), GPT-4o: 31.7%→57.4% (+25.7pp), GPT-5.2: 61.9%→89.4% (+27.5pp)
ToolBench (Qin et al.)	Academic benchmark	+5.0pp (75.0%→80.0%)	Real-world tool catalog, 20 tools
API-Bank (Li et al.)	Academic benchmark	-5.0pp (80.0%→75.0%)	Honest negative result -- not all benchmarks improve
Real MCP Server (@modelcontextprotocol/server-filesystem)	Production endpoint	100% syntactic validity	30 tasks on live MCP server, server-acceptance 90--97%

TAB → Real MCP Transfer (0.1pp): The internal TAB benchmark is not merely a self-constructed evaluation -- it demonstrably predicts real-world MCP behavior within 0.1 accuracy points. Sonnet 4 on 43-tool MCP: synthetic TAB delta = -1.6pp vs real MCP delta = -1.7pp. This tight transfer validates TAB as a reliable proxy for production MCP deployments.

Mean across the 3 external catalog benchmarks: +2.5pp (80.2%→82.7%).

See paper for full methodology and per-benchmark analysis.

The Problem

Every LLM agent framework sends full JSON Schema definitions for every registered tool on every API call. Claude Code injects ~50,000 tokens of tool definitions per subprocess. At production scale (100K calls/day), the schema overhead alone costs >$30,000/month.

Worse: small models (4B--14B) cannot parse JSON-format tool schemas reliably at scale -- achieving 0--49% accuracy with >15 tools. This locks agentic capabilities behind expensive frontier APIs.

Key Results

Pareto Dominance: Better Accuracy AND Fewer Tokens

BFCL (Berkeley Function Calling Leaderboard) validation -- the industry standard for tool-calling evaluation:

Model	Without TSCG	With TSCG	Improvement	Token Savings
Claude Sonnet 4	85.7%	93.2%	+7.5pp	46.8%
GPT-4o	31.7%	57.4%	+25.7pp (181% ARR)	2.6%
GPT-5.2	61.9%	89.4%	+27.5pp (144% ARR)	8.3%

Every model improves. TSCG achieves 108--181% Accuracy Retention Rate -- it doesn't just retain accuracy, it increases it.

Small Model Enablement

Model	JSON Baseline (20 tools)	With TSCG	Recovery
Phi-4 14B	0%	84.4%	+84.4pp
Mistral 7B	35%	80.1%	+45.1pp
Gemma 3 4B	49.9%	67.0%	+17.1pp

Seven small models (4B--14B) that achieve 0--49% accuracy on JSON tools recover to 65--90% with TSCG. The root cause: JSON format, not model capacity (R^2 = 0.88 against JSON baselines, collapses to 0.03 against text -- 97% of variance is format sensitivity).

Full Benchmark Summary

From ~19,000 API calls across 12 models (4B--32B + 3 frontier APIs), 5 scenarios:

Finding	Detail
Token savings	50--72% on tool schemas
BFCL validation	108--181% Accuracy Retention Rate
Formal guarantee	>=51% savings on any well-formed schema (Theorem 3.1)
Predictive model	R^2 = 0.88 predicts TSCG benefit from single baseline measurement
Speed	50 tools in 2.4ms (Node.js v24, commodity hardware)
Cost at scale	>$30,000/month savings at 100K calls/day

Verified Performance (Fresh Install)

Independent reproduction on @tscg/core from npm:

Metric	Measured
5 realistic tools (Claude target)	59.5% token savings
50 tools	66.6% savings in 2.4ms
Compression time (5 tools)	0.9ms
Unit tests	108 passing (core 47 + proxy 61)
Bundle	34.7KB (11.7KB gzipped)
Dependencies	0

What TSCG Does

TSCG applies 8 formally-defined transforms grounded in how causal transformers process tokens:

Principle	Full Name	What It Does
TAS	Tokenizer-Aligned Syntax	Optimizes for BPE boundaries
CFL	Constraint-First Layout	Exploits the attention sink at position 0
CFO	Causal-Flow Ordering	Orders operations into causal chains
SDM	Semantic Density Maximization	Removes 104+ filler patterns
DRO	Delimiter-Role Optimization	Converts verbose phrases to compact delimiters
CCP	Closure-Context Preservation	Appends closure block for recency bias
CAS	Causal Access Scoring	Scores and reorders by parameter fragility
SAD-F	Selective Anchor Duplication	Budget-constrained anchor duplication

Quick Start

All three @tscg/* packages use umbrella versioning -- same version number, released together.

npm install @tscg/core                # Core compression engine
npm install @tscg/mcp-proxy           # Transparent MCP middleware
npm install @tscg/tool-optimizer      # LangChain / Vercel AI SDK integrations

import { compress } from '@tscg/core';

const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get the current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string', description: 'City name or coordinates' },
          units: { type: 'string', enum: ['celsius', 'fahrenheit'] },
        },
        required: ['location'],
      },
    },
  },
];

const result = compress(tools, { model: 'claude-sonnet' });
console.log(result.compressed);
console.log(`Saved ${result.metrics.tokens.savingsPercent}% tokens`);
// => "get_weather(location:str units?:str[celsius|fahrenheit])|Get current weather"
// => "Saved 62.3% tokens"

Result Object

const result = compress(tools, { model: 'claude-sonnet', profile: 'balanced' });

result.compressed                        // string — compressed tool definitions
result.metrics.tokens.original           // number — original token count
result.metrics.tokens.compressed         // number — compressed token count
result.metrics.tokens.savingsPercent     // number — e.g. 62.3
result.metrics.compressionTimeMs         // number — e.g. 0.9
result.appliedPrinciples                 // string[] — e.g. ['SDM', 'CAS', 'DRO', 'TAS']
result.metrics.perTool                   // { name, originalTokens, compressedTokens, savingsPercent }[]

Options

compress(tools, {
  model: 'claude-sonnet',   // Target model: 'claude-sonnet' | 'gpt-4o' | 'gpt-4' | ...
  profile: 'balanced',      // Profile: 'conservative' | 'balanced' | 'aggressive' | 'auto'
});

Description-Only Mode (v1.4.0)

Compress only .description fields while preserving the full JSON Schema structure -- compatible with native tool-calling APIs (OpenAI, Anthropic, Google):

import { compressDescriptions } from '@tscg/core';

const result = compressDescriptions(tools, { model: 'claude-sonnet' });
console.log(result.tools);              // Tools with compressed descriptions
console.log(result.metrics.descriptions.savingsPercent); // ~25-40% description savings

Auto Profile (v1.4.0)

The auto profile selects compression principles based on catalog size. At >=30 tools, CFL/CFO are automatically disabled (they become harmful at scale per our 100-tool benchmark findings):

compress(tools, { model: 'claude-sonnet', profile: 'auto' });

Packages

Package	Description	Install
`@tscg/core`	Core compression engine (8 operators)	`npm i @tscg/core`
`@tscg/mcp-proxy`	MCP stdio proxy -- transparent TSCG compression for any MCP server	`npm i @tscg/mcp-proxy`
`@tscg/tool-optimizer`	LangChain, MCP, Vercel AI SDK integrations	`npm i @tscg/tool-optimizer`

CLI

# Compress tool schemas
npx tsx cli/tscg.ts compress --input tools.json --model claude-sonnet --profile balanced

# Run benchmarks
npx tsx cli/tscg.ts benchmark --model claude-sonnet

# Show compression info
npx tsx cli/tscg.ts info

MCP Proxy

@tscg/mcp-proxy sits between Claude Code (or any MCP client) and your MCP tool servers, transparently compressing tool schemas:

# Opus 4.7 -- 57-63% savings, +2.5 to +7.5pp accuracy
npx @tscg/mcp-proxy --target=claude-opus-4-7 --server=<your-mcp-command>

# Sonnet 4 -- 57-63% savings, robust accuracy
npx @tscg/mcp-proxy --target=claude-sonnet-4 --server=<your-mcp-command>

Setting --target automatically enables the full compression pipeline validated by our 720-call benchmark. No other flags required.

Legacy mode (backward compatible with v1.0.x):

npx @tscg/mcp-proxy --server=<your-mcp-command>

Integrations

LangChain:

import { withTSCG } from '@tscg/tool-optimizer/langchain';
const optimizedAgent = withTSCG(agent);

Vercel AI SDK:

import { tscgMiddleware } from '@tscg/tool-optimizer/vercel';

TSCG vs Other Approaches

Property	TSCG	LLMLingua-2	DSPy / SAMMO
Accuracy effect	Improves (108--181% ARR)	Degrades (-5 to -20%)	Degrades
Speed	2.4ms / 50 tools	~42s (GPU)	Minutes
Dependencies	None	GPU + ML framework	API calls
Deterministic	Yes	No	No
Formal guarantees	>=51% savings	None	None
Bundle size	34.7KB	Requires PyTorch	Full stack
Works offline	Yes	GPU required	API required

Who Benefits

Claude Code / Cursor / Windsurf users: ~35K fewer tokens per subprocess
Local LLM users (Ollama): 7B models become functional tool-use agents with 50+ tools
Production API deployments: >$30,000/month savings at 100K calls/day
Multi-agent orchestration: Savings multiply per sub-agent in the chain
Edge / Mobile / Privacy: EU AI Act compliant local deployment becomes viable

Project Structure

packages/
  core/             # @tscg/core — compression engine (8 operators, 47 tests)
  mcp-proxy/        # @tscg/mcp-proxy — stdio proxy for MCP servers (61 tests)
  tool-optimizer/   # @tscg/tool-optimizer — LangChain, Vercel AI SDK integrations
paper/              # LaTeX source (arXiv version)
cli/                # Unified CLI (compress, benchmark, analyze, info)
benchmark/          # TAB benchmark harness, analysis code, raw data
integrations/       # Framework integration examples
docs/               # Technical documentation

Development

git clone https://github.com/SKZL-AI/tscg.git
cd tscg
npm install
npm run build
npm test          # 459 tests
npm run typecheck # Type checking

Paper

TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments

Furkan Sakizli (ORCID: 0009-0009-5975-5014). 2026.

TSCG-paper.pdf -- arXiv preprint (full version, 12 models, ~19,000 API calls, 4-class taxonomy)

LaTeX source is available in paper/.

Citation

@article{sakizli2026tscg,
  title={TSCG: Deterministic Tool-Schema Compilation for Agentic LLM Deployments},
  author={Sakizli, Furkan},
  year={2026},
  note={arXiv preprint},
  orcid={0009-0009-5975-5014}
}

Contributing

See CONTRIBUTING.md for development setup, code style, and PR guidelines.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
benchmark		benchmark
cli		cli
docs		docs
findings		findings
integrations/langchain		integrations/langchain
packages		packages
src		src
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
TSCG-paper.pdf		TSCG-paper.pdf
esbuild.browser.mjs		esbuild.browser.mjs
package-lock.json		package-lock.json
package.json		package.json
pnpm-lock.yaml		pnpm-lock.yaml
pnpm-workspace.yaml		pnpm-workspace.yaml
tsconfig.json		tsconfig.json
vitest.config.ts		vitest.config.ts

Folders and files

Latest commit

History

Repository files navigation

TSCG -- Tool-Schema Compression Grammar

Latest Findings (April 2026)

720-Call E2E Benchmark on Claude Models

480-Call MCP Proxy Benchmark (v1.4.1)

Three Frontier-Model Operator Archetypes

External Validation -- 4 Independent Benchmarks

The Problem

Key Results

Pareto Dominance: Better Accuracy AND Fewer Tokens

Small Model Enablement

Full Benchmark Summary

Verified Performance (Fresh Install)

What TSCG Does

Quick Start

Result Object

Options

Description-Only Mode (v1.4.0)

Auto Profile (v1.4.0)

Packages

CLI

MCP Proxy

Integrations

TSCG vs Other Approaches

Who Benefits

Project Structure

Development

Paper

Citation

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 4

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages