NadirRouter · doramirdor · May 29, 2026 · Jun 8, 2026
diff --git a/README.md b/README.md
@@ -1035,7 +1035,7 @@ Options:
   --complex-model TEXT    Model for complex prompts
   --models TEXT           Comma-separated model list (legacy)
   --token TEXT            Auth token
-  --optimize [off|safe|aggressive]  Context optimization mode (default: off)
+  --optimize [off|safe|aggressive|progressive]  Context compression: off | safe | aggressive | progressive (default: off)
   --verbose               Enable debug logging
   --log-raw               Log full raw requests and responses to JSONL
 ```
@@ -1420,8 +1420,14 @@ Auth is disabled by default (local-only). Set `NADIRCLAW_AUTH_TOKEN` to require
 | `NADIRCLAW_CONFIDENCE_THRESHOLD` | `0.06` | Classification threshold (lower = more complex) |
 | `NADIRCLAW_PORT` | `8856` | Server port |
 | `NADIRCLAW_LOG_DIR` | `~/.nadirclaw/logs` | Log directory |
-| `NADIRCLAW_OPTIMIZE` | `off` | Context optimization mode: `off`, `safe` (lossless), `aggressive` (future) |
+| `NADIRCLAW_OPTIMIZE` | `off` | Context compression: `off` (disabled), `safe` (lossless), `aggressive`, or `progressive` (staged ladder that escalates to Headroom). `off` is the master on/off switch |
 | `NADIRCLAW_OPTIMIZE_MAX_TURNS` | `40` | Max conversation turns to keep when trimming history |
+| `NADIRCLAW_OPTIMIZE_BACKEND` | `native` | Optimizer backend: `native` (built-in) or `headroom` (needs `pip install nadirclaw[headroom]`; falls back to native if absent). See [savings analysis](docs/context-optimize-savings.md#backends-native-default-vs-headroom) |
+| `NADIRCLAW_HEADROOM_KOMPRESS` | `off` | When backend is `headroom`, enable Kompress ML text compression (downloads a HuggingFace model on first use) |
+| `NADIRCLAW_OPTIMIZE_PROGRESSIVE` | `off` | Legacy alias for `NADIRCLAW_OPTIMIZE=progressive` — forces the [progressive ladder](docs/context-optimize-savings.md#progressive-staged-compression) regardless of mode. Prefer setting `NADIRCLAW_OPTIMIZE=progressive` |
+| `NADIRCLAW_OPTIMIZE_TARGET_TOKENS` | _(unset)_ | Token budget for progressive compression (e.g. the model's context window). Unset → native stages only |
+| `NADIRCLAW_OPTIMIZE_MAX_STAGE` | `headroom_structural` | Cap on the progressive ladder: `native_safe`, `native_aggressive`, `headroom_structural`, or `headroom_ml` |
+| `NADIRCLAW_OPTIMIZE_ALLOW_LOSSY` | `off` | Permit the lossy ML prose stage (`headroom_ml`) in progressive compression |
 | `NADIRCLAW_LOG_RAW` | `false` | Log full raw requests and responses (`true`/`false`) |
 | `NADIRCLAW_MODELS` | `openai-codex/gpt-5.3-codex,gemini-3-flash-preview` | Legacy model list (fallback if tier vars not set) |
 | `OTEL_EXPORTER_OTLP_ENDPOINT` | *(empty — disabled)* | OpenTelemetry collector endpoint (enables tracing) |

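The `NADIRCLAW_OPTIMIZE_BACKEND` row says `headroom` "falls back to native if absent". A minimal sketch of that fallback is shown below. It is an illustration under assumptions, not NadirClaw's actual code: `resolve_backend`, `headroom_installed`, and the logger name are hypothetical, and treating unknown values as `native` is a guess. The only detail taken from this PR is the importable module name `headroom`.

```python
import importlib.util
import logging

logger = logging.getLogger("optimizer_backend_sketch")  # hypothetical logger name
_warned = False  # lets the warning fire only once per process


def headroom_installed() -> bool:
    """True if a top-level `headroom` module can be found, without importing it."""
    return importlib.util.find_spec("headroom") is not None


def resolve_backend(requested: str, is_installed=headroom_installed) -> str:
    """Map the requested backend to the one that will actually run."""
    global _warned
    if requested == "headroom":
        if is_installed():
            return "headroom"
        if not _warned:
            logger.warning("headroom backend requested but not installed; using native")
            _warned = True
        return "native"
    return "native"  # assumption: anything else is treated as the default backend


print(resolve_backend("headroom", is_installed=lambda: False))
```

Passing `is_installed` as a parameter keeps the check easy to stub in tests, so the fallback path can be exercised whether or not the extra is installed.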
diff --git a/THIRD_PARTY_NOTICES.md b/THIRD_PARTY_NOTICES.md
@@ -0,0 +1,19 @@
+# Third-Party Notices
+
+NadirClaw is MIT-licensed. It can optionally use the following third-party
+components, declared as opt-in extras. Their licenses and attributions are
+reproduced here.
+
+## headroom-ai
+
+- **Used by:** the optional `headroom` optimizer backend
+  (`NADIRCLAW_OPTIMIZE_BACKEND=headroom`), installed via `pip install nadirclaw[headroom]`.
+- **Project:** Headroom — https://github.com/chopratejas/headroom
+- **License:** Apache License 2.0
+- **NOTICE:** Headroom, Copyright 2025 Headroom Contributors.
+
+NadirClaw integrates Headroom only through its public Python API
+(`headroom.compress`); no Headroom source code is copied or vendored into this
+project. A full copy of the Apache License 2.0 is available at
+https://www.apache.org/licenses/LICENSE-2.0 and is distributed with the
+`headroom-ai` package when installed.
diff --git a/benchmarks/optimize_real_data.py b/benchmarks/optimize_real_data.py
@@ -0,0 +1,121 @@
+"""Real-data benchmark: optimizer backends on public coding + chat datasets.
+
+- Chat: allenai/WildChat-1M (real multi-turn user<->assistant conversations)
+- Coding/tools: glaiveai/glaive-function-calling-v2 (tool schemas + function calls + JSON)
+
+Compares: native-safe (lossless, ships today), Pro-aggressive (native ceiling),
+headroom (new opt-in backend). Single tiktoken estimator for all => fair.
+"""
+import json, os, re, sys, time, collections, urllib.request
+
+# Resolve the NadirClaw repo root from this file, and the sibling Nadir package.
+_HERE = os.path.dirname(os.path.abspath(__file__))
+_NADIRCLAW = os.path.dirname(_HERE)
+sys.path.insert(0, _NADIRCLAW)
+_NADIR = os.path.join(os.path.dirname(_NADIRCLAW), "Nadir")
+if os.path.isdir(_NADIR):
+    sys.path.insert(0, _NADIR)
+
+import nadirclaw.optimize as claw
+try:
+    import nadir.optimize as pro
+except Exception:                       # Nadir Pro not on path — fall back to native
+    pro = claw
+
+est = claw._estimate_tokens_messages
+
+N = 200          # conversations per dataset
+CACHE = os.environ.get("BENCH_CACHE_DIR", "/tmp")
+
+
+def _fetch(dataset, config, split, dest, total=N):
+    """Fetch rows from the HF datasets-server (no full dataset download). Cached to disk."""
+    if os.path.exists(dest):
+        return
+    rows = []
+    for off in range(0, total, 100):
+        url = (f"https://datasets-server.huggingface.co/rows?dataset={dataset}"
+               f"&config={config}&split={split}&offset={off}&length=100")
+        for _ in range(3):
+            try:
+                with urllib.request.urlopen(url, timeout=40) as r:
+                    rows += [x["row"] for x in json.load(r).get("rows", [])]
+                break
+            except Exception:
+                time.sleep(2)
+    json.dump(rows, open(dest, "w"))
+
+
+_WILDCHAT = os.path.join(CACHE, "ds_wildchat.json")
+_GLAIVE = os.path.join(CACHE, "ds_glaive.json")
+_fetch("allenai/WildChat-1M", "default", "train", _WILDCHAT)
+_fetch("glaiveai/glaive-function-calling-v2", "default", "train", _GLAIVE)
+
+
+def load_wildchat():
+    rows = json.load(open(_WILDCHAT))[:N]
+    convs = []
+    for r in rows:
+        msgs = [{"role": t.get("role", "user"), "content": t.get("content") or ""}
+                for t in (r.get("conversation") or []) if isinstance(t, dict)]
+        msgs = [m for m in msgs if isinstance(m["content"], str) and m["content"]]
+        if len(msgs) >= 2:
+            convs.append(msgs)
+    return convs
+
+
+def load_glaive():
+    rows = json.load(open(_GLAIVE))[:N]
+    convs = []
+    marker = re.compile(r"(USER:|ASSISTANT:|FUNCTION RESPONSE:)", re.I)
+    rolemap = {"USER": "user", "ASSISTANT": "assistant", "FUNCTION RESPONSE": "tool"}
+    for r in rows:
+        sysm = (r.get("system") or "").strip()
+        if sysm.upper().startswith("SYSTEM:"):
+            sysm = sysm[7:].strip()
+        msgs = [{"role": "system", "content": sysm}] if sysm else []
+        chat = r.get("chat") or ""
+        parts = marker.split(chat)
+        # parts: ['', 'USER:', ' ...', 'ASSISTANT:', ' ...', ...]
+        i = 1
+        while i < len(parts):
+            lab = parts[i].rstrip(":").upper()
+            content = parts[i + 1].strip() if i + 1 < len(parts) else ""
+            if lab in rolemap and content:
+                msgs.append({"role": rolemap[lab], "content": content})
+            i += 2
+        if len(msgs) >= 2:
+            convs.append(msgs)
+    return convs
+
+
+def bench(convs, runners):
+    out = {name: [0, 0] for name in runners}          # name -> [orig, after]
+    transforms = {name: collections.Counter() for name in runners}
+    for msgs in convs:
+        for name, fn in runners.items():
+            r = fn([{**m} for m in msgs])
+            out[name][0] += r.original_tokens
+            out[name][1] += r.optimized_tokens
+            for t in r.optimizations_applied:
+                transforms[name][t.split(":")[1] if t.startswith("headroom:") else t] += 1
+    return out, transforms
+
+
+RUNNERS = {
+    "native-safe":    lambda m: claw.optimize_messages(m, mode="safe", backend="native"),
+    "pro-aggressive": lambda m: pro.optimize_messages(m, mode="aggressive", backend="native"),
+    "headroom":       lambda m: claw.optimize_messages(m, mode="safe", backend="headroom"),
+}
+
+for label, loader in [("CHAT — WildChat-1M", load_wildchat), ("CODING/TOOLS — glaive-function-calling-v2", load_glaive)]:
+    convs = loader()
+    t0 = time.time()
+    res, tf = bench(convs, RUNNERS)
+    base = res["native-safe"][0]
+    print(f"\n### {label}  ({len(convs)} conversations, {base:,} raw tokens, {time.time()-t0:.0f}s)")
+    print(f"{'backend':<18}{'after':>10}{'saved':>9}{'%':>7}   top transforms")
+    for name in RUNNERS:
+        o, a = res[name]
+        top = ", ".join(f"{k}:{v}" for k, v in tf[name].most_common(4))
+        print(f"{name:<18}{a:>10,}{o-a:>9,}{100*(o-a)/max(1,o):>6.1f}%   {top}")
diff --git a/docs/context-optimize-savings.md b/docs/context-optimize-savings.md
@@ -37,6 +37,7 @@ Combined with smart routing, NadirClaw now saves in two ways:
 - **Tool schema deduplication** — Agent frameworks often re-send the full tool schema with every turn. NadirClaw keeps the first occurrence and replaces repeats with a short reference.
 - **Chat history trimming** — Long conversations accumulate tokens that are far from the current task. Trimming to recent turns (default: 40) keeps context relevant and cheap.
 - **Whitespace normalization** — Log dumps, stack traces, and verbose output contain runs of blank lines and spaces that carry no semantic value.
+- **Columnar JSON-array packing** (`json_array_pack`, aggressive mode) — Large arrays of same-keyed objects (DB query results, API list responses, large tool outputs) repeat every key on every row. Packing them into a header (`⟦cols=[...]⟧`) plus one value-array per row emits each key once. Information-lossless and deterministically reversible, but not byte-identical JSON, so it runs in **aggressive** mode only. On a 100-row homogeneous array this reaches ~68% vs pretty-printed JSON (vs ~45% for `json_minify` alone).
 
 ## Projected Monthly Savings (Opus 4.6)
 
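The columnar packing described for `json_array_pack` can be sketched as follows. This is a minimal illustration, not the NadirClaw implementation: `pack_rows` and `unpack_rows` are hypothetical names, the header only mimics the `⟦cols=[...]⟧` marker quoted in the bullet, and requiring identical key order on every row is a simplifying assumption.

```python
import json

_OPEN, _CLOSE = "⟦cols=", "⟧"


def pack_rows(rows):
    """Pack same-keyed objects into one header line plus one value array per row."""
    if not rows or not all(isinstance(r, dict) for r in rows):
        raise ValueError("expected a non-empty list of objects")
    cols = list(rows[0].keys())
    if any(list(r.keys()) != cols for r in rows):
        raise ValueError("rows must share the same keys in the same order")
    dump = lambda v: json.dumps(v, separators=(",", ":"), ensure_ascii=False)
    header = _OPEN + dump(cols) + _CLOSE
    # Each key is written once in the header; rows carry values only.
    return "\n".join([header] + [dump([r[c] for c in cols]) for r in rows])


def unpack_rows(packed):
    """Reverse pack_rows: rebuild the original list of objects."""
    header, *lines = packed.split("\n")
    if not (header.startswith(_OPEN) and header.endswith(_CLOSE)):
        raise ValueError("missing column header")
    cols = json.loads(header[len(_OPEN):-len(_CLOSE)])
    return [dict(zip(cols, json.loads(line))) for line in lines]


rows = [{"id": i, "name": f"user{i}", "active": i % 2 == 0} for i in range(100)]
packed = pack_rows(rows)
assert unpack_rows(packed) == rows  # information-lossless round trip
print(len(json.dumps(rows, indent=2)), len(json.dumps(rows, separators=(",", ":"))), len(packed))
```

The final line prints character counts for pretty-printed, minified, and packed forms. Those are character counts, not token counts, so they will not match the percentages quoted above exactly.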
@@ -56,6 +57,9 @@ All safe-mode transforms are deterministic and lossless:
 
 - JSON values roundtrip exactly (parse + compact re-serialize)
 - Code blocks inside fences (```) are never modified
+- **Leading indentation is preserved**, so raw (unfenced) source code — e.g. file-read
+  tool outputs — stays syntactically valid. Whitespace normalization only collapses
+  *interior* multi-spaces and excess blank lines, never indentation.
 - URLs are preserved character-for-character
 - Unicode and emoji roundtrip correctly
 - Deeply nested structures are handled without data loss
@@ -76,3 +80,100 @@ NADIRCLAW_OPTIMIZE=safe nadirclaw serve
 # Dry-run on a file
 nadirclaw optimize payload.json --mode safe --format json
 ```
+
+## Backends: native (default) vs headroom
+
+The optimizer has a pluggable backend, selected independently of the `off|safe|aggressive`
+mode. The mode still decides *how hard* to compress; the backend decides *who* runs it.
+
+| Backend | Default | Engine | Extra capabilities |
+|---|---|---|---|
+| `native` | ✅ | Built-in stdlib pipeline (this document) | None — pure Python, no extra deps |
+| `headroom` | opt-in | [Headroom](https://github.com/chopratejas/headroom) (Apache-2.0) | Statistical JSON-array crushing (SmartCrusher), AST-aware code compression, content-type routing |
+
+`headroom` delegates to the optional [`headroom-ai`](https://pypi.org/project/headroom-ai/)
+package. It ships **installed by default with Nadir Pro** but stays **inactive** until you
+select it. In open-source NadirClaw it is an opt-in extra:
+
+```bash
+pip install "nadirclaw[headroom]"
+```
+
+Activate it:
+
+```bash
+# Server-wide
+NADIRCLAW_OPTIMIZE=safe NADIRCLAW_OPTIMIZE_BACKEND=headroom nadirclaw serve
+
+# Per-request override (in the request body)
+{"model": "auto", "optimize": "safe", "optimize_backend": "headroom", "messages": [...]}
+```
+
+Safety and fallback:
+
+- If `headroom-ai` is not installed (or raises), the optimizer **transparently falls back
+  to `native`** and logs a one-time warning. Requests never fail because of the backend.
+- Token-savings metrics are always recomputed with NadirClaw's own estimator, so reported
+  numbers stay consistent across backends (Savings/Billing math is unaffected).
+- Headroom's ML text compressor (Kompress) downloads a HuggingFace model on first use, so
+  it is kept **disabled** by default. Opt in with `NADIRCLAW_HEADROOM_KOMPRESS=on`.
+- The fastest Headroom compressors (SmartCrusher etc.) are a compiled Rust extension bundled
+  in the prebuilt wheels. On source installs without the wheel they simply don't run, and
+  Headroom fails open — output is still correct, just less compressed.
+
+Attribution for the Apache-2.0 dependency lives in
+[`THIRD_PARTY_NOTICES.md`](../THIRD_PARTY_NOTICES.md).
+
+## Progressive (staged) compression
+
+`compress_progressive()` escalates through compression stages and **stops as soon as a
+token budget is met** — so you only pay the cost (and fidelity risk) of heavier compression
+when lighter stages aren't enough. Headroom is wired in as the middle/late tiers.
+
+The ladder, cheapest/safest first:
+
+| Stage | What runs | Loss | Needs |
+|---|---|---|---|
+| 1. `native_safe` | system/tool dedup, json minify, whitespace | lossless | — |
+| 2. `native_aggressive` | + columnar packing, semantic dedup, Pro transforms | lossless-to-semantic | — |
+| 3. `headroom_structural` | Headroom content compressors (SmartCrusher, LogCompressor, …) | high-fidelity | `headroom-ai` |
+| 4. `headroom_ml` | Headroom Kompress (ML token-dropping on prose) | lossy | `headroom-ai` + `allow_lossy` |
+
+Rules:
+
+- With **no `target_tokens`**, the ladder stops after `native_aggressive`, so Headroom and the
+  lossy ML stage are never reached. Default behaviour stays dependency-free.
+- The Headroom stages are **skipped silently** when `headroom-ai` is not installed.
+- `headroom_ml` (lossy) only runs when `allow_lossy=True`.
+- Chat-history trimming always runs last as a final backstop.
+
+```python
+from nadirclaw.optimize import compress_progressive   # or nadir.optimize for Pro
+
+result = compress_progressive(
+    messages,
+    target_tokens=180_000,     # e.g. the model's context window
+    allow_lossy=False,         # set True to permit the lossy ML stage
+    max_stage="headroom_structural",
+)
+# result.optimizations_applied is prefixed with a stage:<name> marker for each stage that ran
+```
+
+Enable it on the server — `progressive` is just a value of the single `optimize`
+control, alongside `off` / `safe` / `aggressive`:
+
+```bash
+# off | safe | aggressive | progressive  (off = compression disabled)
+NADIRCLAW_OPTIMIZE=progressive \
+NADIRCLAW_OPTIMIZE_TARGET_TOKENS=180000 \
+NADIRCLAW_OPTIMIZE_MAX_STAGE=headroom_structural \
+nadirclaw serve
+
+# equivalently: nadirclaw serve --optimize progressive
+# per-request:  {"optimize": "progressive", "messages": [...]}
+# turn compression off:  {"optimize": "off", ...}
+```
+
+On a logs+prose payload where native compression saves ~0%, escalating to
+`headroom_structural` reached ~90% savings. The ladder only pays for Headroom when the
+native stages cannot meet the budget.
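
The ladder rules in the "Progressive (staged) compression" section can be sketched as a short loop. This is an illustration under assumptions, not the real `compress_progressive()`: `run_ladder`, `LadderResult`, `TOY_STAGES`, and `toy_count` are hypothetical, the budget is assumed to be checked after each stage, and the final chat-history trimming backstop is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Messages = List[dict]
STAGES = ["native_safe", "native_aggressive", "headroom_structural", "headroom_ml"]


@dataclass
class LadderResult:
    messages: Messages
    tokens: int
    optimizations_applied: List[str] = field(default_factory=list)


def run_ladder(
    messages: Messages,
    stage_fns: Dict[str, Callable[[Messages], Messages]],
    count_tokens: Callable[[Messages], int],
    target_tokens: Optional[int] = None,
    max_stage: str = "headroom_structural",
    allow_lossy: bool = False,
    headroom_available: bool = False,
) -> LadderResult:
    current, applied = messages, []
    for name in STAGES[: STAGES.index(max_stage) + 1]:
        if name.startswith("headroom_") and (target_tokens is None or not headroom_available):
            continue  # no budget: native stages only; missing dependency: skip silently
        if name == "headroom_ml" and not allow_lossy:
            continue  # the lossy stage needs an explicit opt-in
        current = stage_fns[name](current)
        applied.append(f"stage:{name}")
        if target_tokens is not None and count_tokens(current) <= target_tokens:
            break  # budget met, stop escalating
    return LadderResult(current, count_tokens(current), applied)


# Toy stand-ins so the sketch runs without NadirClaw or Headroom installed.
def _map_content(fn):
    return lambda msgs: [{**m, "content": fn(m["content"])} for m in msgs]


TOY_STAGES = {
    "native_safe": _map_content(lambda s: " ".join(s.split())),
    "native_aggressive": _map_content(lambda s: s),
    "headroom_structural": _map_content(lambda s: s[:20]),
    "headroom_ml": _map_content(lambda s: s[:5]),
}


def toy_count(msgs: Messages) -> int:
    return sum(len(m["content"]) for m in msgs)  # characters stand in for tokens


msgs = [{"role": "user", "content": "a" + " " * 50 + "b" * 100}]
print(run_ladder(msgs, TOY_STAGES, toy_count).optimizations_applied)
print(run_ladder(msgs, TOY_STAGES, toy_count, target_tokens=50,
                 headroom_available=True).optimizations_applied)
```

The first call has no budget, so only the two native stages run. The second sets a budget the native stages cannot meet, so the loop escalates to `headroom_structural` and stops there.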