10 changes: 8 additions & 2 deletions README.md
@@ -1035,7 +1035,7 @@ Options:
--complex-model TEXT Model for complex prompts
--models TEXT Comma-separated model list (legacy)
--token TEXT Auth token
--optimize [off|safe|aggressive] Context optimization mode (default: off)
--optimize [off|safe|aggressive|progressive] Context compression: off | safe | aggressive | progressive (default: off)
--verbose Enable debug logging
--log-raw Log full raw requests and responses to JSONL
```
@@ -1420,8 +1420,14 @@ Auth is disabled by default (local-only). Set `NADIRCLAW_AUTH_TOKEN` to require
| `NADIRCLAW_CONFIDENCE_THRESHOLD` | `0.06` | Classification threshold (lower = more complex) |
| `NADIRCLAW_PORT` | `8856` | Server port |
| `NADIRCLAW_LOG_DIR` | `~/.nadirclaw/logs` | Log directory |
| `NADIRCLAW_OPTIMIZE` | `off` | Context optimization mode: `off`, `safe` (lossless), `aggressive` (future) |
| `NADIRCLAW_OPTIMIZE` | `off` | Context compression: `off` (disabled), `safe` (lossless), `aggressive`, or `progressive` (staged ladder that escalates to Headroom). `off` is the master on/off switch |
| `NADIRCLAW_OPTIMIZE_MAX_TURNS` | `40` | Max conversation turns to keep when trimming history |
| `NADIRCLAW_OPTIMIZE_BACKEND` | `native` | Optimizer backend: `native` (built-in) or `headroom` (needs `pip install nadirclaw[headroom]`; falls back to native if absent). See [savings analysis](docs/context-optimize-savings.md#backends-native-default-vs-headroom) |
| `NADIRCLAW_HEADROOM_KOMPRESS` | `off` | When backend is `headroom`, enable Kompress ML text compression (downloads a HuggingFace model on first use) |
| `NADIRCLAW_OPTIMIZE_PROGRESSIVE` | `off` | Legacy alias for `NADIRCLAW_OPTIMIZE=progressive` — forces the [progressive ladder](docs/context-optimize-savings.md#progressive-staged-compression) regardless of mode. Prefer setting `NADIRCLAW_OPTIMIZE=progressive` |
| `NADIRCLAW_OPTIMIZE_TARGET_TOKENS` | _(unset)_ | Token budget for progressive compression (e.g. the model's context window). Unset → native stages only |
| `NADIRCLAW_OPTIMIZE_MAX_STAGE` | `headroom_structural` | Cap on the progressive ladder: `native_safe`, `native_aggressive`, `headroom_structural`, or `headroom_ml` |
| `NADIRCLAW_OPTIMIZE_ALLOW_LOSSY` | `off` | Permit the lossy ML prose stage (`headroom_ml`) in progressive compression |
| `NADIRCLAW_LOG_RAW` | `false` | Log full raw requests and responses (`true`/`false`) |
| `NADIRCLAW_MODELS` | `openai-codex/gpt-5.3-codex,gemini-3-flash-preview` | Legacy model list (fallback if tier vars not set) |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | *(empty — disabled)* | OpenTelemetry collector endpoint (enables tracing) |
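
A minimal sketch of how the optimizer variables in this table might be read into one settings object. `load_optimize_settings` and `OptimizeSettings` are hypothetical names, not NadirClaw's real loader, and the accepted truthy spellings (`on`/`true`/`1`/`yes`) are an assumption. The table says the legacy `NADIRCLAW_OPTIMIZE_PROGRESSIVE` alias applies "regardless of mode" but also that `off` is the master switch; this sketch assumes `off` wins.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

MODES = ("off", "safe", "aggressive", "progressive")
STAGES = ("native_safe", "native_aggressive", "headroom_structural", "headroom_ml")


def _flag(value: str) -> bool:
    # Assumed truthy spellings; the real parser may accept a different set.
    return value.strip().lower() in ("on", "true", "1", "yes")


@dataclass
class OptimizeSettings:
    mode: str = "off"
    max_turns: int = 40
    backend: str = "native"
    target_tokens: Optional[int] = None  # None -> native stages only
    max_stage: str = "headroom_structural"
    allow_lossy: bool = False


def load_optimize_settings(env: Mapping[str, str]) -> OptimizeSettings:
    """Hypothetical loader mirroring the defaults in the table above."""
    mode = env.get("NADIRCLAW_OPTIMIZE", "off").strip().lower()
    if mode not in MODES:
        raise ValueError(f"invalid NADIRCLAW_OPTIMIZE: {mode!r}")
    # Legacy alias. Assumption: `off` stays the master switch, so the alias
    # only upgrades an already-enabled mode to the progressive ladder.
    if mode != "off" and _flag(env.get("NADIRCLAW_OPTIMIZE_PROGRESSIVE", "off")):
        mode = "progressive"
    max_stage = env.get("NADIRCLAW_OPTIMIZE_MAX_STAGE", "headroom_structural").strip()
    if max_stage not in STAGES:
        raise ValueError(f"invalid NADIRCLAW_OPTIMIZE_MAX_STAGE: {max_stage!r}")
    target = env.get("NADIRCLAW_OPTIMIZE_TARGET_TOKENS", "").strip()
    return OptimizeSettings(
        mode=mode,
        max_turns=int(env.get("NADIRCLAW_OPTIMIZE_MAX_TURNS", "40")),
        backend=env.get("NADIRCLAW_OPTIMIZE_BACKEND", "native").strip().lower(),
        target_tokens=int(target) if target else None,  # unset -> no token budget
        max_stage=max_stage,
        allow_lossy=_flag(env.get("NADIRCLAW_OPTIMIZE_ALLOW_LOSSY", "off")),
    )
```

In a server process this would be called as `load_optimize_settings(os.environ)`; taking a plain mapping keeps it easy to test.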
19 changes: 19 additions & 0 deletions THIRD_PARTY_NOTICES.md
@@ -0,0 +1,19 @@
# Third-Party Notices

NadirClaw is MIT-licensed. It can optionally use the following third-party
components, declared as opt-in extras. Their licenses and attributions are
reproduced here.

## headroom-ai

- **Used by:** the optional `headroom` optimizer backend
(`NADIRCLAW_OPTIMIZE_BACKEND=headroom`), installed via `pip install nadirclaw[headroom]`.
- **Project:** Headroom — https://github.com/chopratejas/headroom
- **License:** Apache License 2.0
- **NOTICE:** Headroom, Copyright 2025 Headroom Contributors.

NadirClaw integrates Headroom only through its public Python API
(`headroom.compress`); no Headroom source code is copied or vendored into this
project. A full copy of the Apache License 2.0 is available at
https://www.apache.org/licenses/LICENSE-2.0 and is distributed with the
`headroom-ai` package when installed.
121 changes: 121 additions & 0 deletions benchmarks/optimize_real_data.py
@@ -0,0 +1,121 @@
"""Real-data benchmark: optimizer backends on public coding + chat datasets.

- Chat: allenai/WildChat-1M (real multi-turn user<->assistant conversations)
- Coding/tools: glaiveai/glaive-function-calling-v2 (tool schemas + function calls + JSON)

Compares: native-safe (lossless, ships today), Pro-aggressive (native ceiling),
headroom (new opt-in backend). Single tiktoken estimator for all => fair.
"""
import json, os, re, sys, time, collections, urllib.request

# Resolve the NadirClaw repo root from this file, and the sibling Nadir package.
_HERE = os.path.dirname(os.path.abspath(__file__))
_NADIRCLAW = os.path.dirname(_HERE)
sys.path.insert(0, _NADIRCLAW)
_NADIR = os.path.join(os.path.dirname(_NADIRCLAW), "Nadir")
if os.path.isdir(_NADIR):
    sys.path.insert(0, _NADIR)

import nadirclaw.optimize as claw
try:
    import nadir.optimize as pro
except Exception:  # Nadir Pro not on path — fall back to native
    pro = claw

est = claw._estimate_tokens_messages

N = 200 # conversations per dataset
CACHE = os.environ.get("BENCH_CACHE_DIR", "/tmp")


def _fetch(dataset, config, split, dest, total=N):
    """Fetch rows from the HF datasets-server (no full dataset download). Cached to disk."""
    if os.path.exists(dest):
        return
    rows = []
    for off in range(0, total, 100):
        url = (f"https://datasets-server.huggingface.co/rows?dataset={dataset}"
               f"&config={config}&split={split}&offset={off}&length=100")
        for _ in range(3):
            try:
                with urllib.request.urlopen(url, timeout=40) as r:
                    rows += [x["row"] for x in json.load(r).get("rows", [])]
                break
            except Exception:
                time.sleep(2)
    if not rows:  # don't cache an empty result from a failed fetch
        raise RuntimeError(f"could not fetch any rows for {dataset}")
    with open(dest, "w") as f:
        json.dump(rows, f)


_WILDCHAT = os.path.join(CACHE, "ds_wildchat.json")
_GLAIVE = os.path.join(CACHE, "ds_glaive.json")
_fetch("allenai/WildChat-1M", "default", "train", _WILDCHAT)
_fetch("glaiveai/glaive-function-calling-v2", "default", "train", _GLAIVE)


def load_wildchat():
    with open(_WILDCHAT) as f:
        rows = json.load(f)[:N]
    convs = []
    for r in rows:
        msgs = [{"role": t.get("role", "user"), "content": t.get("content") or ""}
                for t in (r.get("conversation") or []) if isinstance(t, dict)]
        msgs = [m for m in msgs if isinstance(m["content"], str) and m["content"]]
        if len(msgs) >= 2:
            convs.append(msgs)
    return convs


def load_glaive():
    with open(_GLAIVE) as f:
        rows = json.load(f)[:N]
    convs = []
    marker = re.compile(r"(USER:|ASSISTANT:|FUNCTION RESPONSE:)", re.I)
    rolemap = {"USER": "user", "ASSISTANT": "assistant", "FUNCTION RESPONSE": "tool"}
    for r in rows:
        sysm = (r.get("system") or "").strip()
        if sysm.upper().startswith("SYSTEM:"):
            sysm = sysm[7:].strip()
        msgs = [{"role": "system", "content": sysm}] if sysm else []
        chat = r.get("chat") or ""
        parts = marker.split(chat)
        # parts: ['', 'USER:', ' ...', 'ASSISTANT:', ' ...', ...]
        # Odd indices hold role labels; each is followed by that turn's text.
        for i in range(1, len(parts) - 1, 2):
            lab = parts[i].rstrip(":").upper()
            content = parts[i + 1].strip()
            if lab in rolemap and content:
                msgs.append({"role": rolemap[lab], "content": content})
        if len(msgs) >= 2:
            convs.append(msgs)
    return convs


def bench(convs, runners):
    out = {name: [0, 0] for name in runners}  # name -> [orig, after]
    transforms = {name: collections.Counter() for name in runners}
    for msgs in convs:
        for name, fn in runners.items():
            r = fn([{**m} for m in msgs])  # copy each message so runners can't mutate shared input
            out[name][0] += r.original_tokens
            out[name][1] += r.optimized_tokens
            for t in r.optimizations_applied:
                transforms[name][t.split(":")[1] if t.startswith("headroom:") else t] += 1
    return out, transforms


RUNNERS = {
    "native-safe": lambda m: claw.optimize_messages(m, mode="safe", backend="native"),
    "pro-aggressive": lambda m: pro.optimize_messages(m, mode="aggressive", backend="native"),
    "headroom": lambda m: claw.optimize_messages(m, mode="safe", backend="headroom"),
}

for label, loader in [("CHAT — WildChat-1M", load_wildchat), ("CODING/TOOLS — glaive-function-calling-v2", load_glaive)]:
    convs = loader()
    t0 = time.time()
    res, tf = bench(convs, RUNNERS)
    base = res["native-safe"][0]
    print(f"\n### {label} ({len(convs)} conversations, {base:,} raw tokens, {time.time()-t0:.0f}s)")
    print(f"{'backend':<18}{'after':>10}{'saved':>9}{'%':>7} top transforms")
    for name in RUNNERS:
        o, a = res[name]
        top = ", ".join(f"{k}:{v}" for k, v in tf[name].most_common(4))
        print(f"{name:<18}{a:>10,}{o-a:>9,}{100*(o-a)/max(1,o):>6.1f}% {top}")
101 changes: 101 additions & 0 deletions docs/context-optimize-savings.md
@@ -37,6 +37,7 @@ Combined with smart routing, NadirClaw now saves in two ways:
- **Tool schema deduplication** — Agent frameworks often re-send the full tool schema with every turn. NadirClaw keeps the first occurrence and replaces repeats with a short reference.
- **Chat history trimming** — Long conversations accumulate tokens that are far from the current task. Trimming to recent turns (default: 40) keeps context relevant and cheap.
- **Whitespace normalization** — Log dumps, stack traces, and verbose output contain runs of blank lines and spaces that carry no semantic value.
- **Columnar JSON-array packing** (`json_array_pack`, aggressive mode) — Large arrays of same-keyed objects (DB query results, API list responses, large tool outputs) repeat every key on every row. Packing them into a header (`⟦cols=[...]⟧`) plus one value-array per row emits each key once. Information-lossless and deterministically reversible, but not byte-identical JSON, so it runs in **aggressive** mode only. On a 100-row homogeneous array this reaches ~68% vs pretty-printed JSON (vs ~45% for `json_minify` alone).
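
A minimal sketch of the packing idea, for illustration only: `pack_json_array` and `unpack_json_array` are hypothetical names, and the one-row-per-line layout after the `⟦cols=[...]⟧` header is an assumption, not NadirClaw's exact output. It also shows why the transform is reversible but not byte-identical: unpacking rebuilds equal Python objects, not the original JSON text.

```python
import json
from typing import Any, List, Optional


def pack_json_array(rows: List[Any]) -> Optional[str]:
    """Pack same-keyed dicts into a column header plus one value-array per row.

    Returns None when the input is not a non-empty list of dicts with identical
    keys (in identical order); the caller then leaves the JSON untouched.
    """
    if not rows or not all(isinstance(r, dict) for r in rows):
        return None
    cols = list(rows[0].keys())
    if any(list(r.keys()) != cols for r in rows):
        return None  # heterogeneous rows: not packable
    compact = {"ensure_ascii": False, "separators": (",", ":")}
    header = "⟦cols=" + json.dumps(cols, **compact) + "⟧"  # each key emitted once
    body = [json.dumps([r[c] for c in cols], **compact) for r in rows]
    return "\n".join([header, *body])  # json.dumps escapes newlines inside values


def unpack_json_array(packed: str) -> List[dict]:
    """Reverse pack_json_array: rebuild the original list of dicts."""
    header, *lines = packed.split("\n")
    cols = json.loads(header[len("⟦cols="):-1])  # strip the ⟦cols= ... ⟧ wrapper
    return [dict(zip(cols, json.loads(line))) for line in lines]
```

Rejecting rows whose keys differ (even only in order) is a conservative choice: it guarantees the round trip restores every object exactly.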

## Projected Monthly Savings (Opus 4.6)

@@ -56,6 +57,9 @@ All safe-mode transforms are deterministic and lossless:

- JSON values roundtrip exactly (parse + compact re-serialize)
- Code blocks inside fences (```) are never modified
- **Leading indentation is preserved**, so raw (unfenced) source code — e.g. file-read
tool outputs — stays syntactically valid. Whitespace normalization only collapses
*interior* multi-spaces and excess blank lines, never indentation.
- URLs are preserved character-for-character
- Unicode and emoji roundtrip correctly
- Deeply nested structures are handled without data loss
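
The indentation-preserving whitespace rule can be sketched as follows. `normalize_whitespace` is a hypothetical name, not NadirClaw's function, and collapsing blank-line runs to a single blank line is an assumed threshold. Fenced lines pass through unchanged, matching the guarantee for code blocks.

```python
import re

FENCE = "`" * 3  # a Markdown code fence marker
_SPACE_RUN = re.compile(r"[ \t]{2,}")


def normalize_whitespace(text: str, max_blank_lines: int = 1) -> str:
    """Collapse interior space runs and excess blank lines, keeping indentation.

    Lines inside fenced code blocks are passed through untouched.
    """
    out, blank_run, in_fence = [], 0, False
    for line in text.split("\n"):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            blank_run = 0
            out.append(line)
            continue
        if in_fence:
            out.append(line)
            continue
        if not line.strip():
            blank_run += 1
            if blank_run <= max_blank_lines:  # keep at most N consecutive blanks
                out.append("")
            continue
        blank_run = 0
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]  # leading whitespace kept verbatim
        out.append(indent + _SPACE_RUN.sub(" ", body.rstrip()))
    return "\n".join(out)
```

Because only the text after the indentation is touched, a file-read tool output such as a Python function keeps its block structure.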
@@ -76,3 +80,100 @@ NADIRCLAW_OPTIMIZE=safe nadirclaw serve
# Dry-run on a file
nadirclaw optimize payload.json --mode safe --format json
```

## Backends: native (default) vs headroom

The optimizer has a pluggable backend, selected independently of the `off|safe|aggressive`
mode. The mode still decides *how hard* to compress; the backend decides *who* runs it.

| Backend | Default | Engine | Extra capabilities |
|---|---|---|---|
| `native` | ✅ | Built-in stdlib pipeline (this document) | None — pure Python, no extra deps |
| `headroom` | opt-in | [Headroom](https://github.com/chopratejas/headroom) (Apache-2.0) | Statistical JSON-array crushing (SmartCrusher), AST-aware code compression, content-type routing |

`headroom` delegates to the optional [`headroom-ai`](https://pypi.org/project/headroom-ai/)
package. It ships **installed by default with Nadir Pro** but stays **inactive** until you
select it. In open-source NadirClaw it is an opt-in extra:

```bash
pip install "nadirclaw[headroom]"
```

Activate it:

```bash
# Server-wide
NADIRCLAW_OPTIMIZE=safe NADIRCLAW_OPTIMIZE_BACKEND=headroom nadirclaw serve

# Per-request override (in the request body)
{"model": "auto", "optimize": "safe", "optimize_backend": "headroom", "messages": [...]}
```
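
For a Python client, a minimal standard-library sketch of the per-request override. The endpoint path `/v1/chat/completions` (OpenAI-compatible), the default port `8856`, and the `Bearer` auth header are assumptions; the body fields are the ones in the per-request example above, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible endpoint on NadirClaw's default port.
URL = "http://localhost:8856/v1/chat/completions"


def build_request(messages, optimize="safe", backend="headroom", token=None):
    """Build (but do not send) a chat request with per-request optimizer overrides."""
    body = {
        "model": "auto",
        "optimize": optimize,
        "optimize_backend": backend,
        "messages": messages,
    }
    headers = {"Content-Type": "application/json"}
    if token:  # assumed Bearer scheme; only needed when NADIRCLAW_AUTH_TOKEN is set
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        URL, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )


req = build_request([{"role": "user", "content": "Summarize this log"}])
# To send it: with urllib.request.urlopen(req) as resp: print(json.load(resp))
```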

Safety and fallback:

- If `headroom-ai` is not installed (or raises), the optimizer **transparently falls back
to `native`** and logs a one-time warning. Requests never fail because of the backend.
- Token-savings metrics are always recomputed with NadirClaw's own estimator, so reported
numbers stay consistent across backends (Savings/Billing math is unaffected).
- Headroom's ML text compressor (Kompress) downloads a HuggingFace model on first use, so
it is kept **disabled** by default. Opt in with `NADIRCLAW_HEADROOM_KOMPRESS=on`.
- The fastest Headroom compressors (SmartCrusher, etc.) are implemented as a compiled Rust
  extension that ships in the prebuilt wheels. On source installs without that extension they
  simply don't run, and Headroom fails open: the output is still correct, just less compressed.

Attribution for the Apache-2.0 dependency lives in
[`THIRD_PARTY_NOTICES.md`](../THIRD_PARTY_NOTICES.md).

## Progressive (staged) compression

`compress_progressive()` escalates through compression stages and **stops as soon as a
token budget is met** — so you only pay the cost (and fidelity risk) of heavier compression
when lighter stages aren't enough. Headroom is wired in as the middle/late tiers.

The ladder, cheapest/safest first:

| Stage | What runs | Loss | Needs |
|---|---|---|---|
| 1. `native_safe` | system/tool dedup, json minify, whitespace | lossless | — |
| 2. `native_aggressive` | + columnar packing, semantic dedup, Pro transforms | lossless-to-semantic | — |
| 3. `headroom_structural` | Headroom content compressors (SmartCrusher, LogCompressor, …) | high-fidelity | `headroom-ai` |
| 4. `headroom_ml` | Headroom Kompress (ML token-dropping on prose) | lossy | `headroom-ai` + `allow_lossy` |

Rules:

- With **no `target_tokens`**, the ladder stops after `native_aggressive` — Headroom and the
lossy ML stage are never reached. Default behaviour stays dependency-free and lossless.
- The Headroom stages are **skipped silently** when `headroom-ai` is not installed.
- `headroom_ml` (lossy) only runs when `allow_lossy=True`.
- Chat-history trimming always runs last as a final backstop.

```python
from nadirclaw.optimize import compress_progressive  # or nadir.optimize for Pro

messages = [{"role": "user", "content": "..."}]  # OpenAI-style chat messages

result = compress_progressive(
    messages,
    target_tokens=180_000,  # e.g. the model's context window
    allow_lossy=False,      # set True to permit the lossy ML stage
    max_stage="headroom_structural",
)
# result.optimizations_applied is prefixed with `stage:<name>` markers for the stages that ran
```
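
The ladder rules can be sketched as a small loop. This is a hypothetical illustration, not `compress_progressive` itself: stages are passed in as `(name, transform, needs_headroom, lossy)` tuples, `progressive_sketch` is an invented name, and checking the budget after each stage (so `native_safe` always runs) is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Messages = List[dict]
# (stage name, transform, needs headroom-ai, lossy)
Stage = Tuple[str, Callable[[Messages], Messages], bool, bool]
LADDER = ("native_safe", "native_aggressive", "headroom_structural", "headroom_ml")


def progressive_sketch(
    messages: Messages,
    stages: Sequence[Stage],
    estimate: Callable[[Messages], int],
    target_tokens: Optional[int] = None,
    max_stage: str = "headroom_structural",
    allow_lossy: bool = False,
    headroom_available: bool = True,
    trim: Callable[[Messages], Messages] = lambda ms: ms,
) -> Tuple[Messages, List[str]]:
    """Escalate through `stages` (cheapest first) until the token budget is met."""
    applied: List[str] = []
    cap = LADDER.index(max_stage)
    for name, fn, needs_headroom, lossy in stages:
        if LADDER.index(name) > cap:
            break  # respect max_stage
        if needs_headroom and target_tokens is None:
            break  # no budget: stop after the native stages
        if needs_headroom and not headroom_available:
            continue  # headroom-ai missing: skip silently
        if lossy and not allow_lossy:
            continue  # the lossy ML stage needs an explicit opt-in
        messages = fn(messages)
        applied.append(f"stage:{name}")
        if target_tokens is not None and estimate(messages) <= target_tokens:
            break  # budget met: don't pay for heavier stages
    return trim(messages), applied  # history trimming runs last as a backstop
```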

Enable it on the server — `progressive` is just a value of the single `optimize`
control, alongside `off` / `safe` / `aggressive`:

```bash
# off | safe | aggressive | progressive (off = compression disabled)
NADIRCLAW_OPTIMIZE=progressive \
NADIRCLAW_OPTIMIZE_TARGET_TOKENS=180000 \
NADIRCLAW_OPTIMIZE_MAX_STAGE=headroom_structural \
nadirclaw serve

# equivalently: nadirclaw serve --optimize progressive
# per-request: {"optimize": "progressive", "messages": [...]}
# turn compression off: {"optimize": "off", ...}
```

On a logs+prose payload where native compression saved ~0%, escalating to
`headroom_structural` saved ~90%. The escalation only pays for the Headroom stages when
native compression genuinely can't meet the budget.