4 changes: 4 additions & 0 deletions .gitignore
@@ -42,3 +42,7 @@ docs/humanswithai-mcp-stack.md
.cursor/rules/humanswithai-mcp-autopilot.mdc
.windsurf/mcp.json
.windsurf/rules/humanswithai-mcp-autopilot.md

# Python
__pycache__/
*.pyc
130 changes: 130 additions & 0 deletions act-receipts/README.md
@@ -0,0 +1,130 @@
# act-receipts: cache-friendly action receipts for browser MCPs

> Companion code for [*We measured our own scraper-stack. The receipt design works on controlled jitter, but real prod is harder.*](https://gregshevchenko.com/research/mcp-stack-token-economy-part-2/)
>
> Continuation of [part 1](https://gregshevchenko.com/research/mcp-stack-token-economy/) (token economy of a 17-MCP local-first stack).

A **cache-friendly action receipt** is what a browser MCP returns after performing a user-shaped action (click, type, submit, scroll), instead of the full DOM. The receipt's byte representation stays stable across calls when the page region the agent acted on didn't change, so Anthropic's 5-minute prompt cache can hit on the next agent turn.

This directory ships:

- **JSON Schema** for `act_receipt.v1` (`schema/act_receipt_v1.json`)
- **Human-readable spec** (`schema/act_receipt_v1.md`)
- **Python reference implementation** (`python/act_receipts.py`): validate / canonicalize / hash / score
- **JavaScript reference implementation** (`js/canonical_bytes.mjs`): byte-equivalent port
- **Cross-runtime equivalence tests** (`python/tests/` + `js/canonical_bytes.test.mjs`): same SHA-256 across Python + Node
- **Three A/B scenarios** (`scenarios/`): the iana.org / `/test/jitter` / Hacker News probes from the article
- **`detect_selector_miss_artifact()` guard** (`python/artifact_guard.py`): useful to anyone running LLM evals against CSS-selectored regions

## Why this is here, not in a separate repo

This is a continuation of the two-axis MCP-stack framework introduced in part 1. The schema + algorithm + guard pattern are **not the moat**; they're the public-facing methodology. The actual measurement engine, scraper-stack tiers, and production deploy live elsewhere; this directory exposes only what's already public in the article.

## Quick example (Python)

```python
from act_receipts import (
    assemble_receipt,
    canonical_receipt_bytes,
    canonical_sha256,
    cache_friendly_score,
)

# After your browser MCP performs an action, build a receipt:
receipt = assemble_receipt(
    action={"type": "click", "selector": "a[href='/about']"},
    pre_url="https://example.com/",
    pre_dom_region_html="<header>...</header>",
    post_url="https://example.com/about",
    post_dom_region_html="<header>...</header>",  # same region content → same hash
    navigated=True,
)

# The cache key is the SHA-256 of the canonical bytes:
print(canonical_sha256(receipt))
# Same semantic action on the same page → same hash → prompt-cache hit
```
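
For readers who want the hashing idea without opening `python/act_receipts.py` (not shown in this diff), here is a minimal sketch of the canonicalization steps as the JavaScript port's header comment describes them: drop `observability`, drop `dom_region_size_bytes` from both states, then serialize with sorted keys. The `sketch_*` names and the sample receipt values are hypothetical, and the real module may differ in details.

```python
import hashlib
import json
from typing import Any

STATE_KEYS = ("pre_state", "post_state")


def sketch_canonical_bytes(receipt: dict[str, Any]) -> bytes:
    """Illustrative only: strip jitter fields, then serialize deterministically."""
    # Drop the top-level observability subtree (timings, tier used, ...).
    cleaned = {k: v for k, v in receipt.items() if k != "observability"}
    # Drop the raw region size from both state snapshots.
    for key in STATE_KEYS:
        state = cleaned.get(key)
        if isinstance(state, dict):
            cleaned[key] = {
                k: v for k, v in state.items() if k != "dom_region_size_bytes"
            }
    # Sorted keys + raw UTF-8, with json.dumps default separators.
    return json.dumps(cleaned, sort_keys=True, ensure_ascii=False).encode("utf-8")


def sketch_canonical_sha256(receipt: dict[str, Any]) -> str:
    return hashlib.sha256(sketch_canonical_bytes(receipt)).hexdigest()


# Hypothetical receipts that differ only in the stripped fields.
base = {
    "schema_version": "scraper-mcp.act_receipt.v1",
    "action": {"type": "click", "selector": "a[href='/about']"},
    "pre_state": {"url": "https://example.com/", "dom_region_size_bytes": 512},
    "post_state": {"url": "https://example.com/about", "dom_region_size_bytes": 640},
    "observability": {"duration_ms": 234},
}
noisy = {**base, "observability": {"duration_ms": 999}}
noisy["post_state"] = {**base["post_state"], "dom_region_size_bytes": 641}

same_hash = sketch_canonical_sha256(base) == sketch_canonical_sha256(noisy)
```

Under these assumptions the two receipts hash identically, because every field that differs between them is removed before serialization.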

## Quick example (JavaScript)

```js
import { canonicalBytes, canonicalSha256 } from "./js/canonical_bytes.mjs";

const receipt = {
  schema_version: "scraper-mcp.act_receipt.v1",
  action: { type: "click", selector: "a[href='/about']" },
  pre_state: { url: "https://example.com/", dom_region_hash: "sha256:abc..." },
  post_state: {
    url: "https://example.com/about",
    dom_region_hash: "sha256:def...",
    changed: true,
    stable: true,
    navigated: true,
  },
  errors: { console: [], network: [], selector_not_found: false, timeout: false, action_failed: null },
  observability: { tier_used: "patchright", duration_ms: 234 }, // stripped before hashing
};

console.log(canonicalSha256(receipt));
// Same semantic receipt → same hash as Python's canonical_sha256()
```

## Run the equivalence tests

```bash
# Python: uses stdlib only
cd python
python3 -m unittest tests.test_canonical_bytes -v

# JavaScript: Node 20+ (path is relative to python/ from the step above)
cd ../js
node --test canonical_bytes.test.mjs
```

Both suites load the same 4 fixtures (`tests/fixtures/cross_runtime_equivalence.json`) and assert byte-identical canonical output + matching SHA-256.
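
When porting the canonicalizer to another runtime, it can help to see exactly what Python emits for a small value. The sketch below uses only the standard `json` module; the `sample` object is made up and is not one of the shipped fixtures. It shows the default separators (with spaces), raw non-ASCII output, and that an integral float keeps its `.0`. JavaScript's `JSON.stringify(2.0)` prints `2`, so a port presumably has to avoid or special-case such values; whether the fixtures cover that case is not stated here.

```python
import json

# Made-up sample covering separators, non-ASCII text, and an integral float.
sample = {"b": [1, 2.0, True, None], "a": "naïve → ok"}

# What the canonical form relies on: sorted keys, raw UTF-8, default separators.
canonical = json.dumps(sample, sort_keys=True, ensure_ascii=False)

# Compact separators, which is what JSON.stringify produces by default.
compact = json.dumps(
    sample, sort_keys=True, ensure_ascii=False, separators=(",", ":")
)

print(canonical)
print(compact)
```

The two strings carry the same data but differ byte-for-byte, which is why the JavaScript port emits `, ` and `: ` manually.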

## Use the artifact guard

```python
from artifact_guard import detect_selector_miss_artifact

# treatment_runs is a list of dicts with at least 'post_region_size_bytes'
is_artifact, reason = detect_selector_miss_artifact(treatment_runs)
if is_artifact:
    print(f"REJECT: {reason}")
```

The guard exists because of an incident: a first A/B run on Hacker News reported a +77.8 percentage-point "win" that turned out to be a CSS selector miss. `table.itemlist` doesn't exist on HN's homepage, so both control and treatment hashed the same empty region, producing a false 100% byte-stability. A permanent guard in the harness now rejects this: if `post_region_size_bytes < 200` across all N runs, the selector probably missed.

Useful to anyone running LLM evals against CSS-selectored regions.
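
The rule described above is small enough to sketch. This is a hypothetical stand-in, not the code in `python/artifact_guard.py`: the function name, the handling of empty input, and the reason strings are assumptions. Only the `post_region_size_bytes` field and the 200-byte threshold come from the text.

```python
from typing import Any

MIN_REGION_BYTES = 200  # threshold quoted in the README text


def sketch_selector_miss_guard(
    runs: list[dict[str, Any]], min_bytes: int = MIN_REGION_BYTES
) -> tuple[bool, str]:
    """Hypothetical stand-in for detect_selector_miss_artifact()."""
    if not runs:
        # Assumption: with nothing to inspect, do not flag an artifact.
        return False, "no runs to inspect"
    sizes = [int(run.get("post_region_size_bytes", 0)) for run in runs]
    # Flag only when every run captured a suspiciously small region.
    if all(size < min_bytes for size in sizes):
        return True, (
            f"all {len(sizes)} runs captured < {min_bytes} bytes "
            f"(max {max(sizes)}); the selector probably missed"
        )
    return False, "region sizes look plausible"


# Made-up inputs: an empty region on every run vs. a healthy capture.
empty_region_runs = [{"post_region_size_bytes": 0} for _ in range(5)]
healthy_runs = [{"post_region_size_bytes": 4096} for _ in range(5)]

print(sketch_selector_miss_guard(empty_region_runs))
print(sketch_selector_miss_guard(healthy_runs))
```

Requiring *all* runs to fall under the threshold keeps a single slow or partial capture from rejecting an otherwise valid experiment.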

## The three measurement scenarios

The article ships three live A/B measurements. The scenario JSONs are reusable as-is:

| Scenario | Target | What it tells you |
|---|---|---|
| `AB1-iana-click.json` | iana.org (static) | Phase-0 strip already perfect → 0pp delta. **Invariant proof** that JS + Python + Camoufox produce the same SHA-256. |
| `AB2-jitter-fixture.json` | `/test/jitter` (synthetic, see `examples/`) | Controlled DOM noise → **+80pp delta**. Mechanism works. |
| `AB3-hn-frontpage.json` | news.ycombinator.com | Mixed signal (+5pp unique-count, -25pp modal artifact, **+3017ms wall-time**). Real production pages aren't structured like the synthetic fixture. |

Run them against your own browser-MCP fork to compare measurements.
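
To compare your own runs, one option is to compute the share of runs that match the most common hash in each arm and report the difference in percentage points. The sketch below follows the modal-hash definition used by `cacheFriendlyScore` in `js/canonical_bytes.mjs`; the `sketch_*` names, the delta helper, and the sample hashes are hypothetical, and the article's harness may aggregate differently.

```python
from collections import Counter
from typing import Optional


def sketch_modal_share(hashes: list[str]) -> Optional[float]:
    """Fraction of runs whose hash equals the most common hash; None if empty."""
    if not hashes:
        return None
    top_count = Counter(hashes).most_common(1)[0][1]
    return top_count / len(hashes)


def sketch_delta_pp(control: list[str], treatment: list[str]) -> float:
    """Treatment minus control modal share, in percentage points."""
    c = sketch_modal_share(control)
    t = sketch_modal_share(treatment)
    if c is None or t is None:
        raise ValueError("both arms need at least one run")
    return (t - c) * 100


# Made-up hashes: raw captures all differ, receipts all agree.
control_hashes = ["h1", "h2", "h3", "h4", "h5"]
treatment_hashes = ["r1"] * 5

print(sketch_modal_share(control_hashes), sketch_modal_share(treatment_hashes))
print(sketch_delta_pp(control_hashes, treatment_hashes))
```

With these made-up inputs the control share is 0.2 and the treatment share is 1.0. Pair any such comparison with the artifact guard, since two arms that both hash an empty region also score 1.0.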

## License

MIT (same as the repo root).

## SSOT

- Canonical article: https://gregshevchenko.com/research/mcp-stack-token-economy-part-2/
- Part 1 article: https://gregshevchenko.com/research/mcp-stack-token-economy/
- Repo root: https://github.com/g-shevchenko/mcp-token-savers

## Credits

- [LakshmanTurlapati/FSB](https://github.com/LakshmanTurlapati/FSB) (BSL 1.1): architectural inspiration for the action-receipt pattern. We adopted the shape, not the code; this implementation is original.
- [u/pquattro on r/ClaudeAI](https://www.reddit.com/r/ClaudeAI/comments/1tn6cey/i_measured_my_claude_code_mcp_stack_on_two_axes/): feedback on part 1's cache-friendliness framing that pushed us to measure the browser-MCP layer.

## Status

Reference implementation, **not** a published package. Copy what you need; the article documents the boundary of applicability. Receipts stay **off by default** in our own production stack: the wall-time cost on real-prod targets is too steep for a global default.
97 changes: 97 additions & 0 deletions act-receipts/examples/jitter_endpoint.py
@@ -0,0 +1,97 @@
"""Minimal /test/jitter endpoint: reproduces AB2 from the article.

A controlled-jitter target: emits one fresh `data-time="<unix-ms>"`
attribute on every request, with the rest of the page byte-stable.
The whole point is to let you run AB2 against a known-jitter target
to see act_receipt.v1's dom_region_hash noise-stripping in action.

Usage (stdlib server, no dependencies):
    python3 jitter_endpoint.py
    # then in another shell:
    curl http://127.0.0.1:3030/test/jitter | head

Optional Starlette adapter:
    pip install starlette uvicorn
    uvicorn jitter_endpoint:starlette_app --factory --port 3030

Run AB2 against this endpoint with your own browser-MCP fork.

Standard-library-only fallback if you don't want Starlette: see
`stdlib_main()` below, which is also what `python3 jitter_endpoint.py`
runs by default. It's slightly less convenient for a normal dev loop.

License: MIT
"""
from __future__ import annotations

import time

PAGE_TEMPLATE = """<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<title>/test/jitter</title>
</head>
<body>
<h1 id="hello">Hello</h1>
<div data-time="{ms}" data-stable="stable content"></div>
<p>The data-time attribute above changes on every request. Everything
else is byte-stable. A receipt's dom_region_hash strips data-time before
hashing, so 5/5 receipts produce byte-identical canonical form. A raw
HTML capture sees 5 different bytes -> 5 different hashes.</p>
</body>
</html>
"""


def render() -> str:
    # Fill in the current Unix time in milliseconds; nothing else varies.
    return PAGE_TEMPLATE.format(ms=int(time.time() * 1000))


def starlette_app():
    """App factory. Run with:
    uvicorn jitter_endpoint:starlette_app --factory --port 3030
    """
    try:
        from starlette.applications import Starlette
        from starlette.responses import HTMLResponse
        from starlette.routing import Route
    except ImportError as exc:
        raise SystemExit(
            "starlette is required for the Starlette adapter: "
            "`pip install starlette uvicorn`, or use the stdlib fallback below."
        ) from exc

    async def jitter(request):
        return HTMLResponse(render())

    return Starlette(routes=[Route("/test/jitter", jitter)])


def stdlib_main(port: int = 3030):
    """Standard-library fallback, no dependencies."""
    import http.server
    import socketserver

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/test/jitter":
                body = render().encode("utf-8")
                self.send_response(200)
                self.send_header("Content-Type", "text/html; charset=utf-8")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

        def log_message(self, *args, **kwargs):
            return  # silent

    with socketserver.TCPServer(("127.0.0.1", port), Handler) as httpd:
        print(f"jitter endpoint running on http://127.0.0.1:{port}/test/jitter")
        try:
            httpd.serve_forever()
        except KeyboardInterrupt:
            print("\nshutting down")


if __name__ == "__main__":
    # Default: stdlib server (no extra deps required).
    stdlib_main()
133 changes: 133 additions & 0 deletions act-receipts/js/canonical_bytes.mjs
@@ -0,0 +1,133 @@
/**
 * scraper-mcp.act_receipt.v1: JavaScript reference implementation.
 *
 * Companion to https://gregshevchenko.com/research/mcp-stack-token-economy-part-2/
 *
 * Cross-runtime equivalent of `../python/act_receipts.py`. Produces the
 * SAME canonical bytes and SAME SHA-256 for the same receipt object.
 *
 * If JS and Python ever disagree on a single byte, the whole
 * cache-friendliness design breaks; parity is verified in
 * `canonical_bytes.test.mjs` against shared golden fixtures.
 *
 * Algorithm (mirrors Python):
 *   1. Take a dict copy of the receipt
 *   2. Drop the top-level `observability` subtree
 *   3. Drop `dom_region_size_bytes` from `pre_state` and `post_state`
 *   4. `json.dumps(sort_keys=True, ensure_ascii=False).encode("utf-8")`
 *
 * JSON serialization parity:
 *   - sort_keys=True      -> recursive sortedJsonStringify below
 *   - ensure_ascii=False  -> JSON.stringify is non-escaping for
 *                            non-ASCII by default (matches Python's
 *                            ensure_ascii=False; both emit raw UTF-8).
 *   - separators          -> Python's json.dumps default is
 *                            (", ", ": ") with spaces; JS
 *                            JSON.stringify default is ",", ":" without
 *                            spaces. We force JS to match Python by
 *                            emitting `, ` and `: ` manually.
 *
 * License: MIT
 */
import { createHash } from "node:crypto";

const STRIPPED_TOP_KEYS = new Set(["observability"]);
const STATE_KEYS = new Set(["pre_state", "post_state"]);
const STRIPPED_STATE_KEYS = new Set(["dom_region_size_bytes"]);

/**
 * Return a copy of `receipt` with jitter fields stripped:
 *   - observability subtree gone
 *   - dom_region_size_bytes gone from pre_state + post_state
 */
function stripForCanonical(receipt) {
  const cleaned = {};
  for (const [key, value] of Object.entries(receipt)) {
    if (STRIPPED_TOP_KEYS.has(key)) continue;
    if (
      STATE_KEYS.has(key) &&
      value !== null &&
      typeof value === "object" &&
      !Array.isArray(value)
    ) {
      const sub = {};
      for (const [k, v] of Object.entries(value)) {
        if (!STRIPPED_STATE_KEYS.has(k)) sub[k] = v;
      }
      cleaned[key] = sub;
    } else {
      cleaned[key] = value;
    }
  }
  return cleaned;
}

/**
 * Recursive JSON stringify with sorted object keys and Python's default
 * separators ("," + " " and ":" + " "). Mirrors
 * `json.dumps(obj, sort_keys=True, ensure_ascii=False)`.
 */
function sortedJsonStringify(value) {
  if (value === null) return "null";
  if (typeof value === "boolean") return value ? "true" : "false";
  if (typeof value === "number") {
    return JSON.stringify(value);
  }
  if (typeof value === "string") {
    // JSON.stringify quotes correctly and leaves non-ASCII raw, which
    // matches Python ensure_ascii=False.
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    const parts = value.map(sortedJsonStringify);
    return "[" + parts.join(", ") + "]";
  }
  if (typeof value === "object") {
    const keys = Object.keys(value).sort();
    const parts = keys.map(
      (k) => JSON.stringify(k) + ": " + sortedJsonStringify(value[k]),
    );
    return "{" + parts.join(", ") + "}";
  }
  throw new TypeError(`Unsupported value type in canonicalBytes: ${typeof value}`);
}

/**
 * Return canonical bytes for a receipt.
 *
 * @param {object} receipt - parsed scraper-mcp.act_receipt.v1 object
 * @returns {Buffer} UTF-8 encoded canonical JSON bytes
 */
export function canonicalBytes(receipt) {
  const cleaned = stripForCanonical(receipt);
  const json = sortedJsonStringify(cleaned);
  return Buffer.from(json, "utf-8");
}

/**
 * Return SHA-256 hex of canonicalBytes(receipt), i.e. the cache key.
 *
 * @param {object} receipt
 * @returns {string} 64-char lowercase hex
 */
export function canonicalSha256(receipt) {
  return createHash("sha256").update(canonicalBytes(receipt)).digest("hex");
}

/**
 * Return the fraction of receipts matching the modal canonical hash.
 * 1.0 means all receipts produce byte-identical canonical form. null on
 * empty input.
 *
 * @param {object[]} receipts
 * @returns {number|null}
 */
export function cacheFriendlyScore(receipts) {
  if (!Array.isArray(receipts) || receipts.length === 0) return null;
  if (receipts.length === 1) return 1.0;
  const hashes = receipts.map(canonicalSha256);
  const counts = new Map();
  for (const h of hashes) counts.set(h, (counts.get(h) ?? 0) + 1);
  const top = Math.max(...counts.values());
  return top / hashes.length;
}