Local-first reliability triage for AI + API execution.
Given execution receipts, Scan produces a ranked hazard pack showing exactly where reliability is leaking:
- where latency breaches concentrate
- where failures concentrate (429s, timeouts, auth, 5xx)
- which guardrails to implement first
Output: a 1-page reliability snapshot + prioritized guardrail plan.
No dashboards. No calls. Just receipts → hazards → guardrails.
Start here: docs/START_HERE.md
If you’ve ever said:
- “We can’t tell (quickly) which call is killing tail latency.”
- “Retries are spiking / rate limits are killing us.”
- “It usually works… but the tail is brutal.”
Pitstop Scan turns that into a ranked list of failure + breach signatures you can actually fix.
```bash
make deps
```
Place your file at:
```
input/exhaust.jsonl
```
Then run:
```bash
make run
```
Open the report:
```bash
open output/report.md  # macOS
```
If you don’t have receipts yet, you can start from a copied error/log blob:
```bash
python -m pitstop_scan.cli intake --in blob.txt --out output/intake-pack --run-scan
```
This will produce:
- raw_scrubbed.txt — scrubbed copy of the input blob
- artifact.json — normalized boundary classification summary
- exhaust.jsonl — scan-compatible receipt
- summary.md — human-readable provisional diagnosis
- derived/ — scan outputs
This is intentionally v1:
- heuristic signal extraction
- conservative scrubbing
- simple provisional classification
Then open the report:
```bash
open output/intake-pack/derived/report.md
```
Run a synthetic demo (creates a tiny input/exhaust.jsonl):
```bash
make demo
open output/report.md
```
If make demo works, you’re ready: replace input/exhaust.jsonl with your real file and rerun make run.
Running the scan writes:
- output/report.md — 1-page reliability snapshot + top fixes
- output/hazards.csv — ranked hazards (highest leverage first)
- output/signatures.csv — per-signature rollups
- output/summary.json — machine totals (automation-friendly)
- output/pitstop_pack_agg.zip — zip of the four derived outputs above
Breach = latency exceeds the per-attempt deadline (budget.deadline_ms) even if status is ok.
(If receipts use legacy budget_ms, Scan treats it as budget.deadline_ms.)
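A minimal sketch of the breach rule above, in Python. It assumes receipts are parsed JSON objects in which dotted names such as `budget.deadline_ms` are nested keys and the legacy `budget_ms` sits at the top level; both layouts are assumptions, and `deadline_ms` / `is_breach` are hypothetical helpers, not part of Pitstop Scan.

```python
from typing import Any, Mapping, Optional


def deadline_ms(receipt: Mapping[str, Any]) -> Optional[float]:
    """Per-attempt deadline, falling back to legacy top-level budget_ms (assumed layout)."""
    budget = receipt.get("budget")
    if isinstance(budget, Mapping) and budget.get("deadline_ms") is not None:
        return float(budget["deadline_ms"])
    if receipt.get("budget_ms") is not None:
        return float(receipt["budget_ms"])
    return None


def is_breach(receipt: Mapping[str, Any]) -> bool:
    """True when latency exceeds the deadline, whatever outcome.status says."""
    limit = deadline_ms(receipt)
    cost = receipt.get("cost") or {}
    latency = cost.get("latency_ms")
    if limit is None or latency is None:
        return False  # not enough information to call it a breach
    return float(latency) > limit


# An "ok" attempt can still be a breach.
slow_ok = {"outcome": {"status": "ok"}, "cost": {"latency_ms": 4200}, "budget": {"deadline_ms": 3000}}
print(is_breach(slow_ok))  # True
```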
If you use intake, Scan writes an intermediate pack before generating derived outputs.
See a sample output pack (derived aggregates only):
- proof/demo_pack_v0/report.md
- proof/demo_pack_v0/hazards.csv
- proof/demo_pack_v0/signatures.csv
- proof/demo_pack_v0/summary.json
Small, runnable “truth artifacts”: invariant → policy → tests → verification.
- receipts/README.md — index of receipts you can paste into a codebase
- 429-floor — Retry-After is a floor (policy + tests)
These receipts are the guardrails the scan will often recommend.
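The 429-floor invariant can be sketched like this: whatever backoff the client computes, the actual wait is never shorter than the server's Retry-After. This is an illustrative Python sketch, not the code in the 429-floor receipt; `parse_retry_after`, `next_delay`, and the `base`/`cap` defaults are hypothetical. Retry-After may be either delta-seconds or an HTTP-date (RFC 9110), so both forms are handled.

```python
import random
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait from a Retry-After value (delta-seconds or HTTP-date)."""
    if value is None:
        return None
    value = value.strip()
    if value.isdecimal():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable: treat as absent
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(attempt: int, retry_after: Optional[float],
               base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter, floored by Retry-After."""
    backoff = random.uniform(0.0, min(cap, base * 2 ** attempt))
    if retry_after is None:
        return backoff
    return max(retry_after, backoff)  # the header is a floor, never shortened
```

Note that the floor is applied after the cap: a large Retry-After is honored as-is, and deciding whether such a wait is worth it at all is a separate (classification) decision.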
POST a 429 status + headers, get back a classification (WAIT / CAP / STOP) + the first knob to adjust.
```bash
curl -s -X POST https://web-production-273d3.up.railway.app/classify \
  -H "Content-Type: application/json" \
  -d '{"status":429,"headers":{"retry-after":"30"},"provider":"anthropic"}'
```
Three cases:
| Input | Classification | Meaning |
|---|---|---|
| retry-after: 30 | WAIT | Honor header, retry after delay |
| retry-after: 600 | STOP | Quota exhaustion: do not retry |
| no header | CAP | Reduce concurrency: do not retry immediately |
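For intuition, the three rows of the table can be approximated with a local function. This is a hypothetical sketch, not the hosted /classify service: the `STOP_THRESHOLD_S` cutoff is an assumption (the table only shows that 30 s maps to WAIT and 600 s to STOP), HTTP-date values are not parsed, and the `provider` field is ignored.

```python
from typing import Mapping

# Hypothetical cutoff. The table only shows 30 s -> WAIT and 600 s -> STOP;
# where the hosted classifier draws the line is not documented here.
STOP_THRESHOLD_S = 300.0


def classify_429(headers: Mapping[str, str]) -> str:
    """Approximate WAIT / CAP / STOP for a 429 from its headers (sketch only)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return "CAP"  # no server hint: reduce concurrency instead of retrying now
    try:
        seconds = float(raw)
    except ValueError:
        return "CAP"  # unparseable hint handled like a missing one (assumption)
    return "STOP" if seconds >= STOP_THRESHOLD_S else "WAIT"


for hdrs in ({"retry-after": "30"}, {"Retry-After": "600"}, {}):
    print(hdrs, classify_429(hdrs))
```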
Send a real 429 blob (status + headers) to brentondwilliams@gmail.com and I'll run it through.
Pitstop Scan is a reference implementation of the Pitstop Execution Contract:
```
execute(intent, budget, policy) -> result, receipt
```
The contract defines the correctness rules for reliable execution:
- Budget semantics: per-attempt deadlines, max elapsed, retry caps (attempts include fallbacks)
- Classification taxonomy: 429 vs 402 vs timeout vs auth (retryable vs terminal)
- Scope correctness: model vs provider vs credential (cooldown blast radius)
- Audit-grade receipts: emitted on every attempt (including block/cooldown/preemption)
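A minimal Python sketch of the contract's budget and receipt rules, under stated assumptions: `intent` and `policy` are folded into an `attempts` list of `(tool_id, callable)` pairs (retries and fallbacks are both later entries, so they share one retry cap) and a `terminal` set of error classes that stop execution; each callable is assumed to enforce the per-attempt deadline it is handed. `Budget`, `Receipt`, `AttemptError`, and the action strings are hypothetical names, not the spec's schema.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, FrozenSet, List, Optional, Sequence, Tuple


@dataclass
class Budget:
    deadline_ms: float     # per-attempt deadline
    max_elapsed_ms: float  # cap on total elapsed time across attempts
    retry_budget: int      # max attempts, fallbacks included


@dataclass
class Receipt:
    execution_id: str
    attempt_id: int
    tool_id: str
    status: str                # "ok" or "fail"
    error_class: Optional[str]
    latency_ms: float
    deadline_ms: float
    action: str                # hypothetical: "return", "retry", "stop", "block"


class AttemptError(Exception):
    """Raised by an attempt with an already-classified error (e.g. "429", "auth")."""

    def __init__(self, error_class: str) -> None:
        super().__init__(error_class)
        self.error_class = error_class


def execute(execution_id: str,
            attempts: Sequence[Tuple[str, Callable[[float], Any]]],
            budget: Budget,
            terminal: FrozenSet[str] = frozenset({"auth"}),
            clock: Callable[[], float] = time.monotonic) -> Tuple[Optional[Any], List[Receipt]]:
    receipts: List[Receipt] = []
    start = clock()
    allowed = min(len(attempts), budget.retry_budget)
    for attempt_id, (tool_id, call) in enumerate(attempts[:allowed], start=1):
        remaining_ms = budget.max_elapsed_ms - (clock() - start) * 1000
        if remaining_ms <= 0:
            # Blocked attempts still get a receipt.
            receipts.append(Receipt(execution_id, attempt_id, tool_id, "fail",
                                    "max_elapsed", 0.0, budget.deadline_ms, "block"))
            break
        t0 = clock()
        try:
            result = call(min(budget.deadline_ms, remaining_ms))
        except AttemptError as err:
            latency = (clock() - t0) * 1000
            retryable = err.error_class not in terminal
            action = "retry" if retryable and attempt_id < allowed else "stop"
            receipts.append(Receipt(execution_id, attempt_id, tool_id, "fail",
                                    err.error_class, latency, budget.deadline_ms, action))
            if action == "stop":
                break
            continue
        latency = (clock() - t0) * 1000
        receipts.append(Receipt(execution_id, attempt_id, tool_id, "ok",
                                None, latency, budget.deadline_ms, "return"))
        return result, receipts
    return None, receipts


def primary(deadline_ms: float) -> Any:
    raise AttemptError("429")


def fallback(deadline_ms: float) -> Any:
    return "answer"


result, receipts = execute("exec-1", [("primary", primary), ("fallback", fallback)],
                           Budget(3000, 10000, 3))
print(result, [r.action for r in receipts])  # answer ['retry', 'return']
```

Backoff between attempts is deliberately left out to keep the sketch short; a real executor would wait (at least Retry-After) before a "retry".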
Read the spec:
Each JSONL line should include (aligned to the Execution Contract):
Required:
- tool_id, operation, endpoint_norm
- outcome.status and (if fail) outcome.error_class (optionally outcome.http_status)
- cost.latency_ms
- budget.deadline_ms (or legacy budget_ms)
- execution_id, attempt_id
Recommended (improves ranking + fix guidance):
- budget.max_elapsed_ms
- budget.retry_budget
- decision.action
That’s enough to rank hazards and generate the pack.
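To check a file before running the scan, a small validator can flag missing required fields. This sketch assumes dotted names are nested JSON objects; the values in the example line are made up for illustration, and `missing_fields` is a hypothetical helper, not a Pitstop Scan command.

```python
import json
from typing import Any, List

# Required fields as dotted paths; dots are assumed to mean nested JSON objects.
REQUIRED = ["tool_id", "operation", "endpoint_norm", "outcome.status",
            "cost.latency_ms", "execution_id", "attempt_id"]


def lookup(obj: Any, path: str) -> Any:
    """Follow a dotted path through nested dicts; None if any step is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict) or key not in obj:
            return None
        obj = obj[key]
    return obj


def missing_fields(line: str) -> List[str]:
    receipt = json.loads(line)
    missing = [path for path in REQUIRED if lookup(receipt, path) is None]
    # Either budget.deadline_ms or the legacy top-level budget_ms will do.
    if lookup(receipt, "budget.deadline_ms") is None and lookup(receipt, "budget_ms") is None:
        missing.append("budget.deadline_ms")
    # error_class is only required on failed attempts.
    if lookup(receipt, "outcome.status") == "fail" and lookup(receipt, "outcome.error_class") is None:
        missing.append("outcome.error_class")
    return missing


# Illustrative receipt line (made-up values; no payloads, headers, or raw URLs).
example = json.dumps({
    "execution_id": "exec-1", "attempt_id": "att-1",
    "tool_id": "llm", "operation": "chat", "endpoint_norm": "/v1/messages",
    "outcome": {"status": "fail", "error_class": "429", "http_status": 429},
    "cost": {"latency_ms": 812},
    "budget": {"deadline_ms": 3000},
})
print(missing_fields(example))  # []
```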
The report may include a priced loss model to help rank fixes. Treat it as tunable. The primary truth signals are breach rate and tail latency (p95/p99).
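Breach rate and tail latency are straightforward to compute per signature. The sketch below uses the nearest-rank percentile, which is one common definition; Scan's exact percentile method is not stated here and may differ. `tail_summary` and its `(latency_ms, deadline_ms)` input shape are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def percentile(values: Sequence[float], q: int) -> float:
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    if not values:
        raise ValueError("percentile of empty sequence")
    ordered = sorted(values)
    rank = max(1, -(-q * len(ordered) // 100))  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def tail_summary(samples: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Breach rate and p95/p99 for (latency_ms, deadline_ms) pairs from one signature."""
    if not samples:
        raise ValueError("no samples")
    latencies = [latency for latency, _ in samples]
    breaches = sum(1 for latency, deadline in samples if latency > deadline)
    return {
        "breach_rate": breaches / len(samples),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


# 100 attempts at 1..100 ms against a 95 ms deadline.
samples = [(float(ms), 95.0) for ms in range(1, 101)]
print(tail_summary(samples))  # {'breach_rate': 0.05, 'p95_ms': 95.0, 'p99_ms': 99.0}
```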
- Local-only by default. No data leaves your machine.
- Input should be operational receipts, not payloads.
Receipts MUST NOT include:
- prompts, message content, tool payload bodies, response bodies
- headers, tokens, API keys, cookies
- raw URLs or query strings (use endpoint_norm)
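One way to derive endpoint_norm from a raw URL before writing receipts: drop the query string, fragment, credentials, and port, and replace ID-like path segments with a placeholder. The scheme shown (host plus templated path, `:id` for numeric and UUID segments) is an assumption for illustration; any convention that keeps signatures stable and leaks no identifiers will do.

```python
import re
from urllib.parse import urlsplit

# Numeric IDs and UUIDs become ":id" so one endpoint maps to one signature.
_ID_SEGMENT = re.compile(
    r"^(?:\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$", re.IGNORECASE)


def endpoint_norm(url: str, method: str = "") -> str:
    """Host + templated path; query string, fragment, userinfo, and port are dropped."""
    parts = urlsplit(url)
    segments = [":id" if _ID_SEGMENT.match(seg) else seg for seg in parts.path.split("/")]
    path = "/".join(segments) or "/"
    return f"{method.upper()} {parts.hostname or ''}{path}".strip()


print(endpoint_norm("https://api.example.com/v1/users/12345/orders?token=abc", "get"))
# GET api.example.com/v1/users/:id/orders
```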
Outputs are derived summaries only (no raw requests/responses).
Raw intake blobs may temporarily contain headers or error text; intake output is scrubbed before generating scan-ready receipts.
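A conservative scrubber, in the spirit of the intake step, errs toward redacting too much. The patterns below are a hypothetical starting set (credential-bearing header lines, bearer tokens, URL query strings), not the rules Pitstop Scan's intake actually applies.

```python
import re

# Hypothetical, deliberately broad patterns: redact more rather than less.
_RULES = [
    # Whole header lines that commonly carry credentials.
    (re.compile(r"(?im)^(authorization|proxy-authorization|cookie|set-cookie|x-api-key)\s*:.*$"),
     r"\1: [REDACTED]"),
    # Bearer tokens that appear inline in error text.
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # Query strings on URLs: keep scheme/host/path, drop everything after "?".
    (re.compile(r"(https?://[^\s?#]+)\?[^\s#]*"), r"\1?[REDACTED]"),
]


def scrub(blob: str) -> str:
    for pattern, replacement in _RULES:
        blob = pattern.sub(replacement, blob)
    return blob


raw = "HTTP 429\nAuthorization: Bearer sk-live-123\nGET https://api.example.com/v1/x?key=abc"
print(scrub(raw))
```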
If you'd like a second set of eyes on your hazard pack, send the derived pack only:
output/pitstop_pack_agg.zip
to brentondwilliams@gmail.com with:
- Stack: (e.g. Python + OpenAI + Redis)
- Workflow: (e.g. agent toolchain, ingestion pipeline)
- Goal: (reduce 429s | reduce p99 | stabilize retries | fix failover)
You’ll receive:
- a prioritized guardrail plan
- concrete configuration / policy changes
- a verification checklist to confirm the fix worked
No calls required. Just artifacts → fix order → verification.
If you can share 50–200 redacted receipts (JSONL metadata only), I can re-run the scan after you ship the guardrails and return a before/after delta.
If you prefer to keep receipts local, the derived pack alone is enough to generate a fix order.