Pitstop Scan

Local-first reliability triage for AI + API execution.

Given execution receipts, Scan produces a ranked hazard pack showing exactly where reliability is leaking:

where latency breaches concentrate
where failures concentrate (429s, timeouts, auth, 5xx)
which guardrails to implement first

Output: a 1-page reliability snapshot + prioritized guardrail plan.

No dashboards. No calls. Just receipts → hazards → guardrails.

Start here: docs/START_HERE.md

Why you’d run this

If you’ve ever said:

“We can’t quickly tell which call is killing tail latency.”
“Retries are spiking / rate limits are killing us.”
“It usually works… but the tail is brutal.”

Pitstop Scan turns that into a ranked list of failure + breach signatures you can actually fix.

Quickstart (5 minutes)

1) Install (repo-local venv)

make deps

Option A — You already have receipts (JSONL)

Place your file at:

input/exhaust.jsonl

Then run:

make run

Open the report:

open output/report.md   # macOS

Option B — You only have a raw failure blob

If you don’t have receipts yet, you can start from a copied error/log blob:

python -m pitstop_scan.cli intake --in blob.txt --out output/intake-pack --run-scan

This will produce:

raw_scrubbed.txt — scrubbed copy of the input blob
artifact.json — normalized boundary classification summary
exhaust.jsonl — scan-compatible receipt
summary.md — human-readable provisional diagnosis
derived/ — scan outputs

This is intentionally v1:

heuristic signal extraction
conservative scrubbing
simple provisional classification

Then open the report:

open output/intake-pack/derived/report.md

No data yet? Run a demo

Run a synthetic demo (creates a tiny input/exhaust.jsonl):

make demo
open output/report.md

If make demo works, you’re ready — replace input/exhaust.jsonl with your real file and rerun make run.

What you get

Running the scan writes:

output/report.md — 1-page reliability snapshot + top fixes
output/hazards.csv — ranked hazards (highest leverage first)
output/signatures.csv — per-signature rollups
output/summary.json — machine totals (automation-friendly)
output/pitstop_pack_agg.zip — zip of the four derived outputs above

Breach = latency exceeds the per-attempt deadline (budget.deadline_ms) even if status is ok.
(If receipts use legacy budget_ms, Scan treats it as budget.deadline_ms.)
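
The breach rule above can be sketched in a few lines of Python. This is an illustration, not Scan's actual code. It assumes the dotted field names map to nested JSON objects and that the legacy budget_ms sits at the top level of the receipt; the helper names are hypothetical.

```python
def deadline_ms(receipt):
    """Return the per-attempt deadline, accepting the legacy budget_ms field."""
    budget = receipt.get("budget") or {}
    if "deadline_ms" in budget:
        return budget["deadline_ms"]
    # Assumption: the legacy field lives at the top level of the receipt.
    return receipt.get("budget_ms")


def is_breach(receipt):
    """True when latency exceeds the deadline, regardless of outcome status."""
    deadline = deadline_ms(receipt)
    if deadline is None:
        return False  # assumption: without a deadline there is nothing to breach
    return receipt["cost"]["latency_ms"] > deadline


# A call that succeeded but took longer than its deadline still counts as a breach.
slow_ok = {"outcome": {"status": "ok"}, "cost": {"latency_ms": 1800},
           "budget": {"deadline_ms": 1500}}
legacy = {"outcome": {"status": "ok"}, "cost": {"latency_ms": 900}, "budget_ms": 1000}

print(is_breach(slow_ok))  # True
print(is_breach(legacy))   # False
```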

If you use intake, Scan writes an intermediate pack before generating derived outputs.

Proof (sample hazard pack)

See a sample output pack (derived aggregates only):

Receipts (drop-in invariants)

Small, runnable “truth artifacts”: invariant → policy → tests → verification.

receipts/README.md — index of receipts you can paste into a codebase
429-floor — Retry-After is a floor (policy + tests)

These receipts are the guardrails the scan will often recommend.
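
As a rough illustration of the "Retry-After is a floor" invariant, the sketch below computes a retry delay that never drops under the server's hint. It is not the code in receipts/429-floor; the function name, the default values, and the choice of full-jitter exponential backoff are assumptions made for the example.

```python
import random


def next_delay_s(attempt, retry_after_s=None, base_s=0.5, cap_s=30.0, rng=random.random):
    """Delay before the next retry, in seconds.

    attempt is 0-based. The client's own backoff is capped at cap_s and
    jittered, but a Retry-After value always acts as a lower bound.
    """
    backoff = min(cap_s, base_s * (2 ** attempt))
    jittered = backoff * rng()  # full jitter: uniform in [0, backoff)
    if retry_after_s is None:
        return jittered
    # The floor: never retry sooner than the server asked, even if that
    # exceeds the client's own cap.
    return max(jittered, float(retry_after_s))


print(next_delay_s(0, retry_after_s=30) >= 30)  # True
```

The key design point is that jitter and caps apply only to the client's own backoff; they are never allowed to shorten the wait the server requested.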

Live 429 Classifier

POST a 429 status + headers, get back a classification (WAIT / CAP / STOP) + the first knob to adjust.

curl -s -X POST https://web-production-273d3.up.railway.app/classify \
  -H "Content-Type: application/json" \
  -d '{"status":429,"headers":{"retry-after":"30"},"provider":"anthropic"}'

Three cases:

| Input | Classification | Meaning |
|---|---|---|
| `retry-after: 30` | WAIT | Honor header, retry after delay |
| `retry-after: 600` | STOP | Quota exhaustion — do not retry |
| no header | CAP | Reduce concurrency — do not retry immediately |
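
The three cases can be mimicked locally with a small function. This is a guess at the shape of the logic, not the hosted classifier: the 300-second cutoff between WAIT and STOP is a hypothetical value chosen only so that 30 and 600 land on opposite sides, and treating an unparseable header like a missing one is also an assumption.

```python
STOP_THRESHOLD_S = 300  # hypothetical cutoff; the real boundary is not documented here


def classify_429(headers, stop_threshold_s=STOP_THRESHOLD_S):
    """Return WAIT, STOP, or CAP for a 429 response's headers."""
    # Header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return "CAP"
    try:
        delay_s = int(str(raw).strip())
    except ValueError:
        # Assumption: a non-numeric value (e.g. an HTTP-date) is handled
        # like a missing header in this sketch.
        return "CAP"
    return "STOP" if delay_s >= stop_threshold_s else "WAIT"


print(classify_429({"retry-after": "30"}))   # WAIT
print(classify_429({"Retry-After": "600"}))  # STOP
print(classify_429({}))                      # CAP
```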

Send a real 429 blob (status + headers) to brentondwilliams@gmail.com and I'll run it through.

The Execution Contract (v1.0)

Pitstop Scan is a reference implementation of the Pitstop Execution Contract:

execute(intent, budget, policy) -> result, receipt

The contract defines the correctness rules for reliable execution:

Budget semantics: per-attempt deadlines, max elapsed, retry caps (attempts include fallbacks)
Classification taxonomy: 429 vs 402 vs timeout vs auth (retryable vs terminal)
Scope correctness: model vs provider vs credential (cooldown blast radius)
Audit-grade receipts: emitted on every attempt (including block/cooldown/preemption)

Read the spec:

→ EXECUTION_CONTRACT.md
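
To make the contract's shape concrete, here is a minimal sketch of an execute function that emits one receipt per attempt. It is not the reference implementation and does not cover the full spec: the Budget dataclass, the max_attempts field, the "budget_exhausted" error class, and returning a list of receipts are all assumptions for illustration. It also records latency rather than preempting a slow call, and it omits backoff between attempts.

```python
import time
import uuid
from dataclasses import dataclass


@dataclass
class Budget:
    deadline_ms: int        # per-attempt deadline
    max_elapsed_ms: int     # ceiling across all attempts
    max_attempts: int = 3   # retry cap; fallbacks would count as attempts


def execute(intent, budget, policy):
    """Call intent() under budget. policy(exc) returns True if the error is retryable.

    Returns (result, receipts) with one receipt per attempt, including blocked ones.
    """
    execution_id = str(uuid.uuid4())
    started = time.monotonic()
    receipts = []
    result = None

    for attempt in range(1, budget.max_attempts + 1):
        receipt = {
            "execution_id": execution_id,
            "attempt_id": f"{execution_id}:{attempt}",
            "budget": {"deadline_ms": budget.deadline_ms,
                       "max_elapsed_ms": budget.max_elapsed_ms},
        }
        elapsed_ms = (time.monotonic() - started) * 1000
        if elapsed_ms >= budget.max_elapsed_ms:
            # Out of total budget: record the blocked attempt instead of calling out.
            receipt["outcome"] = {"status": "fail", "error_class": "budget_exhausted"}
            receipt["decision"] = {"action": "block"}
            receipts.append(receipt)
            break

        t0 = time.monotonic()
        retryable = False
        try:
            result = intent()
            receipt["outcome"] = {"status": "ok"}
        except Exception as exc:
            retryable = policy(exc)
            receipt["outcome"] = {"status": "fail", "error_class": type(exc).__name__}
        receipt["cost"] = {"latency_ms": round((time.monotonic() - t0) * 1000)}
        receipts.append(receipt)

        if receipt["outcome"]["status"] == "ok" or not retryable:
            break

    return result, receipts


# Usage: a call that times out once, then succeeds on the second attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 2:
        raise TimeoutError("simulated")
    return "done"

result, receipts = execute(
    flaky,
    Budget(deadline_ms=1000, max_elapsed_ms=5000),
    policy=lambda exc: isinstance(exc, TimeoutError),
)
print(result, [r["outcome"]["status"] for r in receipts])
```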

Input contract (minimum viable)

Each JSONL line should include (aligned to the Execution Contract):

Required:

tool_id, operation, endpoint_norm
outcome.status and (if fail) outcome.error_class (optionally outcome.http_status)
cost.latency_ms
budget.deadline_ms (or legacy budget_ms)
execution_id, attempt_id

Recommended (improves ranking + fix guidance):

budget.max_elapsed_ms
budget.retry_budget
decision.action

That’s enough to rank hazards and generate the pack.
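
A quick pre-flight check can catch receipts that are missing required fields before you run the scan. The sketch below is not part of Pitstop Scan. It assumes dotted names map to nested JSON objects, that the failing status value is the string "fail", and that legacy budget_ms is a top-level key; the sample receipt values are invented.

```python
import json

REQUIRED = [
    "tool_id", "operation", "endpoint_norm",
    "outcome.status", "cost.latency_ms",
    "execution_id", "attempt_id",
]


def lookup(obj, dotted):
    """Follow a dotted path through nested dicts; return None if any part is absent."""
    current = obj
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def missing_fields(receipt):
    """List the required fields that this receipt lacks."""
    missing = [name for name in REQUIRED if lookup(receipt, name) is None]
    # Either the current or the legacy deadline field satisfies the contract.
    if lookup(receipt, "budget.deadline_ms") is None and receipt.get("budget_ms") is None:
        missing.append("budget.deadline_ms")
    # error_class is required only for failed attempts.
    if lookup(receipt, "outcome.status") == "fail" and lookup(receipt, "outcome.error_class") is None:
        missing.append("outcome.error_class")
    return missing


line = ('{"tool_id": "llm", "operation": "chat", "endpoint_norm": "/v1/messages", '
        '"outcome": {"status": "fail", "http_status": 429}, '
        '"cost": {"latency_ms": 212}, "budget": {"deadline_ms": 1500}, '
        '"execution_id": "e-1", "attempt_id": "e-1:1"}')
print(missing_fields(json.loads(line)))  # ['outcome.error_class']
```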

Notes on “loss” and cost framing

The report may include a priced loss model to help rank fixes. Treat it as tunable. The primary truth signals are breach rate and tail latency (p95/p99).
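
If you want to sanity-check those truth signals by hand, they can be computed directly from receipts. This sketch is independent of Scan's own aggregation, which may use a different percentile method; it uses the nearest-rank definition and assumes nested cost and budget objects.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truth_signals(receipts):
    """Breach rate plus p95/p99 latency over a list of receipts."""
    latencies = [r["cost"]["latency_ms"] for r in receipts]
    breaches = sum(
        1 for r in receipts
        if r["cost"]["latency_ms"] > r["budget"]["deadline_ms"]
    )
    return {
        "breach_rate": breaches / len(receipts),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }


# Twenty synthetic receipts with latencies 100, 200, ..., 2000 ms and a 1500 ms deadline.
sample = [
    {"cost": {"latency_ms": ms}, "budget": {"deadline_ms": 1500}}
    for ms in range(100, 2001, 100)
]
print(truth_signals(sample))
# {'breach_rate': 0.25, 'p95_ms': 1900, 'p99_ms': 2000}
```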

Privacy / safety (hard boundary)

Local-only by default. No data leaves your machine.
Input should be operational receipts, not payloads.

Receipts MUST NOT include:

prompts, message content, tool payload bodies, response bodies
headers, tokens, API keys, cookies
raw URLs or query strings (use endpoint_norm)

Outputs are derived summaries only (no raw requests/responses).

Raw intake blobs may temporarily contain headers or error text; intake output is scrubbed before generating scan-ready receipts.
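
One way to enforce the boundary before a file ever reaches the scan is to reject receipts that carry suspicious keys. The check below is a sketch, not Scan's scrubber: the key names in FORBIDDEN_KEYS are hypothetical stand-ins for the categories listed above, and a key-name check cannot detect secrets hidden inside otherwise allowed string values.

```python
# Hypothetical key names covering the forbidden categories; adjust to your schema.
FORBIDDEN_KEYS = {
    "prompt", "messages", "content", "payload", "body", "response_body",
    "headers", "authorization", "token", "api_key", "cookie", "url", "query",
}


def forbidden_paths(value, prefix=""):
    """Return the dotted paths of every forbidden key found in a nested receipt."""
    found = []
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            if key.lower() in FORBIDDEN_KEYS:
                found.append(path)
            found.extend(forbidden_paths(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            found.extend(forbidden_paths(child, f"{prefix}[{index}]"))
    return found


leaky = {
    "tool_id": "llm",
    "endpoint_norm": "/v1/messages",
    "outcome": {"status": "fail", "headers": {"retry-after": "30"}},
}
print(forbidden_paths(leaky))  # ['outcome.headers']
```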

Want help applying the fix order?

If you'd like a second set of eyes on your hazard pack, send the derived pack only:

output/pitstop_pack_agg.zip

to brentondwilliams@gmail.com with:

Stack: (e.g. Python + OpenAI + Redis)
Workflow: (e.g. agent toolchain, ingestion pipeline)
Goal: (reduce 429s | reduce p99 | stabilize retries | fix failover)

You’ll receive:

a prioritized guardrail plan
concrete configuration / policy changes
a verification checklist to confirm the fix worked

No calls required. Just artifacts → fix order → verification.

Before / after delta (optional)

If you can share 50–200 redacted receipts (JSONL metadata only), I can re-run the scan after you ship the guardrails and return a before/after delta.

If you prefer to keep receipts local, the derived pack alone is enough to generate a fix order.
