audit-inspection: complete the inspection / debugging utility #8

Tracks the remaining work to land the full inspection / debugging utility on top of the audit_payloads foundation shipped in v1.1.0. Each subtask is independently shippable; the goal is a sequence of focused PRs rather than one large drop.

## Status (2026-05-06) — DONE

All nine subtasks merged. Closing this umbrella.

- PR #9 shipped as v1.1.1 (review-fix cleanup, notification recorder).
- PR #10 merged (JSONB filter compiler, NDJSON export).
- PR #11 merged (replay endpoint, SSE live tail).
- PR #12 merged (portal UI drawer, comparison page, inspection.md walkthrough; plus follow-up commits for header redaction, replay-rate-limit reorder, /audit/meta endpoint, Try-It payload capture, empty-log nil-slice fix, dev-stack secrets sourcing).

## Subtasks

- [x] **Review-fix cleanup (was unmerged on PR #5).** ✅ Shipped in v1.1.1 (PR #9). All bullets done:
  - errCategory overwrite bug in pkg/mcpmw/audit.go (auth -> tool -> handler precedence)
  - callToolResultToMap dropped annotations on text/image/audio content (now a uniform JSON round-trip)
  - response_error.category populated alongside message; ev.ErrorCategory now consistent with payload.response_error.category across auth/tool/handler errors
  - testcontainers Postgres store tests at pkg/audit/postgres/store_test.go
  - reflection-based isEmptyValue for marshalJSONB
  - unmarshalCol with WARN logging on corrupt JSONB
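
The reflection-based emptiness check can be sketched roughly as follows. This is a minimal illustration, not the code in the repository: the signatures of `isEmptyValue` and `marshalJSONB` here are assumptions, as is the idea that an empty value is stored as SQL NULL (a nil byte slice).

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// isEmptyValue reports whether v is nil, a nil pointer, a zero-length
// map/slice/array/string, or otherwise the zero value of its type.
func isEmptyValue(v any) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Map, reflect.Slice, reflect.Array, reflect.String:
		return rv.Len() == 0
	case reflect.Pointer, reflect.Interface:
		return rv.IsNil()
	}
	return rv.IsZero()
}

// marshalJSONB (hypothetical shape) returns nil bytes for empty values so
// the caller can bind NULL instead of writing "{}" or "[]" to the column.
func marshalJSONB(v any) ([]byte, error) {
	if isEmptyValue(v) {
		return nil, nil
	}
	return json.Marshal(v)
}

func main() {
	b, _ := marshalJSONB(map[string]any{})
	fmt.Println(b == nil)
	b, _ = marshalJSONB(map[string]any{"tool": "echo"})
	fmt.Println(string(b))
}
```

Using reflection keeps the check uniform across typed nil maps, nil slices, and nil pointers, which a plain `v == nil` comparison on an interface value would miss.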

- [x] **Notification recorder.** ✅ Shipped in v1.1.1 (PR #9).

- [x] **JSONB path filter compiler.** ✅ Merged in PR #10. `/audit/events` and `/audit/export` accept `?param.<dotted.path>=v`, `?response.<dotted.path>=v`, `?header.<name>=v`, `?has=<column>`. Compiles to `EXISTS (SELECT 1 FROM audit_payloads p WHERE p.event_id = audit_events.id AND p.<col> @> $N::jsonb)` against the existing `jsonb_path_ops` GIN indexes.
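
To make the compilation step concrete, here is a minimal sketch of how a dotted path might become a containment document for `@>`. The function names are hypothetical, and treating every filter value as a JSON string is an assumption; the real compiler may infer numbers or booleans. The column name must come from a server-side allow-list, never from user input.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// containmentDoc turns path "a.b.c" and a value into {"a":{"b":{"c":value}}},
// the JSON document that is bound as the right-hand side of @>.
func containmentDoc(path, value string) (string, error) {
	if path == "" {
		return "", fmt.Errorf("empty path")
	}
	segs := strings.Split(path, ".")
	var doc any = value
	// Wrap from the innermost key outwards.
	for i := len(segs) - 1; i >= 0; i-- {
		if segs[i] == "" {
			return "", fmt.Errorf("empty segment in path %q", path)
		}
		doc = map[string]any{segs[i]: doc}
	}
	b, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// existsClause renders the predicate quoted in the issue. col is assumed to be
// an allow-listed column name; the document itself is bound as parameter $n.
func existsClause(col string, n int) string {
	return fmt.Sprintf(
		"EXISTS (SELECT 1 FROM audit_payloads p WHERE p.event_id = audit_events.id AND p.%s @> $%d::jsonb)",
		col, n)
}

func main() {
	doc, _ := containmentDoc("user.id", "42") // e.g. ?param.user.id=42
	fmt.Println(doc)
	fmt.Println(existsClause("request_params", 1))
}
```

Because the path only ever shapes a bound parameter, not the SQL text, the filter stays injection-safe while still being served by the `jsonb_path_ops` GIN index.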

- [x] **NDJSON export.** ✅ Merged in PR #10. `GET /api/v1/portal/audit/export?format=jsonl` streams the filtered set as newline-delimited summary rows. Hard cap at 100,000 rows; per-row ctx check; `Cache-Control: no-store`; deferred WriteHeader so a backend error before the first row sends a clean 5xx.
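
The deferred-WriteHeader behaviour can be illustrated with the sketch below. It is not the shipped handler: `row`, `nextFunc`, `rowsOf`, `failing`, and `run` are hypothetical stand-ins, and `maxExportRows` only mirrors the documented 100,000-row cap.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// row and nextFunc are hypothetical stand-ins for the store's summary row and cursor.
type row struct {
	ID   string `json:"id"`
	Tool string `json:"tool"`
}

// nextFunc yields the next row; ok is false once the result set is exhausted.
type nextFunc func(ctx context.Context) (r row, ok bool, err error)

const maxExportRows = 100000 // assumed to match the documented hard cap

func exportNDJSON(w http.ResponseWriter, req *http.Request, next nextFunc) {
	ctx := req.Context()
	started := false
	start := func() { // sends status and headers exactly once
		if started {
			return
		}
		w.Header().Set("Content-Type", "application/x-ndjson")
		w.Header().Set("Cache-Control", "no-store")
		w.WriteHeader(http.StatusOK)
		started = true
	}
	enc := json.NewEncoder(w) // Encode appends '\n' after every value
	for n := 0; n < maxExportRows; n++ {
		if ctx.Err() != nil {
			return // client disconnected or deadline passed
		}
		r, ok, err := next(ctx)
		if err != nil {
			if !started {
				// Nothing sent yet, so a clean 5xx is still possible.
				http.Error(w, "export failed", http.StatusInternalServerError)
			}
			return
		}
		if !ok {
			break
		}
		start()
		if err := enc.Encode(r); err != nil {
			return
		}
	}
	start() // empty result set: still answer 200 with the right headers
}

// rowsOf returns a cursor over a fixed set of rows.
func rowsOf(rows ...row) nextFunc {
	i := 0
	return func(context.Context) (row, bool, error) {
		if i >= len(rows) {
			return row{}, false, nil
		}
		i++
		return rows[i-1], true, nil
	}
}

// failing simulates a backend error before the first row.
func failing(context.Context) (row, bool, error) {
	return row{}, false, errors.New("backend down")
}

// run drives the handler in-process and reports status and body.
func run(next nextFunc) (int, string) {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/portal/audit/export?format=jsonl", nil)
	exportNDJSON(rec, req, next)
	return rec.Code, rec.Body.String()
}

func main() {
	fmt.Println(run(failing))
	fmt.Println(run(rowsOf(row{ID: "1", Tool: "echo"}, row{ID: "2", Tool: "echo"})))
}
```

An error after the first row cannot change the status any more; in that case the stream simply ends early, which is why consumers should not treat a truncated export as complete.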

- [x] **Replay endpoint.** ✅ Merged in PR #11; rate-limit reorder fixed in PR #12. `POST /api/v1/portal/audit/events/{id}/replay`:
  - Re-invokes via in-process MCP client; new audit row tagged `source=portal-replay` with `replayed_from` set
  - Per-identity token bucket (5 burst, 1 token / 12s sustained); 429 + Retry-After when exhausted
  - Tokens consumed only after validation passes — clicks on non-replayable rows return 400 without burning the operator's budget
  - Rejects with 4xx on: invalid UUID, missing event, no captured payload, redacted parameter values, unregistered tool
  - HTTP 502 on transport-level callErr OR tool-side IsError
  - error_category mirrors mcpmw/audit middleware precedence so /events filtering buckets consistently
  - CSRF-gated via X-Requested-With
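
The rate-limit ordering described above can be sketched with a small token bucket. This is an illustration under assumptions: the constant values come from the bullets, but their real types are unknown; `bucket`, `replay`, and `simulate` are hypothetical; and `200` stands in for whatever status a successful replay actually returns.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Values taken from the issue text; the real replayBurst / replayRefill
// constants may be typed or named differently.
const (
	replayBurst  = 5
	replayRefill = 12 * time.Second
)

// bucket is a per-identity token bucket with an injected clock for testing.
type bucket struct {
	mu     sync.Mutex
	tokens float64
	last   time.Time
}

func newBucket(now time.Time) *bucket {
	return &bucket{tokens: replayBurst, last: now}
}

// take consumes one token if available. Otherwise it reports how long until
// the next token, which is what a Retry-After header would carry.
func (b *bucket) take(now time.Time) (bool, time.Duration) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tokens += float64(now.Sub(b.last)) / float64(replayRefill)
	b.last = now
	if b.tokens > replayBurst {
		b.tokens = replayBurst
	}
	if b.tokens >= 1 {
		b.tokens--
		return true, 0
	}
	return false, time.Duration((1 - b.tokens) * float64(replayRefill))
}

// replay returns the status a request would get. Validation runs before the
// bucket is touched, so a non-replayable row costs no token.
func replay(b *bucket, replayable bool, now time.Time) int {
	if !replayable {
		return 400
	}
	if ok, _ := b.take(now); !ok {
		return 429
	}
	return 200 // placeholder for the real success status
}

// simulate replays a fixed sequence of requests against one bucket.
func simulate() []int {
	t0 := time.Unix(0, 0)
	b := newBucket(t0)
	got := []int{replay(b, false, t0)} // rejected, no token spent
	for i := 0; i < 6; i++ {
		got = append(got, replay(b, true, t0)) // five succeed, the sixth is limited
	}
	return append(got, replay(b, true, t0.Add(replayRefill))) // one token refilled
}

func main() {
	fmt.Println(simulate()) // [400 200 200 200 200 200 429 200]
}
```

Had the bucket been consulted before validation, the initial 400 would have consumed a token and the fifth valid replay would already have been throttled.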

- [x] **SSE live tail.** ✅ Merged in PR #11. `GET /api/v1/portal/audit/stream`:
  - New `audit.SubscribingLogger` capability; AsyncLogger broadcasts after inner.Log succeeds; MemoryLogger broadcasts on every Log
  - Per-subscriber mutex serializes send vs cancel (race-tested)
  - Atomic SSE frame write via bytes.Buffer (no half-formed frames on partial encode failure)
  - Opening `: connected` + `: keepalive` every 30s; sets `X-Accel-Buffering: no`
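
The atomic frame write can be sketched as follows. The `event:` field name `audit` and the helper names are assumptions; only the buffer-then-single-Write pattern is taken from the bullet above.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// writeFrame builds the complete SSE frame in memory and hands it to w in a
// single Write, so an encode failure never leaves a half-formed frame on the wire.
func writeFrame(w io.Writer, event string, payload any) error {
	var buf bytes.Buffer
	buf.WriteString("event: " + event + "\n")
	buf.WriteString("data: ")
	if err := json.NewEncoder(&buf).Encode(payload); err != nil {
		return err // nothing has reached w
	}
	// Encode already ended the data line with '\n'; the blank line ends the frame.
	buf.WriteString("\n")
	_, err := w.Write(buf.Bytes())
	return err
}

// render captures what a subscriber would have received.
func render(event string, payload any) (string, error) {
	var sink bytes.Buffer
	err := writeFrame(&sink, event, payload)
	return sink.String(), err
}

func main() {
	s, _ := render("audit", map[string]string{"id": "1"})
	fmt.Printf("%q\n", s)
	s, err := render("audit", make(chan int)) // channels cannot be JSON-encoded
	fmt.Printf("%q %v\n", s, err != nil)
}
```

Writing the `event:` line directly to the response and then failing on the payload would leave the client's parser waiting on a frame that never completes; buffering avoids that at the cost of one extra copy per event.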

- [x] **Portal UI: click-to-expand drawer with four tabs.** ✅ Merged in PR #12. Audit page rewritten:
  - Click a row -> side drawer (role=dialog, aria-modal, focus management)
  - Tabs: Overview / Request / Response / Notifications; deep-linkable via `?id=<event-id>`
  - Pretty-printed JSON viewer with copy-to-clipboard; redacted header values shown as `[redacted]` with names visible
  - Replay button with confirmation modal (default focus on Cancel so reflexive Enter dismisses); disabled with tooltip when row is non-replayable
  - Live tail toggle subscribing to the SSE stream (fixed-cap most-recent-first list, cap 20)
  - JSONB filter editor sourcing its has-keys list from `/audit/meta` (server-driven; UI doesn't duplicate the allow-list)

- [x] **Comparison page.** ✅ Merged in PR #12. `/portal/audit/compare?a=...&b=...` renders a side-by-side structural diff. JSON-path-aware: walks objects and arrays by key/index so reordered keys don't show as changes; one-side-undefined trees show per-key only-A / only-B leaves; deep trees indent linearly via per-`<ul>` padding.
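
The structural walk can be illustrated with the sketch below. It is written in Go only for consistency with the rest of the codebase; the portal's diff presumably runs in the browser, and every name here (`walk`, `leaves`, `diffJSON`, the `$`-rooted path format) is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
	"sort"
)

// leaves records every leaf under v with the given label ("only-A" or "only-B").
func leaves(label, path string, v any, out *[]string) {
	switch t := v.(type) {
	case map[string]any:
		keys := make([]string, 0, len(t))
		for k := range t {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			leaves(label, path+"."+k, t[k], out)
		}
	case []any:
		for i, e := range t {
			leaves(label, fmt.Sprintf("%s[%d]", path, i), e, out)
		}
	default:
		*out = append(*out, label+" "+path)
	}
}

// walk compares a and b by key (objects) or index (arrays), so key order
// in the source JSON never registers as a change.
func walk(path string, a, b any, out *[]string) {
	am, aMap := a.(map[string]any)
	bm, bMap := b.(map[string]any)
	if aMap && bMap {
		seen := map[string]bool{}
		for k := range am {
			seen[k] = true
		}
		for k := range bm {
			seen[k] = true
		}
		keys := make([]string, 0, len(seen))
		for k := range seen {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			av, inA := am[k]
			bv, inB := bm[k]
			switch child := path + "." + k; {
			case inA && inB:
				walk(child, av, bv, out)
			case inA:
				leaves("only-A", child, av, out)
			default:
				leaves("only-B", child, bv, out)
			}
		}
		return
	}
	as, aArr := a.([]any)
	bs, bArr := b.([]any)
	if aArr && bArr {
		for i := 0; i < len(as) || i < len(bs); i++ {
			switch child := fmt.Sprintf("%s[%d]", path, i); {
			case i < len(as) && i < len(bs):
				walk(child, as[i], bs[i], out)
			case i < len(as):
				leaves("only-A", child, as[i], out)
			default:
				leaves("only-B", child, bs[i], out)
			}
		}
		return
	}
	if !reflect.DeepEqual(a, b) {
		*out = append(*out, "changed "+path)
	}
}

// diffJSON parses both documents and returns the list of differences.
func diffJSON(aRaw, bRaw string) ([]string, error) {
	var a, b any
	if err := json.Unmarshal([]byte(aRaw), &a); err != nil {
		return nil, err
	}
	if err := json.Unmarshal([]byte(bRaw), &b); err != nil {
		return nil, err
	}
	var out []string
	walk("$", a, b, &out)
	return out, nil
}

func main() {
	out, _ := diffJSON(
		`{"tool":"echo","args":{"n":1,"tag":"x"}}`,
		`{"args":{"tag":"x","n":2,"extra":{"k":true}},"tool":"echo"}`)
	fmt.Println(out)
}
```

In this simplified version an empty object or array present on only one side produces no entry, since it has no leaves to report.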

- [x] **Documentation.** ✅ Merged in PR #12. `docs/operations/inspection.md` walks through the full workflow end-to-end: capture a call, open the drawer, read each tab, replay it, compare to a baseline, filter via JSONB paths, export. Cross-referenced against the actual `replayBurst` / `replayRefill` / `maxExportEvents` constants. Header redaction policy and "tokens consumed only after validation" called out explicitly.

## CI follow-up landed in PR #11

CodeQL's `go/clear-text-logging` rule fired on `audit.Log(*ev)` because `err.Error()` flows into `Event.ErrorMessage`. Capturing that message for forensics is part of the audit logger's contract, which CodeQL cannot know. PR #11 added `.github/codeql/codeql-config.yml`, which excludes the rule with a documented justification, and the workflow now references that config file. A new `make codeql` target lets developers reproduce CI locally; `scripts/codeql-gate.sh` parses the SARIF output and applies the same exclusions.

## Schema follow-up landed in v1.1.1

`0003_audit_payloads_cleanup` dropped three unused columns (`jsonrpc_id`, `request_method`, `request_path`) and added `notifications_truncated`. Migration 0002 was deliberately not edited in place; v1.1.0 operators run 0002 then 0003 and converge to the same final shape as a greenfield install. No further migrations expected.

## Security / operator follow-ups landed in PR #12

- **Header redaction at the source.** `auth.WithHeaders` now redacts credential-bearing header names (Authorization, Proxy-Authorization, Cookie, Set-Cookie, X-API-Key, matched case-insensitively) before stashing them onto ctx, so `audit_payloads.request_headers` shows `[redacted]` rather than the verbatim value. The leak was pre-existing (the comment claimed redaction; the implementation did not perform it), but PR #12 was the first to put those bytes in front of UI users, so the fix landed there.
- **Filter-contract endpoint.** `GET /api/v1/portal/audit/meta` returns `{has_keys, json_sources, replay, export}` so a UI can build its filter editor against the server's source of truth instead of duplicating allow-lists.
- **Try-It payload capture.** Try-It rows used to land with `payload=null` because `recordTryitAudit` bypassed the MCP middleware and never built the `audit.Payload` sibling. The drawer's Response/Notifications tabs correctly reported "No response captured" for those rows. PR #12's follow-up commit mirrors `recordReplayAudit` so Try-It rows now carry the captured request_params, response_result, and response_error.
- **Empty-audit-log crash.** Go marshals nil slices as JSON null, and the SPA's `recent.map(...)` / `events.map(...)` crashed on a fresh deployment. Audit-store layer now initializes empty results as `[]T{}` so JSON marshals as `[]`; SPA also has `?? []` belt-and-braces.
- **make-dev unblocked.** Every dev-* Makefile target that touches docker-compose now declares `dev-secrets` as a prereq and sources `.env.dev` inline before invoking compose (compose interpolates `${MCPTEST_COOKIE_SECRET:?required}` at parse time on every invocation, and Make subshells lose env state).
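
As an illustration of the header-redaction item, a case-insensitive redactor might look like the sketch below. `redactHeaders` and the `sensitive` set are hypothetical names; the actual logic lives inside `auth.WithHeaders` and may differ in shape.

```go
package main

import (
	"fmt"
	"net/http"
)

// sensitive holds the credential-bearing names from the issue text, stored in
// canonical form so lookups are case-insensitive.
var sensitive = func() map[string]bool {
	m := map[string]bool{}
	for _, n := range []string{
		"Authorization", "Proxy-Authorization", "Cookie", "Set-Cookie", "X-API-Key",
	} {
		m[http.CanonicalHeaderKey(n)] = true
	}
	return m
}()

// redactHeaders returns a copy of h in which sensitive values are replaced by
// "[redacted]" while the header names stay visible.
func redactHeaders(h http.Header) http.Header {
	out := make(http.Header, len(h))
	for name, vals := range h {
		if sensitive[http.CanonicalHeaderKey(name)] {
			out[name] = []string{"[redacted]"}
			continue
		}
		out[name] = append([]string(nil), vals...) // copy, do not alias the input
	}
	return out
}

func main() {
	in := http.Header{
		"authorization": {"Bearer abc"},
		"X-Request-Id":  {"r-1"},
	}
	fmt.Println(redactHeaders(in))
}
```

Redacting before the headers are attached to the context means no downstream consumer (store, export, SSE stream, drawer) ever sees the raw credential, rather than each one having to filter it again.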

## Acceptance per PR

Every subtask PR landed with:

1. `make verify` green at >= 80% filtered coverage.
2. A focused commit message describing the user-facing change.
3. The relevant docs page updated in the same PR.
4. The corresponding box checked off.

## Notes

- v1.1.0 is the baseline; nothing here required breaking changes.
- v1.1.1 added one schema migration (`0003_audit_payloads_cleanup`); no further migrations.
- A pre-commit adversarial-review gate (`~/.claude/hooks/review-gate.sh`) is installed on the maintainer's machine; PRs from this branch landed "review-clean" on the first commit (with one exception, PR #11, which exposed a CodeQL-coverage gap that was filled by adding `make codeql` + `scripts/codeql-gate.sh`).

