audit-inspection: complete the inspection / debugging utility #8

@cjimti

Description

Tracks the remaining work to land the full inspection / debugging utility on top of the audit_payloads foundation shipped in v1.1.0. Each subtask is independently shippable; the goal is a sequence of focused PRs rather than one large drop.

Status (2026-05-06) — DONE

All nine subtasks merged. Closing this umbrella.

Subtasks

  • Review-fix cleanup (was unmerged on PR #5, "feat(audit): capture full request/response payloads + detail fetch (data layer for #4)"). ✅ Shipped in v1.1.1 (PR #9, "feat(audit): notification capture and PR #5 review fixes (#8)"). All bullets done:

    • errCategory overwrite bug in pkg/mcpmw/audit.go (auth -> tool -> handler precedence)
    • callToolResultToMap drops annotations on text/image/audio (uniform JSON round-trip)
    • response_error.category populated alongside message; ev.ErrorCategory now consistent with payload.response_error.category across auth/tool/handler errors
    • testcontainers Postgres store tests at pkg/audit/postgres/store_test.go
    • reflection-based isEmptyValue for marshalJSONB
    • unmarshalCol with WARN logging on corrupt JSONB
  • Notification recorder. ✅ Shipped in v1.1.1 (PR #9, "feat(audit): notification capture and PR #5 review fixes (#8)").

  • JSONB path filter compiler. ✅ Merged in PR #10 ("feat(audit): JSONB path filters and NDJSON export (#8)"). /audit/events and /audit/export accept ?param.<dotted.path>=v, ?response.<dotted.path>=v, ?header.<name>=v, ?has=<column>. Each filter compiles to EXISTS (SELECT 1 FROM audit_payloads p WHERE p.event_id = audit_events.id AND p.<col> @> $N::jsonb) against the existing jsonb_path_ops GIN indexes.

  • NDJSON export. ✅ Merged in PR #10 ("feat(audit): JSONB path filters and NDJSON export (#8)"). GET /api/v1/portal/audit/export?format=jsonl streams the filtered set as newline-delimited summary rows. Hard cap at 100,000 rows; per-row ctx check; Cache-Control: no-store; deferred WriteHeader so a backend error before the first row sends a clean 5xx.

  • Replay endpoint. ✅ Merged in PR #11 ("feat(audit): replay endpoint and SSE live tail (#8)"); rate-limit reorder fixed in PR #12. POST /api/v1/portal/audit/events/{id}/replay:

    • Re-invokes via in-process MCP client; new audit row tagged source=portal-replay with replayed_from set
    • Per-identity token bucket (5 burst, 1 token / 12s sustained); 429 + Retry-After when exhausted
    • Tokens consumed only after validation passes — clicks on non-replayable rows return 400 without burning the operator's budget
    • Rejects with 4xx on: invalid UUID, missing event, no captured payload, redacted parameter values, unregistered tool
    • HTTP 502 on transport-level callErr OR tool-side IsError
    • error_category mirrors mcpmw/audit middleware precedence so /events filtering buckets consistently
    • CSRF-gated via X-Requested-With
  • SSE live tail. ✅ Merged in PR #11 ("feat(audit): replay endpoint and SSE live tail (#8)"). GET /api/v1/portal/audit/stream:

    • New audit.SubscribingLogger capability; AsyncLogger broadcasts after inner.Log succeeds; MemoryLogger broadcasts on every Log
    • Per-subscriber mutex serializes send vs cancel (race-tested)
    • Atomic SSE frame write via bytes.Buffer (no half-formed frames on partial encode failure)
    • Opening : connected + : keepalive every 30s; sets X-Accel-Buffering: no
  • Portal UI: click-to-expand drawer with four tabs. ✅ Merged in PR #12 ("feat(audit): portal inspection UI — drawer, compare page, walkthrough docs"). Audit page rewritten:

    • Click a row -> side drawer (role=dialog, aria-modal, focus management)
    • Tabs: Overview / Request / Response / Notifications; deep-linkable via ?id=<event-id>
    • Pretty-printed JSON viewer with copy-to-clipboard; redacted header values shown as [redacted] with names visible
    • Replay button with confirmation modal (default focus on Cancel so reflexive Enter dismisses); disabled with tooltip when row is non-replayable
    • Live tail toggle subscribing to the SSE stream (fixed-cap most-recent-first list, cap 20)
    • JSONB filter editor sourcing its has-keys list from /audit/meta (server-driven; UI doesn't duplicate the allow-list)
  • Comparison page. ✅ Merged in PR feat(audit): portal inspection UI — drawer, compare page, walkthrough docs #12. /portal/audit/compare?a=...&b=... renders a side-by-side structural diff. JSON-path-aware: walks objects and arrays by key/index so reordered keys don't show as changes; one-side-undefined trees show per-key only-A / only-B leaves; deep trees indent linearly via per-<ul> padding.

  • Documentation. ✅ Merged in PR feat(audit): portal inspection UI — drawer, compare page, walkthrough docs #12. docs/operations/inspection.md walks through the full workflow end-to-end: capture a call, open the drawer, read each tab, replay it, compare to a baseline, filter via JSONB paths, export. Cross-referenced against the actual replayBurst / replayRefill / maxExportEvents constants. Header redaction policy and "tokens consumed only after validation" called out explicitly.

CI follow-up landed in PR #11

CodeQL's go/clear-text-logging rule fired on audit.Log(*ev) because err.Error() flows into Event.ErrorMessage. The audit logger's contract is to capture this for forensics; CodeQL doesn't know that. PR #11 added .github/codeql/codeql-config.yml excluding the rule with a documented justification; the workflow now references the config file. A new make codeql target lets developers reproduce CI locally; scripts/codeql-gate.sh parses SARIF and applies the same exclusions.
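To illustrate what "parses SARIF and applies the same exclusions" amounts to, here is a small sketch. The real gate is a shell script whose contents are not shown in this issue, so this Go version is only an assumption about the general shape: `blocking` is a hypothetical helper, and the sample SARIF document in `main` is made up. The `runs[].results[].ruleId` layout is standard SARIF 2.1.0.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sarifLog models only the fields this sketch needs.
type sarifLog struct {
	Runs []struct {
		Results []struct {
			RuleID string `json:"ruleId"`
		} `json:"results"`
	} `json:"runs"`
}

// blocking returns the rule IDs of findings that are not covered by the
// exclusion set. An empty result means the gate would pass.
func blocking(raw []byte, excluded map[string]bool) ([]string, error) {
	var log sarifLog
	if err := json.Unmarshal(raw, &log); err != nil {
		return nil, fmt.Errorf("parse SARIF: %w", err)
	}
	var out []string
	for _, run := range log.Runs {
		for _, res := range run.Results {
			if !excluded[res.RuleID] {
				out = append(out, res.RuleID)
			}
		}
	}
	return out, nil
}

func main() {
	// Made-up sample input with one excluded and one non-excluded finding.
	sample := []byte(`{"runs":[{"results":[
		{"ruleId":"go/clear-text-logging"},
		{"ruleId":"go/sql-injection"}
	]}]}`)
	excluded := map[string]bool{"go/clear-text-logging": true}

	left, err := blocking(sample, excluded)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(left), left)
}
```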

Schema follow-up landed in v1.1.1

0003_audit_payloads_cleanup dropped three unused columns (jsonrpc_id, request_method, request_path) and added notifications_truncated. Migration 0002 was deliberately not edited in place; v1.1.0 operators run 0002 then 0003 and converge to the same final shape as a greenfield install. No further migrations expected.

Security / operator follow-ups landed in PR #12

  • Header redaction at the source. auth.WithHeaders now redacts credential-bearing names (Authorization, Proxy-Authorization, Cookie, Set-Cookie, X-API-Key, matched case-insensitively) before stashing onto ctx, so audit_payloads.request_headers shows [redacted] rather than the verbatim value. This was a pre-existing leak (the comment claimed redaction; the implementation didn't do it). PR #12 was the first to put those bytes in front of UI users, so the fix landed there.
  • Filter-contract endpoint. GET /api/v1/portal/audit/meta returns {has_keys, json_sources, replay, export} so a UI can build its filter editor against the server's source of truth instead of duplicating allow-lists.
  • Try-It payload capture. Try-It rows used to land with payload=null because recordTryitAudit bypassed the MCP middleware and never built the audit.Payload sibling. The drawer's Response/Notifications tabs correctly reported "No response captured" for those rows. A follow-up commit in PR #12 mirrors recordReplayAudit so Try-It rows now carry the captured request_params, response_result, and response_error.
  • Empty-audit-log crash. Go marshals nil slices as JSON null, and the SPA's recent.map(...) / events.map(...) crashed on a fresh deployment. Audit-store layer now initializes empty results as []T{} so JSON marshals as []; SPA also has ?? [] belt-and-braces.
  • make-dev unblocked. Every dev-* Makefile target that touches docker-compose now declares dev-secrets as a prereq and sources .env.dev inline before invoking compose (compose interpolates ${MCPTEST_COOKIE_SECRET:?required} at parse time on every invocation, and Make subshells lose env state).
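The empty-audit-log fix above rests on standard encoding/json behavior: a nil slice marshals as null, while an empty non-nil slice marshals as []. A minimal sketch of the difference follows; `event`, `orEmpty`, and `encode` are hypothetical names for illustration and are not the store's actual types or helpers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// event is a stand-in for an audit summary row.
type event struct {
	ID string `json:"id"`
}

// orEmpty normalizes a nil slice to an empty one so that JSON clients
// always receive an array they can call .map(...) on.
func orEmpty[T any](s []T) []T {
	if s == nil {
		return []T{}
	}
	return s
}

// encode marshals v and returns the JSON text, or the error text on failure.
func encode(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	var none []event // nil, as returned by a query with no rows

	fmt.Println(encode(none))          // null
	fmt.Println(encode(orEmpty(none))) // []
}
```

Normalizing at the store layer fixes every consumer at once; the SPA-side ?? [] guard then only matters for responses that predate the fix.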

Acceptance per PR

Every subtask PR met the following before landing:

  1. make verify green at >= 80% filtered coverage.
  2. Focused commit message describing the user-facing change.
  3. Updated the relevant docs page in the same PR.
  4. Checked off the corresponding box.

Notes

  • v1.1.0 is the baseline; nothing here required breaking changes.
  • v1.1.1 added one schema migration (0003_audit_payloads_cleanup); no further migrations.
  • A pre-commit adversarial-review gate (~/.claude/hooks/review-gate.sh) is installed on the maintainer's machine; PRs from this branch landed "review-clean" on the first commit. The one exception was PR #11 ("feat(audit): replay endpoint and SSE live tail (#8)"), which exposed a CodeQL-coverage gap that was filled by adding make codeql + scripts/codeql-gate.sh.

Labels: enhancement