Skip to content

feat(audit): notification capture and PR #5 review fixes (#8)#9

Merged
cjimti merged 4 commits into
mainfrom
feat/audit-inspection-completion
May 6, 2026
Merged

feat(audit): notification capture and PR #5 review fixes (#8)#9
cjimti merged 4 commits into
mainfrom
feat/audit-inspection-completion

Conversation

@cjimti
Copy link
Copy Markdown
Contributor

@cjimti cjimti commented May 5, 2026

Second slice of #8. Closes off two of the nine subtasks: the review-fix cleanup that was unmerged on PR #5, and the notification recorder that PR #5 stubbed in the schema but did not implement. Replay endpoint, SSE tail, JSONB filter compiler, NDJSON export, UI drawer, comparison page, and docs remain as separate follow-ups on top of this.

What's in this PR

Notification recorder (pkg/mcpmw/notifications.go)

  • New notificationRecorder: per-request, ctx-keyed, mutex-guarded, capped via WithMaxNotifications(n). Concurrent-safe so tools that fan out goroutines can each call NotifyProgress without external synchronization. Drops past the cap rather than growing unboundedly.
  • New mcpmw.Notifications() sending-side middleware. On every dispatched method it checks isNotification(method) and, if a recorder is on ctx, appends {timestamp, method, params}. No-op when no recorder is seeded (e.g. notifications dispatched outside a tool-call window).
  • paramsToMap renders SDK-typed Params via JSON round-trip so the recorder does not have to know each notification shape; notifications/progress, notifications/logging/message, and anything else flow through unchanged.

Wiring (pkg/mcpmw/audit.go, internal/server/server.go)

  • The receiving Audit middleware now seeds a notificationRecorder onto ctx whenever capture_payloads is on, then passes its Snapshot() into buildPayload. The captured slice lands in audit_payloads.notifications.
  • internal/server/server.go wires mcpServer.AddSendingMiddleware(mcpmw.Notifications()) next to the existing AddReceivingMiddleware(mcpmw.Audit(...)).
  • Drive-by fix in the same file: errCategory = "handler" no longer overwrites a category (tool, auth_failed) set upstream. Was a regression noted in the PR feat(audit): capture full request/response payloads + detail fetch (data layer for #4) #5 review; covered by TestAudit_AuthFailureCarriesCategory.

PR #5 review-fix cleanup

Re-applies the fixes that lived on the local-only feat/audit-inspection-issue-4 @ b192c8e and never made main. Tracked as the first checkbox in #8.

  • pkg/mcpmw/audit.go / audit_payload_test.go: errCategory precedence (auth_failed > tool > handler); callToolResultToMap uniform JSON round-trip so annotations on text/image/audio content blocks survive instead of being silently dropped.
  • pkg/audit/event.go, pkg/httpsrv/portal_api.go: response_error.category populated alongside message so the portal can show "auth rejected" vs "tool errored" without parsing strings.
  • pkg/audit/postgres/store.go: reflection-based isEmptyValue for marshalJSONB (was previously hand-rolled per type and missed slices); unmarshalCol with WARN logging on corrupt JSONB so a single bad row does not 500 the detail endpoint.
  • pkg/audit/postgres/store_test.go: new testcontainers-backed Postgres store tests covering the JSONB round-trip, transactional Log, payload retrieval, and corrupt-row handling.
  • pkg/audit/postgres/migrate/migrations/0002_audit_payloads.up.sql: schema follow-ups from the same review.
  • internal/server/audit_options_test.go: config-side coverage for auditOptions(cfg.Audit) translation.
  • configs/mcp-test.{example,dev,live}.yaml: documents max_notifications alongside the existing capture knobs.

Tests

File Coverage
pkg/mcpmw/notifications_test.go recorder append/snapshot/cap, nil-receiver safety, 100-goroutine concurrent append; middleware no-op without ctx, records when present, ignores non-notifications/* methods; isNotification table tests; paramsToMap nil and unmarshalable-input paths
pkg/mcpmw/audit_payload_test.go new TestAudit_PayloadCapturesNotifications end-to-end: receiving Audit seeds the recorder, the fake handler fires two progress notifications via the sending middleware against the same ctx, both land in Payload.Notifications with the right method and params
pkg/audit/postgres/store_test.go testcontainers Postgres: Log round-trip, payload join, JSONB empties, unmarshalCol warn path
internal/server/audit_options_test.go auditOptions produces the expected []mcpmw.AuditOption slice for each config shape

Closes (subtasks of #8)

Does not close the umbrella issue. Remaining: replay endpoint, SSE live tail, JSONB path filter compiler, NDJSON export, portal UI drawer, comparison page, and docs/operations/inspection.md.

Test plan

  • go test ./... green locally
  • make verify green at >=80% filtered coverage
  • Bring up make dev, call a tool that fires NotifyProgress (e.g. mcptest__progress), SELECT notifications FROM audit_payloads WHERE event_id = '<id>' returns a JSONB array with two elements, each {ts, method: "notifications/progress", params: {progress, total, message}}
  • Override audit.max_notifications: 3 in config, fire a tool that emits 10 notifications, confirm the stored array has exactly 3 entries
  • Set audit.capture_payloads: false, fire the same tool, confirm no audit_payloads row is written and the receiver does not seed a recorder (no extra ctx allocation per call)
  • Trigger an auth-rejected call, confirm response_error.category = "auth_failed" (not "handler") in the resulting event

cjimti added 4 commits May 2, 2026 16:53
Acts on the 12 findings from the critical review of #5. Behavior is
unchanged for callers that don't opt into payload capture; the wire
shape and config surface are tightened to match what the middleware
can actually populate today.

M-1: tighten the captured fields to what the middleware can populate
- Drop JSONRPCID, RequestMethod, RequestPath from the Payload struct
  and the audit_payloads schema. These were declared but never
  populated; the audit middleware sees the JSON-RPC layer (where the
  ID isn't exposed by the SDK) and not the HTTP layer.
- Plumb the dispatched JSON-RPC method through buildPayload instead
  of hard-coding "tools/call" so future non-tool method audits get
  the right value.

M-2: preserve content for non-text/image/audio blocks
- callToolResultToMap falls through to json.Marshal(c) for any
  Content type the switch doesn't enumerate (EmbeddedResource and
  any future kinds), so the actual payload data lands in the
  audit_payloads row instead of a "non-textual content block"
  placeholder. Test covers the EmbeddedResource case end-to-end.

M-3: structured error category on auth failures
- ResponseError now carries a category field ("auth" / "tool" /
  "handler") alongside message so the portal can filter without
  string-matching. buildPayload accepts errCategory and threads it
  through.

M-4: surface the new audit knobs in shipped configs
- mcp-test.example.yaml, mcp-test.dev.yaml, mcp-test.live.yaml all
  show capture_payloads, capture_headers, max_payload_bytes,
  max_notifications with explanatory comments. Discoverable from
  config without reading source.

M-5: WithPayload doc comment corrected
- Old comment claimed "nil-or-zero Payload values are not persisted";
  that's not how it works. Any non-nil pointer is persisted; pass
  nil to opt out.

M-6: testcontainers-backed Postgres store tests
- pkg/audit/postgres/store_test.go (//go:build integration) covers:
  full JSONB round-trip through real columns (params, headers,
  result, notifications, all sub-shapes), FK cascade (deleting
  audit_events drops audit_payloads), summary-only path when
  Payload is nil. Skip-on-no-Docker pattern matches existing
  tests/integration_test.go.

L-1: auditEventDetail doc comment corrected
- Old comment said "O(scan) today"; Postgres uses the PK index for
  WHERE id = $1 LIMIT 1. Distinguish the abstract Query call from
  the actual SQL.

L-2: marshalJSONB uses reflection, not a type switch
- isEmptyValue uses reflect.Kind to detect zero-length containers
  uniformly (map, slice, array, chan, string) plus nil
  pointer/interface. Won't silently break if Payload grows a new
  field type.

L-3: unmarshal failures logged at WARN
- New unmarshalCol helper wraps json.Unmarshal and emits a slog.Warn
  with event_id + column name on failure (corrupt JSONB written by
  a buggy older binary). Field stays empty; rest of the payload
  still renders. auditEventDetail also logs WARN when the underlying
  GetPayload fails so operators can chase the cause without the
  request itself failing.

L-4: auditOptions config translation tested
- internal/server/audit_options_test.go covers the 5-case matrix:
  defaults-on, capture disabled, headers disabled, custom
  max_notifications, zero notifications.

L-5: extra middleware tests for the M-* fixes
- TestAudit_PayloadCapture_PreservesNonTextContent covers the
  EmbeddedResource fall-through path.
- TestAudit_AuthFailureCarriesCategory covers the structured
  category in response_error.
- TestAudit_PayloadCaptures_DispatchedMethod covers the JSONRPCMethod
  threading.

L-6: captured_at documented in the migration
- Comment explains why captured_at is distinct from
  audit_events.ts: forensic queries when the async drain is
  delayed.

CI ticket for the broader project tooling gaps (CodeQL, Scorecard,
Semgrep, frontend typecheck, gosec gating, action SHA pinning,
Codecov upload, integration test visibility) filed at #6.

make verify is green at >=80% filtered coverage.
Adds a sending-side middleware (mcpmw.Notifications) that records every
notifications/* method dispatched during a tool-call window into a
per-request recorder seeded on ctx by the receiving Audit middleware.
The captured slice is written to audit_payloads.notifications, giving
the inspection UI visibility into progress/log notifications fired by
tools.

Also: in Audit, only set errCategory = "handler" when no more specific
category was assigned upstream, so tool/auth categories aren't
overwritten on the way out.
Notification capture had two real defects: tool-emitted secrets bypassed
the redact list, and a single tool call could push unbounded data into
audit_payloads.notifications via LogMessage's `data any` blob. The event
and payload also disagreed on error_category for handler errors.

- Pipe redactKeys through to notificationRecorder so notification params
  get the same audit.SanitizeParameters treatment as tool params.
- Add NotificationsTruncated + sql column. fitNotifications binary-searches
  for the longest prefix of the slice that fits MaxPayloadBytes; trim and
  flag if anything was dropped.
- Set ev.ErrorCategory = errCategory once after categorization so the
  indexed event field always agrees with payload.response_error.category.
- Drop the redundant slice cap in buildPayload (recorder already enforces
  count cap at Append time).
- strings.HasPrefix instead of hand-rolled prefix check.
- Distinct sentinels for content-block round-trip failures: _marshal_error,
  _unmarshal_error, _no_type. Snake_case the diagnostic keys.
- Doc the post-return notification drop as deliberate.
- New integration test in tests/audit_notifications_test.go boots the full
  HTTP stack, calls a tool that fires NotifyProgress, asserts the audit
  payload row carries the notifications. Locks the SDK contract that
  AddSendingMiddleware fires on session-initiated notifications.
CI failed gosec G706 (log injection via taint analysis) on the slog.Warn
in auditEventDetail; the path's id was tainted from r.PathValue. Local
make verify missed it because the Makefile only checked tool presence
via `which`, not pinned versions, so a brew-installed newer gosec passed
while the v2.25.0 in CI rejected.

Fixes:
- Validate event id as a UUID at the boundary; log only the canonical
  uuid.UUID.String() form. Inline #nosec G706 covers the residual
  conservative taint warning gosec carries through uuid.Parse.
- Migration 0002 already shipped in v1.1.0; restoring it to its tag
  state and adding 0003_audit_payloads_cleanup that drops the unused
  jsonrpc_id / request_method / request_path columns and adds
  notifications_truncated. Apply order is: 0002 (v1.1.0 schema) then
  0003 (cleanup). Operators on v1.1.0 land cleanly; greenfield deploys
  run both end up with the same final shape.
- Move JSON marshal + SanitizeParameters out of the recorder mutex so
  concurrent NotifyProgress calls don't serialize on each other.
  Quick locked cap-check up front skips the work when full; the
  re-check after re-locking guards the race.
- SanitizeParameters short-circuits when redactKeys is empty: returns
  the input map directly instead of paying for a deep copy.
- fitNotifications: single linear marshal-and-accumulate pass instead
  of the binary-search-with-repeated-marshals approach. Handles a
  per-entry marshal error by truncating at that index.
- New tests: TestNotificationRecorder_PostSnapshotAppendsAreDropped
  locks the documented snapshot-then-late-Append behavior; the
  postgres testcontainer test now sets and asserts NotificationsTruncated;
  TestPortalAPI_AuditEventDetail_RejectsNonUUID covers the new 400 path.
- Existing portal_api_detail_test.go event IDs switched to literal UUIDs
  to match the boundary validation.

Makefile changes so `make verify` actually catches what CI catches:
- New tools-install target runs `go install` of pinned golangci-lint
  and gosec into bin/tools/, with a version-stamped sentinel so it
  reinstalls when the pinned vars change.
- lint / gosec / govulncheck targets now invoke $(TOOLS_DIR)/<binary>
  rather than whatever `which` finds first.
- tools-check shows the resolved versions instead of just confirming
  presence.
@cjimti cjimti merged commit b820563 into main May 6, 2026
8 checks passed
@cjimti cjimti deleted the feat/audit-inspection-completion branch May 6, 2026 02:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant