feat(audit): notification capture and PR #5 review fixes (#8)#9
Merged
Conversation
Acts on the 12 findings from the critical review of #5. Behavior is unchanged for callers that don't opt into payload capture; the wire shape and config surface are tightened to match what the middleware can actually populate today. M-1: tighten the captured fields to what the middleware can populate - Drop JSONRPCID, RequestMethod, RequestPath from the Payload struct and the audit_payloads schema. These were declared but never populated; the audit middleware sees the JSON-RPC layer (where the ID isn't exposed by the SDK) and not the HTTP layer. - Plumb the dispatched JSON-RPC method through buildPayload instead of hard-coding "tools/call" so future non-tool method audits get the right value. M-2: preserve content for non-text/image/audio blocks - callToolResultToMap falls through to json.Marshal(c) for any Content type the switch doesn't enumerate (EmbeddedResource and any future kinds), so the actual payload data lands in the audit_payloads row instead of a "non-textual content block" placeholder. Test covers the EmbeddedResource case end-to-end. M-3: structured error category on auth failures - ResponseError now carries a category field ("auth" / "tool" / "handler") alongside message so the portal can filter without string-matching. buildPayload accepts errCategory and threads it through. M-4: surface the new audit knobs in shipped configs - mcp-test.example.yaml, mcp-test.dev.yaml, mcp-test.live.yaml all show capture_payloads, capture_headers, max_payload_bytes, max_notifications with explanatory comments. Discoverable from config without reading source. M-5: WithPayload doc comment corrected - Old comment claimed "nil-or-zero Payload values are not persisted"; that's not how it works. Any non-nil pointer is persisted; pass nil to opt out. M-6: testcontainers-backed Postgres store tests - pkg/audit/postgres/store_test.go (//go:build integration) covers: full JSONB round-trip through real columns (params, headers, result, notifications, all sub-shapes), FK cascade (deleting audit_events drops audit_payloads), summary-only path when Payload is nil. Skip-on-no-Docker pattern matches existing tests/integration_test.go. L-1: auditEventDetail doc comment corrected - Old comment said "O(scan) today"; Postgres uses the PK index for WHERE id = $1 LIMIT 1. Distinguish the abstract Query call from the actual SQL. L-2: marshalJSONB uses reflection, not a type switch - isEmptyValue uses reflect.Kind to detect zero-length containers uniformly (map, slice, array, chan, string) plus nil pointer/interface. Won't silently break if Payload grows a new field type. L-3: unmarshal failures logged at WARN - New unmarshalCol helper wraps json.Unmarshal and emits a slog.Warn with event_id + column name on failure (corrupt JSONB written by a buggy older binary). Field stays empty; rest of the payload still renders. auditEventDetail also logs WARN when the underlying GetPayload fails so operators can chase the cause without the request itself failing. L-4: auditOptions config translation tested - internal/server/audit_options_test.go covers the 5-case matrix: defaults-on, capture disabled, headers disabled, custom max_notifications, zero notifications. L-5: extra middleware tests for the M-* fixes - TestAudit_PayloadCapture_PreservesNonTextContent covers the EmbeddedResource fall-through path. - TestAudit_AuthFailureCarriesCategory covers the structured category in response_error. - TestAudit_PayloadCaptures_DispatchedMethod covers the JSONRPCMethod threading. L-6: captured_at documented in the migration - Comment explains why captured_at is distinct from audit_events.ts: forensic queries when the async drain is delayed. CI ticket for the broader project tooling gaps (CodeQL, Scorecard, Semgrep, frontend typecheck, gosec gating, action SHA pinning, Codecov upload, integration test visibility) filed at #6. make verify is green at >=80% filtered coverage.
Adds a sending-side middleware (mcpmw.Notifications) that records every notifications/* method dispatched during a tool-call window into a per-request recorder seeded on ctx by the receiving Audit middleware. The captured slice is written to audit_payloads.notifications, giving the inspection UI visibility into progress/log notifications fired by tools. Also: in Audit, only set errCategory = "handler" when no more specific category was assigned upstream, so tool/auth categories aren't overwritten on the way out.
Notification capture had two real defects: tool-emitted secrets bypassed the redact list, and a single tool call could push unbounded data into audit_payloads.notifications via LogMessage's `data any` blob. The event and payload also disagreed on error_category for handler errors. - Pipe redactKeys through to notificationRecorder so notification params get the same audit.SanitizeParameters treatment as tool params. - Add NotificationsTruncated + sql column. fitNotifications binary-searches for the longest prefix of the slice that fits MaxPayloadBytes; trim and flag if anything was dropped. - Set ev.ErrorCategory = errCategory once after categorization so the indexed event field always agrees with payload.response_error.category. - Drop the redundant slice cap in buildPayload (recorder already enforces count cap at Append time). - strings.HasPrefix instead of hand-rolled prefix check. - Distinct sentinels for content-block round-trip failures: _marshal_error, _unmarshal_error, _no_type. Snake_case the diagnostic keys. - Doc the post-return notification drop as deliberate. - New integration test in tests/audit_notifications_test.go boots the full HTTP stack, calls a tool that fires NotifyProgress, asserts the audit payload row carries the notifications. Locks the SDK contract that AddSendingMiddleware fires on session-initiated notifications.
CI failed gosec G706 (log injection via taint analysis) on the slog.Warn in auditEventDetail; the path's id was tainted from r.PathValue. Local make verify missed it because the Makefile only checked tool presence via `which`, not pinned versions, so a brew-installed newer gosec passed while the v2.25.0 in CI rejected. Fixes: - Validate event id as a UUID at the boundary; log only the canonical uuid.UUID.String() form. Inline #nosec G706 covers the residual conservative taint warning gosec carries through uuid.Parse. - Migration 0002 already shipped in v1.1.0; restoring it to its tag state and adding 0003_audit_payloads_cleanup that drops the unused jsonrpc_id / request_method / request_path columns and adds notifications_truncated. Apply order is: 0002 (v1.1.0 schema) then 0003 (cleanup). Operators on v1.1.0 land cleanly; greenfield deploys run both end up with the same final shape. - Move JSON marshal + SanitizeParameters out of the recorder mutex so concurrent NotifyProgress calls don't serialize on each other. Quick locked cap-check up front skips the work when full; the re-check after re-locking guards the race. - SanitizeParameters short-circuits when redactKeys is empty: returns the input map directly instead of paying for a deep copy. - fitNotifications: single linear marshal-and-accumulate pass instead of the binary-search-with-repeated-marshals approach. Handles a per-entry marshal error by truncating at that index. - New tests: TestNotificationRecorder_PostSnapshotAppendsAreDropped locks the documented snapshot-then-late-Append behavior; the postgres testcontainer test now sets and asserts NotificationsTruncated; TestPortalAPI_AuditEventDetail_RejectsNonUUID covers the new 400 path. - Existing portal_api_detail_test.go event IDs switched to literal UUIDs to match the boundary validation. Makefile changes so `make verify` actually catches what CI catches: - New tools-install target runs `go install` of pinned golangci-lint and gosec into bin/tools/, with a version-stamped sentinel so it reinstalls when the pinned vars change. - lint / gosec / govulncheck targets now invoke $(TOOLS_DIR)/<binary> rather than whatever `which` finds first. - tools-check shows the resolved versions instead of just confirming presence.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Second slice of #8. Closes off two of the nine subtasks: the review-fix cleanup that was unmerged on PR #5, and the notification recorder that PR #5 stubbed in the schema but did not implement. Replay endpoint, SSE tail, JSONB filter compiler, NDJSON export, UI drawer, comparison page, and docs remain as separate follow-ups on top of this.
What's in this PR
Notification recorder (
pkg/mcpmw/notifications.go)notificationRecorder: per-request, ctx-keyed, mutex-guarded, capped viaWithMaxNotifications(n). Concurrent-safe so tools that fan out goroutines can each callNotifyProgresswithout external synchronization. Drops past the cap rather than growing unboundedly.mcpmw.Notifications()sending-side middleware. On every dispatched method it checksisNotification(method)and, if a recorder is on ctx, appends{timestamp, method, params}. No-op when no recorder is seeded (e.g. notifications dispatched outside a tool-call window).paramsToMaprenders SDK-typedParamsvia JSON round-trip so the recorder does not have to know each notification shape;notifications/progress,notifications/logging/message, and anything else flow through unchanged.Wiring (
pkg/mcpmw/audit.go,internal/server/server.go)Auditmiddleware now seeds anotificationRecorderonto ctx whenevercapture_payloadsis on, then passes itsSnapshot()intobuildPayload. The captured slice lands inaudit_payloads.notifications.internal/server/server.gowiresmcpServer.AddSendingMiddleware(mcpmw.Notifications())next to the existingAddReceivingMiddleware(mcpmw.Audit(...)).errCategory = "handler"no longer overwrites a category (tool,auth_failed) set upstream. Was a regression noted in the PR feat(audit): capture full request/response payloads + detail fetch (data layer for #4) #5 review; covered byTestAudit_AuthFailureCarriesCategory.PR #5 review-fix cleanup
Re-applies the fixes that lived on the local-only
feat/audit-inspection-issue-4 @ b192c8eand never mademain. Tracked as the first checkbox in #8.pkg/mcpmw/audit.go/audit_payload_test.go:errCategoryprecedence (auth_failed>tool>handler);callToolResultToMapuniform JSON round-trip so annotations on text/image/audio content blocks survive instead of being silently dropped.pkg/audit/event.go,pkg/httpsrv/portal_api.go:response_error.categorypopulated alongsidemessageso the portal can show "auth rejected" vs "tool errored" without parsing strings.pkg/audit/postgres/store.go: reflection-basedisEmptyValueformarshalJSONB(was previously hand-rolled per type and missed slices);unmarshalColwith WARN logging on corrupt JSONB so a single bad row does not 500 the detail endpoint.pkg/audit/postgres/store_test.go: new testcontainers-backed Postgres store tests covering the JSONB round-trip, transactionalLog, payload retrieval, and corrupt-row handling.pkg/audit/postgres/migrate/migrations/0002_audit_payloads.up.sql: schema follow-ups from the same review.internal/server/audit_options_test.go: config-side coverage forauditOptions(cfg.Audit)translation.configs/mcp-test.{example,dev,live}.yaml: documentsmax_notificationsalongside the existing capture knobs.Tests
pkg/mcpmw/notifications_test.gonotifications/*methods;isNotificationtable tests;paramsToMapnil and unmarshalable-input pathspkg/mcpmw/audit_payload_test.goTestAudit_PayloadCapturesNotificationsend-to-end: receiving Audit seeds the recorder, the fake handler fires two progress notifications via the sending middleware against the same ctx, both land inPayload.Notificationswith the right method and paramspkg/audit/postgres/store_test.goLoground-trip, payload join, JSONB empties,unmarshalColwarn pathinternal/server/audit_options_test.goauditOptionsproduces the expected[]mcpmw.AuditOptionslice for each config shapeCloses (subtasks of #8)
Does not close the umbrella issue. Remaining: replay endpoint, SSE live tail, JSONB path filter compiler, NDJSON export, portal UI drawer, comparison page, and
docs/operations/inspection.md.Test plan
go test ./...green locallymake verifygreen at >=80% filtered coveragemake dev, call a tool that firesNotifyProgress(e.g.mcptest__progress),SELECT notifications FROM audit_payloads WHERE event_id = '<id>'returns a JSONB array with two elements, each{ts, method: "notifications/progress", params: {progress, total, message}}audit.max_notifications: 3in config, fire a tool that emits 10 notifications, confirm the stored array has exactly 3 entriesaudit.capture_payloads: false, fire the same tool, confirm noaudit_payloadsrow is written and the receiver does not seed a recorder (no extra ctx allocation per call)response_error.category = "auth_failed"(not"handler") in the resulting event