feat(conductor): wire follower policy-bundle poller #640

Merged
luckyPipewrench merged 3 commits into main from feat/conductor-bundle-poller
May 30, 2026

Conversation

Owner

@luckyPipewrench luckyPipewrench commented May 30, 2026

What

Wires the follower-side Conductor policy-bundle poller so a follower automatically fetches, verifies, and applies signed policy bundles from the leader. Previously the leader could store and serve bundles and the apply boundary existed, but nothing on the follower polled, so central policy distribution did not actually happen.

How

A third control-plane loop runs beside the audit transport and remote-kill poller. Each tick it issues a GET to /api/v1/conductor/policy/latest over the shared mTLS client, ETag-gated so an unchanged bundle is a cheap no-op. A 200 is strict-decoded (unknown-field and trailing-document rejection) and applied through the existing verify, reload, activate boundary. The cached ETag advances only after a successful apply, so a transient apply failure is retried rather than masked by a later 304.

Fail-closed by construction: 204, 304, and no-newer responses are no-ops; malformed, oversized, wrong-purpose, wrong-scope, wrong-signer, stale, or trailing-document bundles are rejected without changing the running config.

Also in this PR

Trust roster required when enabled: a pinned trust_roster_root_fingerprint is now mandatory whenever conductor.enabled, independent of honor_remote_kill_switch. The honor flag governs only whether remote-kill state is applied, not whether trust material is required. A follower can no longer participate without a verified trust root.

Default-deny bundle-section allowlist: a signed bundle config_yaml may carry only enforcement-policy sections. Operational and infrastructure sections (listeners, emit, logging, kill switch, flight recorder, the conductor control plane, reverse proxy, metrics) and sections that mix enforcement with local trust, identity, certificates, routing, or OS isolation (SSRF, DNS, trusted-domains, agents, TLS interception, sandbox) are rejected. New config sections are rejected by default until explicitly allowlisted.

flight_recorder is restart-only on reload (previously only the signing key path was). The recorder is built once at startup and cannot rebind at runtime; preserving the whole block on reload also lets a follower apply an enforcement-only bundle without losing its signed recorder.

Operational errors no longer page error reporting: a listener bind hitting EADDRINUSE is an environment or operator condition, not a code defect, so it is dropped from error capture, mirroring the existing context.Canceled drop.

Compatibility

Core (non-enterprise) build keeps fail-closed stubs; conductor.enabled on a core binary still errors. No config defaults change; canonical policy hash is unaffected.

Tests

Poller status, ETag, and fail-closed matrix; fingerprint required regardless of the honor flag; bundle-section allowlist (accept enforcement sections, reject infra plus trust, identity, cert, and sandbox sections); the EADDRINUSE filter; startup wiring.

Summary by CodeRabbit

  • New Features

    • Follower nodes can poll leaders for signed policy bundles with ETag caching, safe response limits, retries, and lifecycle/startup integration.
  • Bug Fixes

    • Policy bundle validation now rejects unknown top-level config sections and reports the offending key.
    • Conductor now requires a trust-roster fingerprint when enabled.
    • Flight recorder changes are preserved across reloads.
    • Error reporting omits additional expected operational errors.
  • Tests

    • Expanded tests for polling, validation, config, and lifecycle behavior.

Review Change Stack

Adds the follower-side policy-bundle poller: a third control-plane loop
beside the audit transport and remote-kill poller. It polls the leader for
the latest signed bundle over mTLS, is ETag-gated, strict-decodes, and
applies through the existing verify->reload->activate boundary. Malformed,
oversized, wrong-purpose, wrong-scope, wrong-signer, stale, or trailing-
document bundles fail closed.

Also:
- Require a pinned trust roster fingerprint whenever conductor.enabled
  (independent of honor_remote_kill_switch).
- Default-deny allowlist on a signed bundle's config_yaml: only
  enforcement-policy sections are accepted; operational/infrastructure and
  trust/identity/cert/sandbox sections are rejected.
- Make flight_recorder restart-only on reload (was only signing_key_path).
- Drop expected operational errors (listener EADDRINUSE) from Sentry.

coderabbitai Bot commented May 30, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a060981f-8b74-4ae2-b722-f125b5ed1974

📥 Commits

Reviewing files that changed from the base of the PR and between db0691c and d8f04d6.

📒 Files selected for processing (1)
  • internal/license/crl_test.go

📝 Walkthrough

Adds follower-side Conductor policy bundle polling with ETag caching and strict JSON/YAML validation; enforces a default-deny allowlist for top-level signed bundle config_yaml keys via ErrForbiddenBundleSection; requires a pinned trust-roster fingerprint whenever Conductor is enabled; wires poller into server lifecycle and adds tests.

Changes

Conductor Policy Bundle Polling with Validation

| Layer / File(s) | Summary |
| --- | --- |
| Policy bundle config section allowlist: enterprise/conductor/messages.go, enterprise/conductor/messages_test.go | New sentinel error ErrForbiddenBundleSection and allowedPolicyBundleSections allowlist enforce default-deny validation for top-level config_yaml keys; PolicyBundle.Validate() rejects bundles containing disallowed sections. Test fixture updated to use mcp_tool_policy. |
| Unconditional trust roster fingerprint requirement: internal/config/conductor.go, internal/config/conductor_test.go | validateConductor now mandates non-empty, parseable trust_roster_root_fingerprint whenever conductor.enabled is true, regardless of honor_remote_kill_switch. Tests verify missing/bad fingerprint errors and acceptance when provided. |
| Poller interfaces and configuration: enterprise/conductor/policysync/poller.go | Exported HTTPDoer and Applier interfaces, ApplierFunc adapter, PollerConfig struct, and Poller type with mutex-protected ETag caching; LatestPolicyBundlePath constant and sentinel errors defined. |
| Poller constructor, polling loop, and request handling: enterprise/conductor/policysync/poller.go | NewPoller validates endpoint URL (HTTPS-only, no userinfo/query/fragment) and applies defaults. Run executes a timer-based poll loop with context-aware shutdown and error logging. PollOnce builds GET requests with If-None-Match, handles 200/204/304 statuses, strictly decodes JSON with unknown-field and trailing-document checks, enforces response-size bounds, applies bundles via injected applier, and advances cached ETag only after successful apply. |
| Policy bundle poller test suite: enterprise/conductor/policysync/poller_test.go | Comprehensive tests with HTTP and applier stubs covering constructor validation, defaults, HTTP status handling (200/204/304/error), strict JSON checks and trailing-document detection, body overflow protection, transport/read failures, request header assertions, ETag semantics (advance on success, preserve on applier failure, 304 no-op), Run shutdown/logging, and nil-receiver safety. |
| Poller construction and server initialization: internal/cli/runtime/conductor.go | buildConductorBundlePoller creates an mTLS client, always builds a trust resolver for bundle verification, parses poll interval, configures a JSON logger, and wires an applier callback that forwards bundles to ApplyConductorPolicyBundle. initConductorBundlePoller conditionally builds and stores the poller on the Server. |
| Server conductorBundle field and NewServer wiring: internal/cli/runtime/server.go | Server adds conductorBundle runner field; NewServer calls initConductorBundlePoller during startup and performs cleanup on error. |
| Startup advertisement and background polling goroutine: internal/cli/runtime/server_lifecycle.go | Startup output advertises "Conductor: policy bundle polling enabled" when s.conductorBundle is configured. A background goroutine runs s.conductorBundle.Run(ctx) with panic recovery; non-cancellation errors are audit-logged and printed to stderr, and runtime cancellation is triggered on failure. |
| Conductor integration tests and test infrastructure: internal/cli/runtime/conductor_test.go, internal/cli/runtime/server_conductor_test.go | Tests verify poller startup fails when a pinned trust roster fingerprint is missing even with honor_remote_kill_switch=false, that poller is nil when conductor disabled, and that NewServer wires a poller when enabled. Test helpers generate signed runtime trust rosters and pinned fingerprints; includes blocking runner for lifecycle tests. |
| Apache-only poller stub: internal/cli/runtime/conductor_stub.go | Apache build adds no-op initConductorBundlePoller that touches core fields and returns errConductorEnterpriseBuildRequired when Conductor is enabled. |
| Conductor license gate test fixture update: internal/cli/runtime/conductor_license_test.go | Test config YAML now includes placeholder trust_roster_root_fingerprint required when conductor.enabled is true. |
| Flight recorder reload protection: internal/cli/runtime/server_reload.go | Reload guard expanded to deep-compare the full FlightRecorder struct; any difference preserves the old recorder block in the new config and emits a tailored warning distinguishing signing-key-path changes from other changes. |
| Operational error classification for Sentry: internal/sentry/client.go | CaptureError now filters expected operational errors (currently syscall.EADDRINUSE) in addition to nil-client and context.Canceled, via a new isExpectedOperationalError helper. |
| CRL test timestamp: internal/license/crl_test.go | TestSignParseAndVerifyCRL now uses time.Now().UTC() to avoid fixed-time expiry issues in tests. |
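
The endpoint checks attributed to NewPoller in the walkthrough (HTTPS-only, no userinfo, query, or fragment) might look roughly like this. `validateEndpoint` is a hypothetical name and the error messages are invented; only the four rules come from the summary above.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validateEndpoint accepts only a bare https URL: no credentials, query, or fragment.
func validateEndpoint(raw string) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse endpoint: %w", err)
	}
	switch {
	case u.Scheme != "https":
		return nil, errors.New("endpoint must use https")
	case u.Host == "":
		return nil, errors.New("endpoint must include a host")
	case u.User != nil:
		return nil, errors.New("endpoint must not carry userinfo")
	case u.RawQuery != "" || u.ForceQuery:
		return nil, errors.New("endpoint must not carry a query")
	case u.Fragment != "":
		return nil, errors.New("endpoint must not carry a fragment")
	}
	return u, nil
}

func main() {
	for _, raw := range []string{
		"https://leader.example:8443",
		"http://leader.example",
		"https://user:secret@leader.example",
		"https://leader.example/?debug=1",
		"https://leader.example/#frag",
	} {
		_, err := validateEndpoint(raw)
		fmt.Printf("%-40s -> err=%v\n", raw, err)
	}
}
```

Rejecting userinfo and query strings at construction time keeps credentials and ad hoc parameters out of a URL that is later written to logs.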

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested Labels

size:L

Suggested Reviewers

  • DeathCamel58
  • BlueAceFrost

Poem

I hop through logs at morning light,
Fetching bundles by ETag's sight,
I guard each key with allowlist cheer,
Trust-roster pinned — the path is clear. 🐇✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.45% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding follower-side Conductor policy-bundle poller wiring, which is the primary feature delivered in this PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

sentry Bot commented May 30, 2026

@luckyPipewrench luckyPipewrench self-assigned this May 30, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
internal/cli/runtime/server_lifecycle.go (1)

231-249: 💤 Low value

Simplify the misleading guard around the remote-kill block.

The outer condition now tests s.conductorBundle != nil (and s.conductorAudit), but this block only spawns the remote-kill goroutine — audit (Line 250) and bundle (Line 265) have their own top-level ifs. The extra disjuncts are dead with respect to this block and suggest the bundle poller is handled here. Collapse to the single relevant check for parity with the other two blocks.

♻️ Proposed simplification
 	var conductorWG sync.WaitGroup
-	if s.conductorAudit != nil || s.conductorRemoteKill != nil || s.conductorBundle != nil {
-		if s.conductorRemoteKill != nil {
-			conductorWG.Add(1)
-			go func() {
-				defer conductorWG.Done()
-				defer func() {
-					if r := recover(); r != nil {
-						_, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller panic: %v\n", r)
-						cancel()
-					}
-				}()
-				if err := s.conductorRemoteKill.Run(ctx); err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) {
-					s.logger.LogError(audit.NewResourceLogContext("conductor_remote_kill_poller", cfg.Conductor.ConductorURL), err)
-					_, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller stopped: %v\n", err)
-					cancel()
-				}
-			}()
-		}
-	}
+	if s.conductorRemoteKill != nil {
+		conductorWG.Add(1)
+		go func() {
+			defer conductorWG.Done()
+			defer func() {
+				if r := recover(); r != nil {
+					_, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller panic: %v\n", r)
+					cancel()
+				}
+			}()
+			if err := s.conductorRemoteKill.Run(ctx); err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) {
+				s.logger.LogError(audit.NewResourceLogContext("conductor_remote_kill_poller", cfg.Conductor.ConductorURL), err)
+				_, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller stopped: %v\n", err)
+				cancel()
+			}
+		}()
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/cli/runtime/server_lifecycle.go` around lines 231 - 249, The outer
guard currently checks s.conductorAudit and s.conductorBundle but the block only
launches the conductorRemoteKill goroutine; change the conditional to only test
s.conductorRemoteKill != nil so the remote-kill block mirrors the separate
top-level ifs for audit and bundle (leave the inner conductorRemoteKill != nil
check, conductorWG.Add/Go wrapper, panic recovery, Run(ctx) error handling,
logger/audit.NewResourceLogContext and cancel logic unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@enterprise/conductor/messages_test.go`:
- Line 316: Add a unit test that asserts bundles with forbidden top-level
sections return ErrForbiddenBundleSection: create a subtest (e.g.,
t.Run("forbidden_bundle_section", ...)), obtain a bundle via testPolicyBundle(),
set its ConfigYAML to include a disallowed top-level section (for example
"agents:\n  claude-code:\n    strict: true\n"), call b.Validate() and assert
errors.Is(err, ErrForbiddenBundleSection); reference testPolicyBundle(),
b.Validate(), and ErrForbiddenBundleSection when locating where to add this case
(mirror the structure of the existing forbidden_license_field test).

---

Nitpick comments:
In `@internal/cli/runtime/server_lifecycle.go`:
- Around line 231-249: The outer guard currently checks s.conductorAudit and
s.conductorBundle but the block only launches the conductorRemoteKill goroutine;
change the conditional to only test s.conductorRemoteKill != nil so the
remote-kill block mirrors the separate top-level ifs for audit and bundle (leave
the inner conductorRemoteKill != nil check, conductorWG.Add/Go wrapper, panic
recovery, Run(ctx) error handling, logger/audit.NewResourceLogContext and cancel
logic unchanged).
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0c410ef8-4196-4767-b620-e3177994af27

📥 Commits

Reviewing files that changed from the base of the PR and between c8f6dcc and 06a6960.

📒 Files selected for processing (15)
  • enterprise/conductor/messages.go
  • enterprise/conductor/messages_test.go
  • enterprise/conductor/policysync/poller.go
  • enterprise/conductor/policysync/poller_test.go
  • internal/cli/runtime/conductor.go
  • internal/cli/runtime/conductor_license_test.go
  • internal/cli/runtime/conductor_stub.go
  • internal/cli/runtime/conductor_test.go
  • internal/cli/runtime/server.go
  • internal/cli/runtime/server_conductor_test.go
  • internal/cli/runtime/server_lifecycle.go
  • internal/cli/runtime/server_reload.go
  • internal/config/conductor.go
  • internal/config/conductor_test.go
  • internal/sentry/client.go

Comment thread enterprise/conductor/messages_test.go
@luckyPipewrench luckyPipewrench enabled auto-merge (squash) May 30, 2026 13:29
TestSignParseAndVerifyCRL hard-coded now = 2026-05-23 and built a 7-day
CRL. SignCRL validates the payload expiry against time.Now(), so the
fixture expired 7 days after it was written and the test began failing
on a wall-clock date with no code change. Use time.Now() like the
sibling CRL tests, which removes the expiry race.
@luckyPipewrench luckyPipewrench merged commit cde0ce1 into main May 30, 2026
19 checks passed
@luckyPipewrench luckyPipewrench deleted the feat/conductor-bundle-poller branch May 30, 2026 14:39