feat(conductor): wire follower policy-bundle poller#640
Conversation
Adds the follower-side policy-bundle poller: a third control-plane loop beside the audit transport and remote-kill poller. It polls the leader for the latest signed bundle over mTLS, is ETag-gated, strict-decodes, and applies through the existing verify->reload->activate boundary. Malformed, oversized, wrong-purpose, wrong-scope, wrong-signer, stale, or trailing- document bundles fail closed. Also: - Require a pinned trust roster fingerprint whenever conductor.enabled (independent of honor_remote_kill_switch). - Default-deny allowlist on a signed bundle's config_yaml: only enforcement-policy sections are accepted; operational/infrastructure and trust/identity/cert/sandbox sections are rejected. - Make flight_recorder restart-only on reload (was only signing_key_path). - Drop expected operational errors (listener EADDRINUSE) from Sentry.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds follower-side Conductor policy bundle polling with ETag caching and strict JSON/YAML validation; enforces a default-deny allowlist for top-level signed bundle ChangesConductor Policy Bundle Polling with Validation
🎯 4 (Complex) | ⏱️ ~45 minutes Possibly Related PRs
Suggested Labels
Suggested Reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
internal/cli/runtime/server_lifecycle.go (1)
231-249: 💤 Low valueSimplify the misleading guard around the remote-kill block.
The outer condition now tests
s.conductorBundle != nil(ands.conductorAudit), but this block only spawns the remote-kill goroutine — audit (Line 250) and bundle (Line 265) have their own top-levelifs. The extra disjuncts are dead with respect to this block and suggest the bundle poller is handled here. Collapse to the single relevant check for parity with the other two blocks.♻️ Proposed simplification
var conductorWG sync.WaitGroup - if s.conductorAudit != nil || s.conductorRemoteKill != nil || s.conductorBundle != nil { - if s.conductorRemoteKill != nil { - conductorWG.Add(1) - go func() { - defer conductorWG.Done() - defer func() { - if r := recover(); r != nil { - _, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller panic: %v\n", r) - cancel() - } - }() - if err := s.conductorRemoteKill.Run(ctx); err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) { - s.logger.LogError(audit.NewResourceLogContext("conductor_remote_kill_poller", cfg.Conductor.ConductorURL), err) - _, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller stopped: %v\n", err) - cancel() - } - }() - } - } + if s.conductorRemoteKill != nil { + conductorWG.Add(1) + go func() { + defer conductorWG.Done() + defer func() { + if r := recover(); r != nil { + _, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller panic: %v\n", r) + cancel() + } + }() + if err := s.conductorRemoteKill.Run(ctx); err != nil && !errors.Is(err, context.Canceled) && !errors.Is(err, context.DeadlineExceeded) { + s.logger.LogError(audit.NewResourceLogContext("conductor_remote_kill_poller", cfg.Conductor.ConductorURL), err) + _, _ = fmt.Fprintf(s.opts.Stderr, "pipelock: conductor remote kill poller stopped: %v\n", err) + cancel() + } + }() + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/cli/runtime/server_lifecycle.go` around lines 231 - 249, The outer guard currently checks s.conductorAudit and s.conductorBundle but the block only launches the conductorRemoteKill goroutine; change the conditional to only test s.conductorRemoteKill != nil so the remote-kill block mirrors the separate top-level ifs for audit and bundle (leave the inner conductorRemoteKill != nil check, conductorWG.Add/Go wrapper, panic recovery, Run(ctx) error handling, logger/audit.NewResourceLogContext and cancel logic unchanged).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@enterprise/conductor/messages_test.go`:
- Line 316: Add a unit test that asserts bundles with forbidden top-level
sections return ErrForbiddenBundleSection: create a subtest (e.g.,
t.Run("forbidden_bundle_section", ...)), obtain a bundle via testPolicyBundle(),
set its ConfigYAML to include a disallowed top-level section (for example
"agents:\n claude-code:\n strict: true\n"), call b.Validate() and assert
errors.Is(err, ErrForbiddenBundleSection); reference testPolicyBundle(),
b.Validate(), and ErrForbiddenBundleSection when locating where to add this case
(mirror the structure of the existing forbidden_license_field test).
---
Nitpick comments:
In `@internal/cli/runtime/server_lifecycle.go`:
- Around line 231-249: The outer guard currently checks s.conductorAudit and
s.conductorBundle but the block only launches the conductorRemoteKill goroutine;
change the conditional to only test s.conductorRemoteKill != nil so the
remote-kill block mirrors the separate top-level ifs for audit and bundle (leave
the inner conductorRemoteKill != nil check, conductorWG.Add/Go wrapper, panic
recovery, Run(ctx) error handling, logger/audit.NewResourceLogContext and cancel
logic unchanged).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 0c410ef8-4196-4767-b620-e3177994af27
📒 Files selected for processing (15)
enterprise/conductor/messages.goenterprise/conductor/messages_test.goenterprise/conductor/policysync/poller.goenterprise/conductor/policysync/poller_test.gointernal/cli/runtime/conductor.gointernal/cli/runtime/conductor_license_test.gointernal/cli/runtime/conductor_stub.gointernal/cli/runtime/conductor_test.gointernal/cli/runtime/server.gointernal/cli/runtime/server_conductor_test.gointernal/cli/runtime/server_lifecycle.gointernal/cli/runtime/server_reload.gointernal/config/conductor.gointernal/config/conductor_test.gointernal/sentry/client.go
TestSignParseAndVerifyCRL hard-coded now = 2026-05-23 and built a 7-day CRL. SignCRL validates the payload expiry against time.Now(), so the fixture expired 7 days after it was written and the test began failing on a wall-clock date with no code change. Use time.Now() like the sibling CRL tests, which removes the expiry race.
What
Wires the follower-side Conductor policy-bundle poller so a follower automatically fetches, verifies, and applies signed policy bundles from the leader. Previously the leader could store and serve bundles and the apply boundary existed, but nothing on the follower polled, so central policy distribution did not actually happen.
How
A third control-plane loop runs beside the audit transport and remote-kill poller. Each tick it issues a GET to /api/v1/conductor/policy/latest over the shared mTLS client, ETag-gated so an unchanged bundle is a cheap no-op. A 200 is strict-decoded (unknown-field and trailing-document rejection) and applied through the existing verify, reload, activate boundary. The cached ETag advances only after a successful apply, so a transient apply failure is retried rather than masked by a later 304.
Fail-closed by construction: 204, 304, and no-newer responses are no-ops; malformed, oversized, wrong-purpose, wrong-scope, wrong-signer, stale, or trailing-document bundles are rejected without changing the running config.
Also in this PR
Trust roster required when enabled: a pinned trust_roster_root_fingerprint is now mandatory whenever conductor.enabled, independent of honor_remote_kill_switch. The honor flag governs only whether remote-kill state is applied, not whether trust material is required. A follower can no longer participate without a verified trust root.
Default-deny bundle-section allowlist: a signed bundle config_yaml may carry only enforcement-policy sections. Operational and infrastructure sections (listeners, emit, logging, kill switch, flight recorder, the conductor control plane, reverse proxy, metrics) and sections that mix enforcement with local trust, identity, certificates, routing, or OS isolation (SSRF, DNS, trusted-domains, agents, TLS interception, sandbox) are rejected. New config sections are rejected by default until explicitly allowlisted.
flight_recorder is restart-only on reload (previously only the signing key path was). The recorder is built once at startup and cannot rebind at runtime; preserving the whole block on reload also lets a follower apply an enforcement-only bundle without losing its signed recorder.
Operational errors no longer page error reporting: a listener bind hitting EADDRINUSE is an environment or operator condition, not a code defect, so it is dropped from error capture, mirroring the existing context.Canceled drop.
Compatibility
Core (non-enterprise) build keeps fail-closed stubs; conductor.enabled on a core binary still errors. No config defaults change; canonical policy hash is unaffected.
Tests
Poller status, ETag, and fail-closed matrix; fingerprint required regardless of the honor flag; bundle-section allowlist (accept enforcement sections, reject infra plus trust, identity, cert, and sandbox sections); the EADDRINUSE filter; startup wiring.
Summary by CodeRabbit
New Features
Bug Fixes
Tests