-
Notifications
You must be signed in to change notification settings - Fork 111
add liveness, readiness, and startup probes to noobaa-operator #1684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds health probes and memory settings to the operator Deployment, wires controller-runtime health probe endpoint in the manager, and updates the embedded deployment bundle and its SHA to reflect these YAML changes. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant KS as Kubernetes Scheduler
participant Pod as Operator Pod/Container
participant Mgr as Controller-Runtime Manager
participant HP as Health Probe Endpoint (:8081 /readyz)
participant Kubelet as Kubelet Probes
KS->>Pod: Start container with args (operator run --health-probe-bind-address=:8081)
Pod->>Mgr: Initialize managers
Mgr->>Mgr: Set HealthProbeBindAddress=":8081"
Mgr->>HP: Serve health endpoints
Mgr->>Mgr: AddReadyzCheck("readyz", healthz.Ping)
rect rgba(230,245,255,0.5)
Note over Kubelet,HP: Startup probe sequence
loop until success or 30 failures
Kubelet->>HP: HTTP GET /readyz
HP-->>Kubelet: 200/Failure
end
end
rect rgba(240,255,240,0.5)
Note over Kubelet,HP: Readiness probe
loop every 10s (post initialDelay 5s)
Kubelet->>HP: HTTP GET /readyz
HP-->>Kubelet: 200/Failure
end
end
rect rgba(255,245,230,0.5)
Note over Kubelet,HP: Liveness probe
loop every 10s (post initialDelay 15s)
Kubelet->>HP: HTTP GET /readyz
HP-->>Kubelet: 200/Failure (restart on threshold)
end
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Suggested reviewers
📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
Status, Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (5)
pkg/operator/manager.go (3)
48-50
: Harden flag parsing: fall back to a sane default and log.If the flag isn’t present (e.g., invoked via the wrong subcommand) or empty, you’ll end up setting an empty HealthProbeBindAddress, which disables the probe server. Add a defensive fallback.
- // Probe address from CLI flag (defaults to :8081) - probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") + // Probe address from CLI flag; fall back to default if missing + probeAddr, err := cmd.Flags().GetString("health-probe-bind-address") + if err != nil || probeAddr == "" { + probeAddr = ":8081" + log.Infof("health-probe-bind-address not set; defaulting to %s", probeAddr) + }
72-73
: Health probe server bound only on the namespace-scoped manager. Confirm this is intentional.Only mgr has HealthProbeBindAddress; cmgr does not. If the cluster-scoped manager wedges, probes will still report healthy. If that’s acceptable, ignore. Otherwise consider proxying a simple cmgr health indicator into mgr’s checks.
104-112
: Consider a stronger readiness check (cache sync) instead of Ping-only.healthz.Ping always returns success. For readiness, it’s common to gate on informer cache sync to avoid advertising Ready before controllers are usable.
Example:
// after mgr is created if err := mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error { if mgr.GetCache().WaitForCacheSync(util.Context()) { return nil } return fmt.Errorf("caches not yet synced") }); err != nil { log.Fatalf("Failed to add readiness check: %s", err) }If blocking per request is a concern, you can set a one-time “synced” atomic flag from a Runnable that waits on cache sync and have the checker read that flag.
deploy/operator.yaml (2)
48-49
: Fix indentation for ports block to satisfy yamllint and avoid accidental structural errors.Yamllint warns: expected indent 10 but found 12. Align ports with args/volumeMounts under the container spec.
- ports: - - name: healthz + ports: + - name: healthz containerPort: 8081You can validate with:
- yamllint deploy/operator.yaml
- kubectl apply --dry-run=client -f deploy/operator.yaml
59-67
: Readiness probe looks good./readyz on the named port is correct. Timings are reasonable for controller-runtime defaults. If you adopt cache-sync readiness, /readyz will naturally gate until caches are synced.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml
(2 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[error] 45-45: trailing spaces
(trailing-spaces)
[warning] 48-48: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/operator/manager.go (1)
27-27
: Import of controller-runtime healthz is appropriate.This is the correct package for registering /healthz and /readyz checks.
deploy/operator.yaml (2)
68-75
: Liveness probe is sensible./healthz with a modest initialDelaySeconds is a standard setup for controller-runtime. No concerns.
76-81
: Startup probe configuration is conservative and safe.failureThreshold 30 with periodSeconds 10 gives up to 5 minutes for cold starts and image pulls. This helps prevent premature restarts.
e07011f
to
c6acf02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml
(2 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (2)
deploy/operator.yaml (2)
60-82
: LGTM! Health probe configuration looks good.The probe configuration is well-structured:
- Readiness probe uses
/readyz
endpoint with a short initial delay (5s)- Liveness probe uses
/healthz
endpoint with appropriate initial delay (15s)- Startup probe provides sufficient time for initialization (300s total with 30 failures × 10s)
- All probes use consistent timeout and period settings
The probe timings follow Kubernetes best practices for operator workloads.
52-54
: Approved: bound-sa-token mount can safely remain readOnlyI searched the codebase for any write operations targeting
/var/run/secrets/openshift/serviceaccount
(including Go standard library file writes) and found only references for reading the token file—no code writes to this path. Therefore, mounting the service account token asreadOnly: true
is safe and aligns with security best practices.
c6acf02
to
63cf177
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
deploy/operator.yaml (3)
44-47
: Good: explicit "run" subcommand included.This addresses the earlier issue where the manager wouldn't start without "run". With the whitespace fix above, the CLI will parse as intended.
48-50
: Fix ports block indentation to conform with lint rules.Yamllint expects the dash items to align directly under the key. Apply:
- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081
44-47
: Trailing spaces in args will break subcommand parsing; fix and align list indentation to satisfy yamllint.The values "operator␠␠" and "run␠␠␠␠␠␠␠" include trailing spaces, which become part of the argv and will not match Cobra subcommands. Also, yamllint flags the sequence indentation under args. Apply:
- args: - - operator - - run - - --health-probe-bind-address=:8081 + args: + - operator + - run + - --health-probe-bind-address=:8081
🧹 Nitpick comments (1)
deploy/operator.yaml (1)
68-75
: When using startupProbe, drop liveness initialDelaySeconds to avoid redundant delay.Kubelet suppresses liveness until startup passes, so initialDelaySeconds on liveness is unnecessary and can delay recovery after restarts. Suggest:
livenessProbe: httpGet: path: /healthz port: healthz - initialDelaySeconds: 15 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
deploy/operator.yaml
(2 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/operator/manager.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
deploy/operator.yaml (3)
60-67
: Readiness probe looks correct and matches controller-runtime defaults./readyz on the named port is standard; timings are reasonable.
76-82
: Startup probe parameters are sane./healthz with failureThreshold: 30 and period: 10s gives up to ~5 minutes for the operator to come up, which is acceptable for CRD installs/migrations.
44-82
: Health/Readiness Probes Configuration VerifiedAll checks confirm that the operator binary and manifest are correctly configured:
- The
--health-probe-bind-address
flag is defined with default:8081
inpkg/operator/operator.go:37
and consumed inpkg/operator/manager.go:49
.- The health and readiness endpoints are registered via
mgr.AddHealthzCheck("healthz", healthz.Ping)
andmgr.AddReadyzCheck("readyz", healthz.Ping)
inpkg/operator/manager.go:105,109
.- No other container in the manifest binds port
8081
, and the probe indeploy/operator.yaml:47–50
matches these settings.No changes are required.
63cf177
to
3eb84c9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (2)
deploy/operator.yaml (2)
44-47
: Trim trailing spaces and fix args indentation (yamllint errors).YAMLlint flags trailing spaces on Lines 45–46 and wrong indentation on Line 45. Reduce the list item indent by 2 and remove trailing spaces so CI passes and the manifest parses consistently.
Apply:
args: - - operator - - run - - --health-probe-bind-address=:8081 + - operator + - run + - --health-probe-bind-address=:8081Run to verify:
#!/bin/bash pip install -q yamllint yamllint -d "{extends: default, rules: {line-length: disable}}" deploy/operator.yaml
48-50
: Align ports list indentation with container-level keys.List item under ports is over-indented (yamllint: expected 10, found 12). Align the dash with two spaces more than the ports key.
Apply:
ports: - - name: healthz - containerPort: 8081 + - name: healthz + containerPort: 8081Re-run yamllint (same command as above) to confirm no indentation warnings.
🧹 Nitpick comments (5)
pkg/operator/manager.go (2)
72-72
: Health probes attached only to the namespace-scoped managerYou bind the probe server only on
mgr
(namespace-scoped). If the cluster-scoped manager (cmgr
) fails to start or crashes, the pod’s readiness/liveness may still report healthy because the probe targets onlymgr
. If that’s intentional, ignore; otherwise consider one of:
- Also enabling a probe server for
cmgr
(e.g., on a sibling port like:8082
) and wiring probes for it.- Or keep a single probe server (on
mgr
) but add a readiness check that reflectscmgr
health (e.g., flip a flag oncecmgr
caches sync or controllers start).If you prefer the second approach, I can sketch a checker that reports NotReady until
cmgr
signals it’s up.
104-111
: Readiness uses Ping only; consider gating on cache sync to avoid false positivesRegistering
healthz.Ping
for both health and readiness works, but readiness will go green as soon as the HTTP server is up, not when informers/controllers are actually ready. Consider adding a cache-sync readiness check so/readyz
turns 200 only after caches synchronize.Example change (adds a cache-sync runnable and checker):
// Register health and readiness endpoints on mgr if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { log.Fatalf("Failed to add health check: %s", err) } - if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { - log.Fatalf("Failed to add readiness check: %s", err) - } + // Ready when caches have synced + if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { + log.Fatalf("Failed to add readiness ping: %s", err) + } + if err := mgr.AddReadyzCheck("caches", cacheReadyChecker()); err != nil { + log.Fatalf("Failed to add cache readiness check: %s", err) + }Add this helper (outside the shown range), which uses a one-time sync and a fast check per request:
// at file top-level: var cachesSynced atomic.Bool // near other runnables, before starting managers: util.Panic(mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { if ok := mgr.GetCache().WaitForCacheSync(ctx); !ok { return fmt.Errorf("cache sync failed") } cachesSynced.Store(true) <-ctx.Done() return nil }))) // helper: func cacheReadyChecker() healthz.Checker { return func(_ *http.Request) error { if !cachesSynced.Load() { return fmt.Errorf("caches not yet synced") } return nil } }Note: you’ll need
import "net/http"
andimport "sync/atomic"
for the helper.pkg/bundle/deploy.go (2)
6251-6273
: Probe set is sensible; consider strengthening readyz semantics.
- The readiness, liveness, and startup probes use /readyz and /healthz on the dedicated probe server. Timing defaults (readiness=5s initial delay, liveness=15s, startup=30×10s) look balanced for controller-runtime operators.
- One potential gap: if your manager registers
readyz
withhealthz.Ping
, readiness will always succeed once the probe server is up (it doesn’t wait for caches/leader election). If that’s your current implementation, consider a readiness check that flips true only after caches are synced (or after leader election if that’s your intended signal).Example (Go) of a non-blocking cache-sync readiness gate:
// after mgr is created: var cachesSynced atomic.Bool go func() { // Wait until the manager starts and caches sync once <-mgr.Elected() // optional: if leader election enabled if ok := mgr.GetCache().WaitForCacheSync(context.Background()); ok { cachesSynced.Store(true) } }() // register probes _ = mgr.AddHealthzCheck("healthz", healthz.Ping) _ = mgr.AddReadyzCheck("cache-sync", func(_ *http.Request) error { if cachesSynced.Load() { return nil } return fmt.Errorf("caches not yet synced") })This avoids reporting Pods as Ready before controllers can actually reconcile.
If you’ve already wired a stronger readyz check in manager.go, feel free to ignore; otherwise I can open a follow-up PR with this change.
6251-6273
: Nit: explicitly set probe scheme for forward-compatibility (optional).Kubelet defaults HTTP for
httpGet
, but being explicit can prevent surprises if defaults change or if TLS is enabled later:readinessProbe: httpGet: path: /readyz - port: healthz + port: healthz + scheme: HTTP ... livenessProbe: httpGet: path: /healthz - port: healthz + port: healthz + scheme: HTTP ... startupProbe: httpGet: path: /healthz - port: healthz + port: healthz + scheme: HTTPdeploy/operator.yaml (1)
60-82
: Probes wiring looks correct; optional tuning with startupProbe present.Endpoints and named port resolve properly. With startupProbe enabled, initialDelaySeconds on liveness/readiness is redundant; kubelet suppresses them until startup succeeds. Consider removing the initial delays to simplify timing semantics.
Optional diff:
readinessProbe: httpGet: path: /readyz port: healthz - initialDelaySeconds: 5 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3 livenessProbe: httpGet: path: /healthz port: healthz - initialDelaySeconds: 15 periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 3If you keep the delays, that’s fine too—behavior is still correct.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml
(2 hunks)pkg/bundle/deploy.go
(3 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/operator/manager.go (1)
27-27
: LGTM: correct import for controller-runtime health checksImporting
sigs.k8s.io/controller-runtime/pkg/healthz
is appropriate for registering/healthz
and/readyz
endpoints.pkg/bundle/deploy.go (2)
6235-6238
: Args look correct; verify entrypoint supports subcommand and flag.
- The container now invokes:
operator run --health-probe-bind-address=:8081
. That matches controller-runtime convention, but only if your binary’s root command exposes therun
subcommand and recognizes the probe flag.- Please sanity-check the operator image you ship in all deployment channels accepts these args; otherwise the container may exit immediately.
Would you like a quick script to pull the image used in CI and validate the help/flags exposed by the binary? I can supply one.
6240-6241
: Port naming and probe port reference are consistent.Defining a named container port
healthz: 8081
and referencing it from the probes is a good practice that makes refactors safer. No issues here.
pkg/operator/manager.go
Outdated
// Probe address from CLI flag (defaults to :8081) | ||
probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Don’t ignore flag parsing errors; add a safe default for probe address
cmd.Flags().GetString(...)
error is discarded. While it should succeed if the flag is defined, ignoring the error can mask regressions and leaves probeAddr
empty (which disables the probe server) if the flag is missing or mistyped.
Apply this minimal diff to handle the error and ensure a sane default:
- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.Flags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed reading --health-probe-bind-address flag: %v (defaulting to :8081)", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }
To double-check the flag wiring across the repo, run:
🏁 Script executed:
#!/bin/bash
# Verify the flag is defined and used in manifests
rg -n --type=go -C2 'health-probe-bind-address'
rg -n -C3 'health-probe-bind-address' -- 'deploy/**' 'deploy/*.yaml' 'pkg/bundle/**' 2>/dev/null || true
Length of output: 1109
Handle health-probe-bind-address flag parsing errors in pkg/operator/manager.go
- Verified that
--health-probe-bind-address
is defined as a persistent flag inpkg/operator/operator.go
with a default of:8081
, and is referenced in the bundled manifests. - In
pkg/operator/manager.go
(around line 49), callingcmd.Flags().GetString(...)
- Discards any lookup error (e.g., if the flag name is mistyped or not registered on this command)
- May never pick up the persistent flag, since it lives in the parent command’s flag set
- Leaves
probeAddr
empty on error, inadvertently disabling health probes
- To ensure the flag is read correctly and to guard against missing or empty values, switch to the persistent flag set, check the error, and fall back to the known default.
Apply this diff:
--- a/pkg/operator/manager.go
+++ b/pkg/operator/manager.go
@@ -46,4 +46,12 @@ func NewManagerCommand(...) {
- // Probe address from CLI flag (defaults to :8081)
- probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address")
+ // Probe address from CLI flag (defaults to :8081)
+ probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address")
+ if err != nil {
+ log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err)
+ probeAddr = ":8081"
+ }
+ if probeAddr == "" {
+ log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081")
+ probeAddr = ":8081"
+ }
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// Probe address from CLI flag (defaults to :8081) | |
probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") | |
// Probe address from CLI flag (defaults to :8081) | |
probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address") | |
if err != nil { | |
log.Warnf("Failed to read --health-probe-bind-address: %v, defaulting to :8081", err) | |
probeAddr = ":8081" | |
} | |
if probeAddr == "" { | |
log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081") | |
probeAddr = ":8081" | |
} |
🤖 Prompt for AI Agents
In pkg/operator/manager.go around lines 48 to 50, the call to
cmd.Flags().GetString("health-probe-bind-address") ignores errors and reads the
local flagset (so it can miss the persistent flag), which can leave probeAddr
empty and disable probes; update to use
cmd.PersistentFlags().GetString("health-probe-bind-address"), check and handle
the returned error (log/return or fallback), and if the value is empty or an
error occurred, set probeAddr to the known default ":8081" so health probes
remain enabled.
@dannyzaken @tangledbytes please take a look |
4e1b572
to
6095110
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
deploy/operator.yaml (2)
44-47
: Trim trailing spaces in args to keep YAML clean and avoid linter failures.
Matches prior feedback; still present on "operator" and "run".Apply:
args: - - operator - - run + - operator + - run - --health-probe-bind-address=:8081
48-50
: Fix ports block indentation.
Keep it aligned with other container-level keys.- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081pkg/operator/manager.go (1)
48-49
: Don’t ignore flag errors; read from PersistentFlags and default safely.
Ignoring the error can disable probes if the flag isn’t found.- // Probe address from CLI flag (defaults to :8081) - probeAddr, _ := cmd.Flags().GetString("health-probe-bind-address") + // Probe address from CLI flag (defaults to :8081) + probeAddr, err := cmd.PersistentFlags().GetString("health-probe-bind-address") + if err != nil { + log.Warnf("Failed reading --health-probe-bind-address: %v (defaulting to :8081)", err) + probeAddr = ":8081" + } + if probeAddr == "" { + log.Warn("Empty --health-probe-bind-address disables probes; defaulting to :8081") + probeAddr = ":8081" + }Run to confirm the flag is defined and referenced:
#!/bin/bash rg -n --type=go -C2 'health-probe-bind-address'
🧹 Nitpick comments (1)
pkg/operator/manager.go (1)
104-111
: Checks registration LGTM; consider clearer names and optional cache-ready check.
Minor: use names like "ping" instead of repeating endpoint names; optionally add a cache sync readiness check if needed.- if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { + if err := mgr.AddHealthzCheck("ping", healthz.Ping); err != nil { log.Fatalf("Failed to add health check: %s", err) } - if err := mgr.AddReadyzCheck("readyz", healthz.Ping); err != nil { + if err := mgr.AddReadyzCheck("ping", healthz.Ping); err != nil { log.Fatalf("Failed to add readiness check: %s", err) }Optional cache-ready example (if you want stricter readiness):
ready := atomic.Bool{} _ = mgr.Add(manager.RunnableFunc(func(ctx context.Context) error { // mark ready after caches start ready.Store(true) <-ctx.Done() return nil })) _ = mgr.AddReadyzCheck("started", func(_ *http.Request) error { if !ready.Load() { return fmt.Errorf("not started") } return nil })
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml
(2 hunks)pkg/bundle/deploy.go
(3 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/operator.go
- pkg/bundle/deploy.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
deploy/operator.yaml (1)
76-82
: Startup probe endpoint choice is fine.
Using /readyz here is acceptable; no change needed.pkg/operator/manager.go (2)
27-27
: Import healthz looks good.
Required for adding checks.
72-72
: Wiring HealthProbeBindAddress into mgr is correct.
This enables /healthz and /readyz.
readinessProbe: | ||
httpGet: | ||
path: /readyz | ||
port: healthz | ||
initialDelaySeconds: 5 | ||
periodSeconds: 10 | ||
timeoutSeconds: 5 | ||
failureThreshold: 3 | ||
livenessProbe: | ||
httpGet: | ||
path: /readyz | ||
port: healthz | ||
initialDelaySeconds: 15 | ||
periodSeconds: 10 | ||
timeoutSeconds: 5 | ||
failureThreshold: 3 | ||
startupProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Use /healthz for liveness; /readyz for readiness (current YAML uses /readyz for liveness).
This avoids killing a live process just because it isn’t ready yet and matches the PR description.
readinessProbe:
httpGet:
path: /readyz
port: healthz
initialDelaySeconds: 5
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
🏁 Script executed:
#!/usr/bin/env bash
set -e
# Locate the operator.yaml file
OP_FILE=$(fd --type f operator.yaml || true)
echo "Found file: $OP_FILE"
[ -n "$OP_FILE" ] || { echo "operator.yaml not found"; exit 1; }
# Print lines 60–76 for verification
sed -n '60,76p' "$OP_FILE"
Length of output: 698
Use /healthz for livenessProbe
Swap the livenessProbe HTTP path to /healthz
(keep readinessProbe on /readyz
) to avoid killing a live process before it’s ready.
🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 60 to 76, the livenessProbe is using the
/readyz path which can cause a live process to be killed before it is fully
ready; change the livenessProbe httpGet.path value from /readyz to /healthz
while leaving the readinessProbe on /readyz (no other probe changes required).
failureThreshold: 3 | ||
livenessProbe: | ||
httpGet: | ||
path: /readyz |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all probes be the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we only perform ping, I suppose it's okay for them to be set to the same path. In the future, we might want to use more advanced probes, but for now, it's enough.
pkg/operator/manager.go
Outdated
} | ||
|
||
// Register health and readiness endpoints on mgr | ||
if err := mgr.AddHealthzCheck("healthz", healthz.Ping); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use readyz
for both probes, I think we can remove healthz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pkg/operator/operator.go
Outdated
Short: "Deployment using operator", | ||
} | ||
// health-probe flag available on operator and inherited by subcommands | ||
cmd.PersistentFlags().String("health-probe-bind-address", ":8081", "HTTP address for health/readiness probes (e.g., :8081)") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to have a flag for the probe address? If there isn't a need to change it for some cases, we can have it defined in manager.go (e.g.: similar to the metrics address\port here)
- integrate controller-runtime healthz and readyz endpoints into the namespace-scoped manager - expose probe server on :8081 and register default checks - update operator Deployment with liveness, readiness, and startup probes Signed-off-by: Oded Viner <[email protected]>
6095110
to
f87a9a7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
deploy/operator.yaml (2)
48-50
: Align ports list indentation under 'ports:'.Re-align the sequence to satisfy yamllint’s expected indentation.
- ports: - - name: healthz - containerPort: 8081 + ports: + - name: healthz + containerPort: 8081
68-75
: Use /healthz for livenessProbe (keep /readyz for readiness/startup).Best practice is liveness→/healthz, readiness/startup→/readyz. With current ping checks it’s mostly equivalent, but /healthz avoids coupling liveness to readiness state.
livenessProbe: httpGet: - path: /readyz + path: /healthz port: healthz
🧹 Nitpick comments (1)
pkg/bundle/deploy.go (1)
6270-6275
: Optional: consider /healthz for startupProbe for consistency.Startup can also target /healthz; keeping readiness on /readyz and liveness/startup on /healthz is a common pattern.
startupProbe: httpGet: - path: /readyz + path: /healthz port: healthz periodSeconds: 10 timeoutSeconds: 5 failureThreshold: 30
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
deploy/operator.yaml
(2 hunks)pkg/bundle/deploy.go
(3 hunks)pkg/operator/manager.go
(4 hunks)pkg/operator/operator.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/operator/manager.go
- pkg/operator/operator.go
🧰 Additional context used
🪛 YAMLlint (1.37.1)
deploy/operator.yaml
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
[warning] 49-49: wrong indentation: expected 10 but found 12
(indentation)
🔇 Additional comments (3)
pkg/bundle/deploy.go (3)
6237-6241
: Add health probe bind address CLI flag (good).Flag looks correct for controller-runtime; confirms the probe server will listen on :8081. Ensure the operator binary exposes this flag and uses it to start the health server.
6242-6244
: Expose named health port (good).Named port "healthz" matches the httpGet probe references; no issues.
6253-6261
: Readiness probe config looks sensible./readyz path, timing, and thresholds are reasonable defaults.
args: | ||
- operator | ||
- run | ||
- --health-probe-bind-address=:8081 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Trim trailing spaces and normalize args indentation.
Lines 45–46 have trailing spaces; fix to keep CI/yamllint green and avoid noisy diffs.
- args:
- - operator
- - run
- - --health-probe-bind-address=:8081
+ args:
+ - operator
+ - run
+ - --health-probe-bind-address=:8081
To verify no trailing spaces remain:
🏁 Script executed:
#!/usr/bin/env bash
grep -nE ' +$' deploy/operator.yaml || echo "No trailing spaces"
Length of output: 93
Trim trailing spaces in deploy/operator.yaml args
Lines 45–46 contain trailing spaces; remove them to keep CI/yamllint green.
🧰 Tools
🪛 YAMLlint (1.37.1)
[warning] 45-45: wrong indentation: expected 10 but found 12
(indentation)
[error] 45-45: trailing spaces
(trailing-spaces)
[error] 46-46: trailing spaces
(trailing-spaces)
🤖 Prompt for AI Agents
In deploy/operator.yaml around lines 44 to 47, the args entries ("operator" and
"run") include trailing spaces on lines 45–46; remove the trailing whitespace
characters from those two args so each arg is trimmed to exact tokens (e.g.,
"operator" and "run") to satisfy yamllint/CI and keep YAML clean.
httpGet: | ||
path: /readyz | ||
port: healthz | ||
initialDelaySeconds: 15 | ||
periodSeconds: 10 | ||
timeoutSeconds: 5 | ||
failureThreshold: 3 | ||
startupProbe: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Liveness probe should hit /healthz, not /readyz.
controller-runtime exposes both /healthz and /readyz; liveness is expected to target /healthz. Pointing liveness to /readyz can mask unhealthy-but-ready states. Switch the path to /healthz.
livenessProbe:
httpGet:
- path: /readyz
+ path: /healthz
port: healthz
initialDelaySeconds: 15
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
httpGet: | |
path: /readyz | |
port: healthz | |
initialDelaySeconds: 15 | |
periodSeconds: 10 | |
timeoutSeconds: 5 | |
failureThreshold: 3 | |
startupProbe: | |
httpGet: | |
path: /healthz | |
port: healthz | |
initialDelaySeconds: 15 | |
periodSeconds: 10 | |
timeoutSeconds: 5 | |
failureThreshold: 3 | |
startupProbe: |
🤖 Prompt for AI Agents
In pkg/bundle/deploy.go around lines 6262 to 6269 the livenessProbe httpGet path
is set to /readyz but should target /healthz; change the livenessProbe httpGet
path value from "/readyz" to "/healthz" so the liveness check uses the
controller-runtime health endpoint and does not mask unhealthy-but-ready states.
Signed-off-by: Oded Viner <[email protected]>
Tested locally on my machine:
Explain the changes
Issues: Fixed #xxx / Gap #xxx
Testing Instructions:
Summary by CodeRabbit
New Features
Chores