docs: clarify readiness probe for replicas (cloudnative-pg#6629)
Closes cloudnative-pg#6628

Signed-off-by: Gabriele Bartolini <[email protected]>
Signed-off-by: Francesco Canovai <[email protected]>
Co-authored-by: Jaime Silvela <[email protected]>
Co-authored-by: Francesco Canovai <[email protected]>
3 people authored Jan 21, 2025
1 parent 404f0ac commit 721233c
Showing 2 changed files with 39 additions and 14 deletions.
43 changes: 31 additions & 12 deletions docs/src/instance_manager.md
@@ -15,17 +15,27 @@ main container, which in turn runs the PostgreSQL instance. During the lifetime
of the Pod, the instance manager acts as a backend to handle the
[startup, liveness and readiness probes](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#container-probes).

-## Startup, liveness and readiness probes
+## Startup, Liveness, and Readiness Probes

-The startup and liveness probes rely on `pg_isready`, while the readiness
-probe checks if the database is up and able to accept connections.
+CloudNativePG leverages [PostgreSQL's `pg_isready`](https://www.postgresql.org/docs/current/app-pg-isready.html)
+to implement Kubernetes startup, liveness, and readiness probes.

### Startup Probe

-The `.spec.startDelay` parameter specifies the delay (in seconds) before the
-liveness probe activates after a PostgreSQL Pod starts. By default, this is set
-to `3600` seconds. You should adjust this value based on the time PostgreSQL
-requires to fully initialize in your environment.
+The startup probe ensures that a PostgreSQL instance, whether a primary or
+standby, has fully started according to `pg_isready`.
+While the startup probe is running, the liveness and readiness probes remain
+disabled. Following Kubernetes standards, if the startup probe fails, the
+kubelet will terminate the container, which will then be restarted.
+
+The startup probe provided by CloudNativePG is configurable via the
+parameter `.spec.startDelay`, which specifies the maximum time, in seconds,
+allowed for the startup probe to succeed. At a minimum, the probe requires
+`pg_isready` to return `0` or `1`.
+
+By default, the `startDelay` is set to `3600` seconds. It is recommended to
+adjust this setting based on the time PostgreSQL needs to fully initialize in
+your specific environment.

!!! Warning
    Setting `.spec.startDelay` too low can cause the liveness probe to activate
@@ -71,9 +81,14 @@ spec:

### Liveness Probe

-The liveness probe begins after the startup probe succeeds and is responsible
-for detecting if the PostgreSQL instance has entered a broken state that
-requires a restart of the pod.
+The liveness probe begins after the startup probe successfully completes. Its
+primary role is to ensure the PostgreSQL instance—whether primary or standby—is
+operating correctly. This is achieved using the `pg_isready` utility. Both exit
+codes `0` (indicating the server is accepting connections) and `1` (indicating
+the server is rejecting connections, such as during startup or a smart
+shutdown) are treated as valid outcomes.
+Following Kubernetes standards, if the liveness probe fails, the
+kubelet will terminate the container, which will then be restarted.

The amount of time before a Pod is classified as not alive is configurable via
the `.spec.livenessProbeTimeout` parameter.
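
The exit-code rule described for the startup and liveness probes can be sketched as follows. This is only an illustration in Python, not the operator's implementation: the names `PgIsReady` and `probe_passes` are hypothetical, and the meanings of exit codes `2` and `3` are taken from the `pg_isready` documentation rather than from this commit.

```python
from enum import IntEnum


class PgIsReady(IntEnum):
    """Exit codes of the pg_isready utility (names are illustrative)."""
    ACCEPTING = 0    # server is accepting connections
    REJECTING = 1    # server is rejecting connections (startup, smart shutdown)
    NO_RESPONSE = 2  # no response to the connection attempt
    NO_ATTEMPT = 3   # no attempt was made (for example, invalid parameters)


def probe_passes(exit_code: int) -> bool:
    """Return True when the exit code is a valid outcome for the
    startup and liveness probes, i.e. pg_isready returned 0 or 1."""
    return exit_code in (PgIsReady.ACCEPTING, PgIsReady.REJECTING)
```

Under this sketch, a server that is still starting up (exit code `1`) passes the probe, while a server that does not respond at all (exit code `2`) fails it.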
@@ -123,8 +138,12 @@ spec:

### Readiness Probe

-The readiness probe determines when a pod running a PostgreSQL instance is
-prepared to accept traffic and serve requests.
+The readiness probe begins once the startup probe has successfully completed.
+Its purpose is to check whether the PostgreSQL instance is ready to accept
+traffic and serve requests.
+For streaming replicas, it also requires that they have connected to the source
+at least once. Following Kubernetes standards, if the readiness probe fails,
+the pod will be marked unready and will not receive traffic from any services.
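
The readiness rule for primaries and streaming replicas can be sketched as below. This is a simplified model under stated assumptions, not the instance manager's code: `InstanceState`, its fields, and `is_ready` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class InstanceState:
    """Hypothetical snapshot of what the readiness check looks at."""
    accepting_connections: bool             # instance can accept traffic
    is_replica: bool                        # True for a streaming replica
    connected_to_source_once: bool = False  # replica reached its source at least once


def is_ready(state: InstanceState) -> bool:
    """A primary is ready when it accepts connections; a streaming
    replica must also have connected to its source at least once."""
    if not state.accepting_connections:
        return False
    if state.is_replica:
        return state.connected_to_source_once
    return True
```

Note that in this model a replica stays ready after its first successful connection to the source, which matches the "at least once" wording: readiness does not imply that the replica is currently streaming.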

CloudNativePG uses the following default configuration for the readiness probe:

10 changes: 8 additions & 2 deletions docs/src/replication.md
@@ -375,7 +375,7 @@ spec:
```
ANY 1 ("foo-2","foo-3","foo-1")
```

At this point no write operations will be allowed until at least one of the
standbys is available again.

@@ -390,6 +390,12 @@ attempt to replicate WAL records to the designated number of synchronous
standbys, but write operations will continue even if fewer than the requested
number of standbys are available.

+!!! Important
+    Make sure you have a clear understanding of what *ready/available* means
+    for a replica and set your expectations accordingly. By default, a replica is
+    considered ready when it has successfully connected to the source at least
+    once.
+
This setting balances data safety with availability, enabling applications to
continue writing during temporary standby unavailability—hence, it’s also known
as *self-healing mode*.
@@ -485,7 +491,7 @@ ANY q (pod1, pod2, ...)

Where:

-- `q` is an integer automatically calculated by the operator to be:
+- `q` is an integer automatically calculated by the operator to be:
`1 <= minSyncReplicas <= q <= maxSyncReplicas <= readyReplicas`
- `pod1, pod2, ...` is the list of all PostgreSQL pods in the cluster
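
The constraint on `q` and the shape of the resulting quorum expression can be sketched as follows. This is an illustrative check only: it does not reproduce how the operator actually computes `q`, and the function names `quorum_is_consistent` and `any_quorum` are hypothetical. The quoting of pod names mirrors the `ANY 1 ("foo-2","foo-3","foo-1")` example shown earlier in the same file.

```python
from typing import Sequence


def quorum_is_consistent(min_sync: int, q: int, max_sync: int, ready: int) -> bool:
    """Check 1 <= minSyncReplicas <= q <= maxSyncReplicas <= readyReplicas."""
    return 1 <= min_sync <= q <= max_sync <= ready


def any_quorum(q: int, pods: Sequence[str]) -> str:
    """Render a quorum expression of the form ANY q ("pod1","pod2",...)."""
    quoted = ",".join(f'"{name}"' for name in pods)
    return f"ANY {q} ({quoted})"
```

For instance, `any_quorum(1, ["foo-2", "foo-3", "foo-1"])` yields the same string as the example above, and `quorum_is_consistent(1, 1, 2, 3)` holds while `quorum_is_consistent(1, 3, 2, 3)` does not, because `q` would exceed `maxSyncReplicas`.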
