User Guide

Learn how to work with the Postgres Operator in a Kubernetes (K8s) environment.

Create a manifest for a new PostgreSQL cluster

Make sure you have set up the operator. Then you can create a new Postgres cluster by applying a manifest like this minimal example:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "acid"
  volume:
    size: 1Gi
  numberOfInstances: 2
  users:
    # database owner
    zalando:
    - superuser
    - createdb

    # role for application foo
    foo_user: # or 'foo_user: []'

  #databases: name->owner
  databases:
    foo: zalando
  postgresql:
    version: "17"

Once you have cloned the Postgres Operator repository, you can also find this example in the manifests folder:

kubectl create -f manifests/minimal-postgres-manifest.yaml

Make sure the spec section of the manifest contains at least a teamId, the numberOfInstances and the postgresql object with the version specified. The minimum volume size to run the postgresql resource on Elastic Block Storage (EBS) is 1Gi.

Note that when enable_team_id_clustername_prefix is set to true, the name of the cluster must start with the teamId and a -. At Zalando we use team IDs (nicknames) to lower the chances of duplicate cluster names and colliding entities. The team ID is also used to query an API to get all members of a team and create database roles for them. Besides, the maximum cluster name length is 53 characters.

Watch pods being created

Check if the database pods are coming up. Use the label application=spilo to filter and list the label spilo-role to see when the master is promoted and replicas get their labels.

kubectl get pods -l application=spilo -L spilo-role -w

The operator also emits K8s events to the Postgresql CRD which can be inspected in the operator logs or with:

kubectl describe postgresql acid-minimal-cluster

Connect to PostgreSQL

With a port-forward on one of the database pods (e.g. the master) you can connect to the PostgreSQL database from your machine. Use labels to filter for the master pod of our test cluster.

# get name of master pod of acid-minimal-cluster
export PGMASTER=$(kubectl get pods -o jsonpath={.items..metadata.name} -l application=spilo,cluster-name=acid-minimal-cluster,spilo-role=master -n default)

# set up port forward
kubectl port-forward $PGMASTER 6432:5432 -n default

Open another CLI and connect to the database using e.g. the psql client. When connecting with a manifest role like foo_user, read its password from the K8s secret which was generated when creating acid-minimal-cluster. As non-encrypted connections are rejected by default, set SSL mode to require:

export PGPASSWORD=$(kubectl get secret postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d)
export PGSSLMODE=require
psql -U postgres -h localhost -p 6432

Password encryption

Passwords are hashed with md5 by default. However, it is possible to use the more recent scram-sha-256 method by changing the password_encryption parameter in the Postgres config. You can define it directly in the cluster manifest:

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  [...]
  postgresql:
    version: "17"
    parameters:
      password_encryption: scram-sha-256

Defining database roles in the operator

Postgres Operator allows defining roles to be created in the resulting database cluster. It covers three use-cases:

  • manifest roles: create application roles specific to the cluster described in the manifest.
  • infrastructure roles: create application roles that should be automatically created on every cluster managed by the operator.
  • teams API roles: automatically create users for every member of the team owning the database cluster.

In the next sections, we will cover those use cases in more detail. Note that the Postgres Operator can also create databases with pre-defined owner, reader and writer roles, which saves you the manual setup. Read more in the next chapter.

Manifest roles

Manifest roles are defined directly in the cluster manifest. See the minimal postgres manifest for an example of the zalando role, defined with superuser and createdb flags.

Manifest roles are defined as a dictionary, with a role name as a key and a list of role options as a value. For a role without any options it is best to supply the empty list []. It is also possible to leave this field empty as in our example manifests. In certain cases such an empty field may later be removed by K8s due to the null value it gets (foobar_user: is equivalent to foobar_user: null).

The operator accepts the following options: superuser, inherit, login, nologin, createrole, createdb, replication, bypassrls.

By default, manifest roles are login roles (aka users), unless nologin is specified explicitly.

The operator automatically generates a password for each manifest role and places it in the secret named {username}.{clustername}.credentials.postgresql.acid.zalan.do in the same namespace as the cluster. This way, the application running in the K8s cluster and connecting to Postgres can obtain the password right from the secret, without ever sharing it outside of the cluster.
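
For example, the password of the foo_user role from the minimal manifest above could be read like this (a sketch; underscores in the role name are replaced with dashes in the secret name, so adjust names to your cluster):

export PGPASSWORD=$(kubectl get secret foo-user.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}' | base64 -d)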

At the moment it is not possible to define membership of the manifest role in other roles.

To define the secrets for the users in a different namespace than that of the cluster, one can set enable_cross_namespace_secret and declare the namespace for the secrets in the manifest in the following manner (note that it has to be reflected in the databases section, too):

spec:
  users:
    # users with secret in different namespace
    appspace.db_user:
    - createdb
  databases:
    # namespace notation is part of user name
    app_db: appspace.db_user

Here, anything before the first dot is considered the namespace and the text after the first dot is the username. Also, the postgres roles of these usernames would be in the form of namespace.username.

For such usernames, the secret is created in the given namespace and its name is of the following form, {namespace}.{username}.{clustername}.credentials.postgresql.acid.zalan.do

Infrastructure roles

An infrastructure role is a role that should be present on every PostgreSQL cluster managed by the operator. An example of such a role is a monitoring user. There are two ways to define them:

  • With the infrastructure roles secret only
  • With both the secret and the infrastructure roles ConfigMap.

Infrastructure roles secret

Infrastructure roles can be specified by the infrastructure_roles_secrets parameter where you can reference multiple existing secrets. Prior to v1.6.0 the operator could only reference one secret with the infrastructure_roles_secret_name option. However, this secret could contain multiple roles using the same set of keys plus an incrementing index.

apiVersion: v1
kind: Secret
metadata:
  name: postgresql-infrastructure-roles
data:
  user1: ZGJ1c2Vy
  password1: c2VjcmV0
  inrole1: b3BlcmF0b3I=
  user2: ...

The block above describes the infrastructure role 'dbuser' with password 'secret' that is a member of the 'operator' role. The resulting role will automatically be a login role.

With the new option users can configure the names of secret keys that contain the user name, password etc. The secret itself is referenced by the secretname key. If the secret uses a template for multiple roles as described above, list them separately.

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  kubernetes:
    infrastructure_roles_secrets:
    - secretname: "postgresql-infrastructure-roles"
      userkey: "user1"
      passwordkey: "password1"
      rolekey: "inrole1"
    - secretname: "postgresql-infrastructure-roles"
      userkey: "user2"
      ...

Note, only the CRD-based configuration allows for referencing multiple secrets. As of now, the ConfigMap is restricted to either one secret or the existing template option with infrastructure_roles_secret_name. Please refer to the example manifests to understand how infrastructure_roles_secrets has to be configured for the ConfigMap or CRD configuration.

If both infrastructure_roles_secret_name and infrastructure_roles_secrets are defined, the operator will create roles for both of them. So make sure they do not collide. Note also that with definitions that solely use the infrastructure roles secret there is no way to specify role options (like superuser or nologin) or role memberships. This is where the additional ConfigMap comes into play.

Secret plus ConfigMap

A ConfigMap allows for defining more details regarding the infrastructure roles. Therefore, one should use the new style that specifies infrastructure roles using both the secret and a ConfigMap. The ConfigMap must have the same name as the secret. The secret should contain an entry with 'rolename:rolepassword' for each role.

dbuser: c2VjcmV0

And the role description for that user should be specified in the ConfigMap.

data:
  dbuser: |
    inrole: [operator, admin]  # following roles will be assigned to the new user
    user_flags:
    - createdb
    db_parameters:  # db parameters, applied for this particular user
      log_statement: all

One can allow membership in multiple roles via the inrole array parameter, define role flags via the user_flags list and supply per-role options through the db_parameters dictionary. All those parameters are optional.

Both definitions can be mixed in the infrastructure role secret, as long as your new-style definition can be clearly distinguished from the old-style one (for instance, do not name new-style roles userN).

Since an infrastructure role is created uniformly on all clusters managed by the operator, it makes no sense to define it without the password. Such definitions will be ignored with a prior warning.

See infrastructure roles secret and infrastructure roles configmap for the examples.

Teams API roles

These roles are meant for database activity of human users. It's possible to configure the operator to automatically create database roles for, let's say, all employees of one team. They are not listed in the manifest and no K8s secrets are created for them. Instead, they would use an OAuth2 token to connect. To get all members of the team the operator queries a defined API endpoint that returns usernames. A minimal Teams API should work like this:

/.../<teamname> -> ["name","anothername"]

A "fake" Teams API deployment is provided in the manifests folder to set up a basic API around whatever service is used for user management. The Teams API's URL is set in the operator's configuration and enable_teams_api must be set to true. There are more settings available to choose superusers, group roles, PAM configuration etc. An OAuth2 token can be passed to the Teams API via a secret. The name of this secret is configurable with the oauth_token_secret_name parameter.
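
For illustration, a query against such an endpoint could look like this (the URL and token are placeholders, not operator defaults):

curl -s -H "Authorization: Bearer $OAUTH_TOKEN" "https://teams.example.org/api/teams/acid"
# ["jdoe","msmith"]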

Additional teams and members per cluster

Postgres clusters are associated with one team by providing the teamID in the manifest. Additional superuser teams can be configured as mentioned in the previous paragraph. However, this is a global setting. To assign additional teams, superuser teams and single users to clusters of a given team, use the PostgresTeam CRD.

Note, by default the PostgresTeam support is disabled in the configuration. Switch the enable_postgres_team_crd flag to true and the operator will start to watch for this CRD. Make sure the cluster role is up to date and contains a section for PostgresTeam.

Additional teams

To assign additional teams and single users to clusters of a given team, define a mapping with the PostgresTeam Kubernetes resource. The Postgres Operator will read such team mappings each time it syncs all Postgres clusters.

apiVersion: "acid.zalan.do/v1"
kind: PostgresTeam
metadata:
  name: custom-team-membership
spec:
  additionalTeams:
    a-team:
    - "b-team"

With the example above the operator will create login roles for all members of b-team in every cluster owned by a-team. It's possible to do vice versa for clusters of b-team in one manifest:

spec:
  additionalTeams:
    a-team:
    - "b-team"
    b-team:
    - "a-team"

As you can see, the PostgresTeam CRD is a global team mapping, independent of the Postgres manifests. It is possible to define multiple mappings, even with redundant content - the Postgres operator will create one internal cache from it. Additional teams are resolved transitively, meaning you will also add users for their additionalTeams, e.g.:

spec:
  additionalTeams:
    a-team:
    - "b-team"
    - "c-team"
    b-team:
    - "a-team"

This creates roles for members of c-team not only in all clusters owned by a-team, but also in clusters owned by b-team, as a-team is an additionalTeam of b-team.

Note, you can also define additionalSuperuserTeams in the PostgresTeam manifest. By default, this option is disabled and must be configured with enable_postgres_team_crd_superusers to make it work.

Virtual teams

There can be "virtual teams" that do not exist in the Teams API. This can make it easier to map a group of teams to many other teams:

spec:
  additionalTeams:
    a-team:
    - "virtual-team"
    b-team:
    - "virtual-team"
    virtual-team:
    - "c-team"
    - "d-team"

This example would create roles for members of c-team and d-team plus additional virtual-team members in clusters owned by a-team or b-team.

Teams changing their names

With PostgresTeams it is also easy to cover team name changes. Just add the mapping between old and new team name and the rest can stay the same. E.g. if a-team is renamed to f-team in the Teams API, it can be reflected in a PostgresTeam mapping with just two lines:

spec:
  additionalTeams:
    a-team:
    - "f-team"

This is helpful, because Postgres cluster names are immutable and cannot be changed. Only by cloning could the cluster get a different name starting with the new teamID.

Additional members

Single members might be excluded from teams although they continue to work with the same people. However, the Teams API would not reflect this anymore. To still add a database role for former team members, list their role under the additionalMembers section of the PostgresTeam resource:

apiVersion: "acid.zalan.do/v1"
kind: PostgresTeam
metadata:
  name: custom-team-membership
spec:
  additionalMembers:
    a-team:
    - "tia"

This will create the login role tia in every cluster owned by a-team. The user can connect to databases like the other team members.

The additionalMembers map can also be used to define users of virtual teams, e.g. for virtual-team we used above:

spec:
  additionalMembers:
    virtual-team:
    - "flynch"
    - "rdecker"
    - "briggs"

Removed members

The Postgres Operator does not delete database roles when users are removed from manifests. But, using the PostgresTeam custom resource or Teams API it is very easy to add roles to many clusters. Manually reverting such a change is cumbersome. Therefore, if members are removed from a PostgresTeam or the Teams API the operator can rename roles appending a configured suffix to the name (see role_deletion_suffix option) and revoke the LOGIN privilege. The suffix makes it easy then for a cleanup script to remove those deprecated roles completely. Switch enable_team_member_deprecation to true to enable this behavior.

When a role is re-added to a PostgresTeam manifest (or to the source behind the Teams API) the operator will check for roles with the configured suffix and if found, rename the role back to the original name and grant LOGIN again.
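
A minimal sketch of enabling this behavior in the CRD-based operator configuration (the suffix value is just an example and the exact section layout may differ in your setup):

apiVersion: "acid.zalan.do/v1"
kind: OperatorConfiguration
metadata:
  name: postgresql-operator-configuration
configuration:
  teams_api:
    enable_team_member_deprecation: true
    role_deletion_suffix: "_deleted"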

Prepared databases with roles and default privileges

The users section in the manifests only allows for creating database roles with global privileges. Fine-grained data access control or role membership cannot be defined and must be set up by the user in the database. But the Postgres Operator offers a separate section to specify preparedDatabases that will be created with pre-defined owner, reader and writer roles for each individual database and, optionally, for each database schema, too. preparedDatabases also enables users to specify PostgreSQL extensions that shall be created in a given database schema.

Default database and schema

A prepared database is already created by adding an empty preparedDatabases section to the manifest. The database will then be named after the Postgres cluster manifest (- replaced with _), e.g. acid_minimal_cluster for the minimal example above, and will also contain a schema called data.

spec:
  preparedDatabases: {}

Default NOLOGIN roles

Given an example with a specified database and schema:

spec:
  preparedDatabases:
    foo:
      schemas:
        bar: {}

Postgres Operator will create the following NOLOGIN roles:

| Role name       | Member of       | Admin           |
|-----------------|-----------------|-----------------|
| foo_owner       |                 | admin           |
| foo_reader      |                 | foo_owner       |
| foo_writer      | foo_reader      | foo_owner       |
| foo_bar_owner   |                 | foo_owner       |
| foo_bar_reader  |                 | foo_bar_owner   |
| foo_bar_writer  | foo_bar_reader  | foo_bar_owner   |

The <dbname>_owner role is the database owner and should be used when creating new database objects. All members of the admin role, e.g. teams API roles, can become the owner with the SET ROLE command. Default privileges are configured for the owner role so that the <dbname>_reader role automatically gets read-access (SELECT) to new tables and sequences and the <dbname>_writer receives write-access (INSERT, UPDATE, DELETE on tables, USAGE and UPDATE on sequences). Both get USAGE on types and EXECUTE on functions.
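
For example, an admin member could switch to the owner role before creating objects so that the default privileges apply (a sketch using the roles from the table above):

SET ROLE foo_owner;               -- become the database owner
CREATE TABLE orders (id bigint);  -- foo_reader can SELECT, foo_writer can INSERT/UPDATE/DELETE automatically
RESET ROLE;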

The same principle applies for database schemas which are owned by the <dbname>_<schema>_owner role. <dbname>_<schema>_reader is read-only, <dbname>_<schema>_writer has write access and inherits reading from the reader role. Note that the <dbname>_* roles have access, incl. default privileges, to all schemas, too. If you don't need the dedicated schema roles - i.e. you only use one schema - you can disable the creation like this:

spec:
  preparedDatabases:
    foo:
      schemas:
        bar:
          defaultRoles: false

Then, the schemas are owned by the database owner, too.

Default LOGIN roles

The roles described in the previous paragraph can be granted to LOGIN roles from the users section in the manifest. Optionally, the Postgres Operator can also create default LOGIN roles for the database and each schema individually. These roles will get the _user suffix and they inherit all rights from their NOLOGIN counterparts. Therefore, you cannot have defaultRoles set to false and enable defaultUsers at the same time.

| Role name            | Member of       | Admin           |
|----------------------|-----------------|-----------------|
| foo_owner_user       | foo_owner       | admin           |
| foo_reader_user      | foo_reader      | foo_owner       |
| foo_writer_user      | foo_writer      | foo_owner       |
| foo_bar_owner_user   | foo_bar_owner   | foo_owner       |
| foo_bar_reader_user  | foo_bar_reader  | foo_bar_owner   |
| foo_bar_writer_user  | foo_bar_writer  | foo_bar_owner   |

These default users are enabled in the manifest with the defaultUsers flag:

spec:
  preparedDatabases:
    foo:
      defaultUsers: true
      schemas:
        bar:
          defaultUsers: true

Default access privileges are also defined for LOGIN roles on database and schema creation. This means they are currently not set when defaultUsers (or defaultRoles for schemas) are enabled at a later point in time.

For all LOGIN roles the operator will create K8s secrets in the namespace specified in secretNamespace, if enable_cross_namespace_secret is set to true in the config. Otherwise, they are created in the same namespace as the Postgres cluster. Unlike roles specified with namespace.username under users, the namespace will not be part of the role name here. Keep in mind that underscores in a role name are replaced with dashes in the K8s secret name.

spec:
  preparedDatabases:
    foo:
      defaultUsers: true
      secretNamespace: appspace

Schema search_path for default roles

The schema search_path for each role will include the role name and the schemas this role should have access to. So foo_bar_writer does not have to schema-qualify tables from the schemas foo_bar_writer and bar, while foo_writer can look up foo_writer and any schema listed under schemas. To register the default public schema in the search_path (because some extensions are installed there), one has to add the following (assuming no extra roles are desired only for the public schema):

spec:
  preparedDatabases:
    foo:
      schemas:
        public:
          defaultRoles: false
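
To inspect which search_path was configured for the default roles, one could, for example, query the per-role settings (a sketch; run it as a superuser or the database owner):

SELECT r.rolname, s.setconfig
FROM pg_db_role_setting s
JOIN pg_roles r ON r.oid = s.setrole
WHERE r.rolname LIKE 'foo%';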

Database extensions

Prepared databases also allow for creating Postgres extensions. They will be created by the database owner in the specified schema.

spec:
  preparedDatabases:
    foo:
      extensions:
        pg_partman: public
        postgis: data

Some extensions require SUPERUSER rights on creation unless they are whitelisted by the pgextwlist extension, which is shipped with the Spilo image. To see which extensions are on the list, check the extwlist.extensions parameter in the postgresql.conf file.

SHOW extwlist.extensions;

Make sure that pgextwlist is also listed under shared_preload_libraries in the PostgreSQL configuration. Then the database owner should be able to create the extension specified in the manifest.
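
A quick way to verify the setup is to check the preloaded libraries and create the extension as the owner (a sketch using the pg_partman example from above):

SHOW shared_preload_libraries;   -- pgextwlist should be part of the list
SET ROLE foo_owner;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA public;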

From databases to preparedDatabases

If you wish to create the role setup described above for databases listed under the databases key, you have to make sure that the owner role follows the <dbname>_owner naming convention of preparedDatabases. As roles are synced first, this can be done with one edit:

# before
spec:
  databases:
    foo: db_owner

# after
spec:
  databases:
    foo: foo_owner
  preparedDatabases:
    foo:
      schemas:
        my_existing_schema: {}

Adding existing database schemas to the manifest to create roles for them as well is up to the user and not done by the operator. Remember that if you don't specify any schema, a new database schema called data will be created. When everything has been synced (roles, schemas, extensions), you are free to remove the database from the databases section. Note that the operator does not delete database objects or revoke privileges when they are removed from the manifest.

Resource definition

The compute resources to be used for the Postgres containers in the pods can be specified in the postgresql cluster manifest.

spec:
  resources:
    requests:
      cpu: 10m
      memory: 100Mi
    limits:
      cpu: 300m
      memory: 300Mi

The minimum limits to properly run the postgresql resource are configured to 250m for cpu and 250Mi for memory. If a lower value is set in the manifest the operator will raise the limits to the configured minimum values. If no resources are defined in the manifest they will be obtained from the configured default requests. If neither defaults nor minimum limits are configured the operator will not specify any resources and it's up to K8s (or your own) admission hooks to handle it.

HugePages support

The operator supports HugePages. To enable HugePages, set the matching resource requests and/or limits in the manifest:

spec:
  resources:
    requests:
      hugepages-2Mi: 250Mi
      hugepages-1Gi: 1Gi
    limits:
      hugepages-2Mi: 500Mi
      hugepages-1Gi: 2Gi

There are no minimums or maximums and the default is 0 for both HugePage sizes, but Kubernetes will not spin up the pod if the requested HugePages cannot be allocated. For more information on HugePages in Kubernetes, see also https://kubernetes.io/docs/tasks/manage-hugepages/scheduling-hugepages/

Use taints, tolerations and node affinity for dedicated PostgreSQL nodes

To ensure Postgres pods are running on nodes without any other application pods, you can use taints and tolerations and configure the required toleration in the manifest. Tolerations can also be defined in the operator config to apply for all Postgres clusters.

spec:
  tolerations:
  - key: postgres
    operator: Exists
    effect: NoSchedule

If you need the pods to be scheduled on specific nodes you may use node affinity to specify a set of label(s), of which a prospective host node must have at least one. This can be used to schedule pods on nodes with certain hardware capabilities (e.g. SSD drives) or in certain environments or network segments, e.g. for PCI compliance.

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster
spec:
  teamId: "ACID"
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: environment
          operator: In
          values:
          - pci

If you need to define a nodeAffinity for all your Postgres clusters use the node_readiness_label configuration.

In-place major version upgrade

Starting with Spilo 13, the operator supports in-place major version upgrades to a higher major version (e.g. from PG 14 to PG 16). To trigger the upgrade, simply increase the version in the manifest. It is your responsibility to test your applications against the new version before the upgrade; downgrading is not supported. The easiest way to do so is to try the upgrade on a cloned cluster first (see next chapter). More details can be found in the admin docs.
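
For example, bumping the version in the manifest of a cluster running PG 16 is all that is needed to trigger the upgrade (a sketch based on the minimal manifest above):

spec:
  postgresql:
    version: "17"  # was "16"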

How to clone an existing PostgreSQL cluster

You can spin up a new cluster as a clone of the existing one, using a clone section in the spec. There are two options here:

  • Clone from an S3 bucket (recommended)
  • Clone directly from a source cluster

Note, that cloning can also be used for major version upgrades of PostgreSQL.

Clone from S3

Cloning from S3 has the advantage that there is no impact on your production database. A new Postgres cluster is created by restoring the data of another source cluster. If you create it in the same Kubernetes environment, use a different name.

apiVersion: "acid.zalan.do/v1"
kind: postgresql
metadata:
  name: acid-minimal-cluster-clone
spec:
  clone:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
    cluster: "acid-minimal-cluster"
    timestamp: "2017-12-19T12:40:33+01:00"

Here, cluster is the name of the source cluster that is going to be cloned. A new cluster will be cloned from S3, using the latest backup before the timestamp. Note, a time zone is required for timestamp in the format of +00:00 (UTC).

The operator will try to find the WAL location based on the configured wal_[s3|gs]_bucket or wal_az_storage_account and the specified uid. You can find the UID of the source cluster in its metadata:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: acid-minimal-cluster
  uid: efd12e58-5786-11e8-b5a7-06148230260c

If your source cluster uses a WAL location different from the global configuration you can specify the full path under s3_wal_path. For Google Cloud Platform or Azure it can only be set globally with custom Pod environment variables or locally in the Postgres manifest's env section.

For non-AWS S3, the following settings can be set to support cloning from other S3 implementations:

spec:
  clone:
    uid: "efd12e58-5786-11e8-b5a7-06148230260c"
    cluster: "acid-minimal-cluster"
    timestamp: "2017-12-19T12:40:33+01:00"
    s3_wal_path: "s3://custom/path/to/bucket"
    s3_endpoint: https://s3.acme.org
    s3_access_key_id: 0123456789abcdef0123456789abcdef
    s3_secret_access_key: 0123456789abcdef0123456789abcdef
    s3_force_path_style: true

Clone directly

Another way to get a fresh copy of your source DB cluster is via pg_basebackup. To use this feature simply leave out the timestamp field from the clone section. The operator will connect to the service of the source cluster by name. If the cluster is called test, then the connection string will look like host=test port=5432, which means that you can only clone from clusters within the same namespace.

spec:
  clone:
    cluster: "acid-minimal-cluster"

Be aware that on a busy source database this can result in an elevated load!

Restore in place

There is also a possibility to restore a database without cloning it. The advantage of this is that there is no need to change anything on the application side. However, as it involves deleting the database first, this process is of course riskier than cloning (which involves adjusting the connection parameters of the app).

First, make sure there is no writing activity on your DB, and save the UID. Then delete the postgresql K8S resource:

kubectl delete postgresql acid-minimal-cluster
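
Note, the UID has to be saved before the resource is deleted. It can be read from the cluster's metadata, e.g.:

kubectl get postgresql acid-minimal-cluster -o jsonpath='{.metadata.uid}'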

Then deploy a new manifest with the same name, referring to itself (both name and UID) in the clone section:

metadata:
  name: acid-minimal-cluster
  # [...]
spec:
  # [...]
  clone:
    cluster: "acid-minimal-cluster"  # the same as metadata.name above!
    uid: "<original_UID>"
    timestamp: "2022-04-01T10:11:12.000+00:00"

This will create a new database cluster with the same name but different UID, whereas the database will be in the state it was at the specified time.

⚠️ The backups and WAL files for the original DB are retained under the original UID, making it possible to retry restoring. However, it is probably better to create a temporary clone for experimenting or finding out to which point you should restore.

Setting up a standby cluster

A standby cluster is a Patroni feature that first clones a database and keeps replicating changes afterwards. It can exist in a different location than its source database, but unlike with cloning, the PostgreSQL version between source and target cluster has to be the same.

To start a cluster as a standby, add the following standby section to the YAML file. You can stream changes from archived WAL files (AWS S3 or Google Cloud Storage) or from a remote primary. Only one option can be specified in the manifest:

spec:
  standby:
    s3_wal_path: "s3://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"

For GCS, you have to define STANDBY_GOOGLE_APPLICATION_CREDENTIALS as a custom pod environment variable. It is not set from the config to allow for overriding.

spec:
  standby:
    gs_wal_path: "gs://<bucketname>/spilo/<source_db_cluster>/<UID>/wal/<PGVERSION>"

For a remote primary you specify the host address and optionally the port. If you leave out the port, Patroni will use "5432".

spec:
  standby:
    standby_host: "acid-minimal-cluster.default"
    standby_port: "5433"

Note that the pods and services use the same role labels as for normal clusters: the standby leader is labeled as master. When using the standby_host option you have to copy the credentials from the source cluster's secrets to successfully bootstrap a standby cluster (see next chapter).

Providing credentials of source cluster

A standby cluster is replicating the data (including users and passwords) from the source database and is read-only. The system and application users (like standby, postgres etc.) all have a password that does not match the credentials stored in secrets which are created by the operator. You have two options:

a. Create secrets manually beforehand and paste the credentials of the source cluster.
b. Let the operator create the secrets when it bootstraps the standby cluster. Then patch the secrets with the credentials of the source cluster and replace the spilo pods (see the sketch below).
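
A sketch of option b., assuming a standby cluster named acid-standby-cluster (a placeholder) and the secret naming pattern described in the manifest roles section:

# read the (base64-encoded) password of e.g. the postgres user from the source cluster ...
export SRC_PW=$(kubectl get secret postgres.acid-minimal-cluster.credentials.postgresql.acid.zalan.do -o 'jsonpath={.data.password}')

# ... patch it into the standby cluster's secret and replace the spilo pods
kubectl patch secret postgres.acid-standby-cluster.credentials.postgresql.acid.zalan.do -p "{\"data\":{\"password\":\"$SRC_PW\"}}"
kubectl delete pods -l cluster-name=acid-standby-cluster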

Otherwise, you will see errors in the Postgres logs saying users cannot log in and the operator logs will complain about not being able to sync resources. If you stream changes from a remote primary you have to align the secrets or the standby cluster will not start up.

If you stream changes from WAL files and you only run a standby leader, you can safely ignore the secret mismatch, as it will be sorted out once the cluster is detached from the source. It is also harmless if you do not plan it. But when you create a standby replica, too, fix the credentials right away. WAL files will pile up on the standby leader if no connection can be established between the standby leader and its replica(s).

Promote the standby

One big advantage of standby clusters is that they can be promoted to a proper database cluster. This means it will stop replicating changes from the source and start accepting writes itself. This mechanism makes it possible to move databases from one place to another with minimal downtime.

Before promoting a standby cluster, make sure that the standby is not behind the source database. You should ideally stop writes to your source cluster and then create a dummy database object that you check for being replicated in the target to verify all data has been copied.

To promote, remove the standby section from the postgres cluster manifest. A rolling update will be triggered removing the STANDBY_* environment variables from the pods, followed by a Patroni config update that promotes the cluster.

Adding standby section after promotion

Turning a running cluster into a standby is not easily possible and should be avoided. The best way is to remove the cluster and resubmit the manifest after a short wait of a few minutes. Adding the standby section would turn the database cluster into read-only mode on the next operator SYNC cycle, but it does not sync automatically with the source cluster again.

Sidecar Support

Each cluster can specify arbitrary sidecars to run. These containers could be used for log aggregation, monitoring, backups or other tasks. A sidecar can be specified like this:

spec:
  sidecars:
    - name: "container-name"
      image: "company/image:tag"
      resources:
        limits:
          cpu: 500m
          memory: 500Mi
        requests:
          cpu: 100m
          memory: 100Mi
      env:
        - name: "ENV_VAR_NAME"
          value: "any-k8s-env-things"
      command: ['sh', '-c', 'echo "logging" > /opt/logs.txt']

In addition to any environment variables you specify, the following environment variables are always passed to sidecars:

  • POD_NAME - field reference to metadata.name
  • POD_NAMESPACE - field reference to metadata.namespace
  • POSTGRES_USER - the superuser that can be used to connect to the database
  • POSTGRES_PASSWORD - the password for the superuser

The PostgreSQL volume is shared with sidecars and is mounted at /home/postgres/pgdata.

Note: The operator will not create a cluster if sidecar containers are specified but globally disabled in the configuration. The enable_sidecars option must be set to true.

If you want to add a sidecar to every cluster managed by the operator, you can specify it in the operator configuration instead.

Accessing the PostgreSQL socket from sidecars

If enabled by the share_pgsocket_with_sidecars option in the operator configuration, the PostgreSQL socket is placed in a volume of type emptyDir named postgresql-run. To allow access to the socket from any sidecar container, simply add a VolumeMount for this volume to your sidecar spec.

  - name: "container-name"
    image: "company/image:tag"
    volumeMounts:
    - mountPath: /var/run
      name: postgresql-run

If you do not want to globally enable this feature and only use it for single Postgres clusters, specify an EmptyDir volume under additionalVolumes in the manifest:

spec:
  additionalVolumes:
  - name: postgresql-run
    mountPath: /var/run/postgresql
    targetContainers:
    - all
    volumeSource:
      emptyDir: {}
  sidecars: 
  - name: "container-name"
    image: "company/image:tag"
    volumeMounts:
    - mountPath: /var/run
      name: postgresql-run

InitContainers Support

Each cluster can specify arbitrary init containers to run. These containers can be used to run custom actions before any normal and sidecar containers start. An init container can be specified like this:

spec:
  initContainers:
    - name: "container-name"
      image: "company/image:tag"
      env:
        - name: "ENV_VAR_NAME"
          value: "any-k8s-env-things"

initContainers accepts the full v1.Container definition.

Note: The operator will not create a cluster if initContainers are specified but globally disabled in the configuration. The enable_init_containers option must be set to true.

Increase volume size

The Postgres Operator supports StatefulSet volume resize without doing a rolling update. For that you need to change the size field of the volume description in the cluster manifest and apply the change:

spec:
  volume:
    size: 5Gi # new volume size

The operator compares the new value of the size field with the previous one and acts on differences. The storage_resize_mode can be configured. By default, the operator will adjust the PVCs and leave it to K8s and the infrastructure to apply the change.

When using AWS with gp3 volumes you should set the mode to mixed because it will also adjust the IOPS and throughput that can be defined in the manifest. Check the AWS docs to learn about default and maximum values. Keep in mind that AWS rate-limits updating volume specs to no more than once every 6 hours.

spec:
  volume:
    size: 5Gi # new volume size
    iops: 4000
    throughput: 500

The operator can only enlarge volumes. Shrinking is not supported and will emit a warning. However, it can be done manually after updating the manifest. You have to delete the PVC, which will hang until you also delete the corresponding pod. Proceed with the next pod when the cluster is healthy again and replicas are streaming.
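
A sketch of the manual procedure for one pod (PVC names are assumed to follow the usual StatefulSet pattern pgdata-<clustername>-<ordinal>; verify with kubectl get pvc first):

kubectl delete pvc pgdata-acid-minimal-cluster-0 --wait=false  # stays in Terminating until the pod is gone
kubectl delete pod acid-minimal-cluster-0                      # the recreated pod gets a new PVC with the smaller size
# wait until the cluster is healthy and replicas are streaming, then continue with the next pod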

Logical backups

You can enable logical backups (SQL dumps) from the cluster manifest by adding the following parameter in the spec section:

spec:
  enableLogicalBackup: true

The operator will create and sync a K8s cron job to do periodic logical backups of this particular Postgres cluster. Due to the limitations of K8s cron jobs, it is highly advisable to set up additional monitoring for this feature; such monitoring is outside the scope of operator responsibilities. See the configuration reference and administrator documentation for details on how backups are executed.
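
The resulting cron job can be inspected and triggered manually for testing (a sketch; the job name is assumed to be derived from the cluster name and may differ in your operator version):

kubectl get cronjob logical-backup-acid-minimal-cluster
kubectl create job --from=cronjob/logical-backup-acid-minimal-cluster logical-backup-test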

Connection pooler

The operator can create a database-side connection pooler for those applications where an application-side pooler is not feasible, but the number of connections is high. To create a connection pooler together with a database, modify the manifest:

spec:
  enableConnectionPooler: true
  enableReplicaConnectionPooler: true

This will tell the operator to create a connection pooler with default configuration, through which one can access the master via a separate service {cluster-name}-pooler. With the first option, a connection pooler for the master service is created; with the second, one for the replica service. Both flags are independent of each other and can be set or unset as required without affecting the other.
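
Connecting through the pooler then works like connecting to the cluster itself, only via the pooler service (a sketch for the minimal example; user, database and local port are taken from the examples above and may differ in your setup):

kubectl port-forward svc/acid-minimal-cluster-pooler 6432:5432 -n default
psql -U foo_user -h localhost -p 6432 -d foo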

In most of the cases the default configuration should be good enough. To configure a new connection pooler individually for each Postgres cluster, specify:

spec:
  connectionPooler:
    # how many instances of connection pooler to create
    numberOfInstances: 2

    # in which mode to run, session or transaction
    mode: "transaction"

    # schema, which operator will create in each database
    # to install credentials lookup function for connection pooler
    schema: "pooler"

    # user, which operator will create for connection pooler
    user: "pooler"

    # resources for each instance
    resources:
      requests:
        cpu: 500m
        memory: 100Mi
      limits:
        cpu: "1"
        memory: 100Mi

The enableConnectionPooler flag is not required when the connectionPooler section is present in the manifest. But, it can be used to disable/remove the pooler while keeping its configuration.

By default, PgBouncer is used as the connection pooler. To find out about pool modes, read the PgBouncer docs (the general approach should be similar across different implementations).

Note that when using PgBouncer, a meaningful CPU resource limit should be 1 core or less (there is a way to utilize more than one core, but in K8s it's easier to just spin up more instances).

Custom TLS certificates

By default, the Spilo image generates its own TLS certificate during startup. However, this certificate cannot be verified and thus doesn't protect from active MITM attacks. In this section we show how to specify a custom TLS certificate which is mounted in the database pods via a K8s Secret.

Before applying these changes, the operator must also be configured in K8s with spilo_fsgroup set to the GID matching the postgres user group. If you don't know the value, use 103 which is the GID from the default Spilo image (spilo_fsgroup=103 in the cluster request spec).

OpenShift allocates the users and groups dynamically (based on scc), and their range is different in every namespace. Due to this dynamic behaviour, it's not trivial to know at deploy time the uid/gid of the user in the cluster. Therefore, instead of using a global spilo_fsgroup setting in the operator configuration, use the spiloFSGroup field per Postgres cluster manifest.
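
A sketch of the per-cluster alternative:

spec:
  spiloFSGroup: 103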

For testing purposes, you can generate a self-signed certificate with openssl:

openssl req -x509 -nodes -newkey rsa:2048 -keyout tls.key -out tls.crt -subj "/CN=acid.zalan.do"

Upload the cert as a Kubernetes secret:

kubectl create secret tls pg-tls \
  --key tls.key \
  --cert tls.crt

When doing client auth, the CA can optionally come from the same secret:

kubectl create secret generic pg-tls \
  --from-file=tls.crt=server.crt \
  --from-file=tls.key=server.key \
  --from-file=ca.crt=ca.crt

Then configure the postgres resource with the TLS secret:

apiVersion: "acid.zalan.do/v1"
kind: postgresql

metadata:
  name: acid-test-cluster
spec:
  tls:
    secretName: "pg-tls"
    caFile: "ca.crt" # add this if the secret is configured with a CA

Optionally, the CA can be provided by a different secret:

kubectl create secret generic pg-tls-ca --from-file=ca.crt=ca.crt

Then configure the postgres resource with the TLS secret:

apiVersion: "acid.zalan.do/v1"
kind: postgresql

metadata:
  name: acid-test-cluster
spec:
  tls:
    secretName: "pg-tls"    # this should hold tls.key and tls.crt
    caSecretName: "pg-tls-ca" # this should hold ca.crt
    caFile: "ca.crt" # add this if the secret is configured with a CA

Alternatively, it is also possible to use cert-manager to generate these secrets.

Certificate rotation is handled in the Spilo image which checks every 5 minutes if the certificates have changed and reloads postgres accordingly.

TLS certificates for connection pooler

By default, the pgBouncer image generates its own TLS certificate like Spilo does. When the tls section is specified in the manifest, it will be used for the connection pooler pod(s) as well. The security context options are hard-coded to runAsUser: 100 and runAsGroup: 101. The fsGroup will be the same as for Spilo.

As of now, the operator does not sync the pooler deployment automatically which means that changes in the pod template are not caught. You need to toggle enableConnectionPooler to set environment variables, volumes, secret mounts and securityContext required for TLS support in the pooler pod.