Data Platform/Systems/CloudnativePG/Clusters

From Wikitech

The CloudnativePG operator makes it quite easy to deploy PostgreSQL clusters in Kubernetes.

We have created a custom cloudnative-pg-cluster chart, allowing users to create a cluster from a very simple YAML values file, such as this one.

fullnameOverride: postgres-airflow-analytics

By default, each cluster uploads base backups and WALs to a dedicated S3 bucket, and performs a daily backup at midnight UTC.

Observability

Dashboard

All cloudnative-pg clusters are instrumented using Prometheus and can be observed via this Grafana dashboard.

Monitoring

WIP

Operations

Performing a manual backup

[1] On the deployment server, run the following commands.

brouberol@deploy1003:~$ cat <<EOF >manual-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgresql-test-on-demand-backup-$(date "+%Y%m%d%H%M%S")
spec:
  method: barmanObjectStore
  cluster:
    name: postgresql-test
EOF
brouberol@deploy1003:/home/brouberol# kubectl create -f manual-backup.yaml -n postgresql-test
backup.postgresql.cnpg.io/postgresql-test-on-demand-backup-20240917141528 created

You can then check the state of the backup with kubectl.

brouberol@deploy1003:/home/brouberol# kubectl get backups -n postgresql-test  --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-on-demand-backup-20240917141528   72s   postgresql-test   barmanObjectStore   completed
postgresql-test-daily-backup-20240917000000       14h   postgresql-test   barmanObjectStore   completed
postgresql-test-daily-backup-20240916073251       30h   postgresql-test   barmanObjectStore   completed

Restoring a backup by replaying all WAL files

[2] The easiest way to restore the latest archived state of a pre-existing database is to deploy it with the following values:

mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
backups:
  enabled: false

When the cluster is deployed with these values, a full-recovery pod starts first and replays all archived WAL files; once it completes, the PG pods themselves are deployed. After everything is deployed successfully, redeploy the cluster with all of the above values removed or commented out. This restarts the cluster in normal mode and re-enables backups (and a backup is immediately triggered for good measure).

Performing a Point In Time Recovery (PITR)

[2] A PITR allows you to restore the database to its state at a given point in time.

You can list the available base backups in descending date order by running:

root@deploy1003:~# kubectl get backups -n postgresql-test  --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-daily-backup-20240917000000   13h   postgresql-test   barmanObjectStore   completed
postgresql-test-daily-backup-20240916073251   30h   postgresql-test   barmanObjectStore   completed

Say we'd like to restore postgresql-test-daily-backup-20240917000000. To do this, first extract the creation timestamp of the backup resource:

root@deploy1003:~# kubectl get backups -n postgresql-test  -o jsonpath='{.metadata.creationTimestamp}{"\n"}' postgresql-test-daily-backup-20240917000000
2024-09-17T00:00:00Z

You then need to convert this timestamp to the format understood by the https://pgbarman.org/ tools used to restore the backup.

$ python3 -c'import datetime as dt;print(dt.datetime.fromisoformat("2024-09-17T00:00:00Z").strftime("%Y-%m-%d %H:%M:%S.%f%z"))'
2024-09-17 00:00:00.000000+0000
Make sure to run this one-liner with Python ≥ 3.11: in older versions, datetime.fromisoformat() rejects the trailing Z.
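On Python versions older than 3.11, datetime.fromisoformat() does not accept the trailing Z. A small helper (hypothetical, not part of any tooling here) can do the same conversion portably by normalizing the Z first:

```python
from datetime import datetime


def k8s_to_barman_timestamp(ts: str) -> str:
    """Convert a Kubernetes creationTimestamp (RFC 3339, e.g.
    '2024-09-17T00:00:00Z') into the '%Y-%m-%d %H:%M:%S.%f%z'
    format used as a barman recovery target."""
    # Python < 3.11's fromisoformat() rejects a trailing "Z", so
    # rewrite it as an explicit UTC offset before parsing.
    parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return parsed.strftime("%Y-%m-%d %H:%M:%S.%f%z")


print(k8s_to_barman_timestamp("2024-09-17T00:00:00Z"))
# 2024-09-17 00:00:00.000000+0000
```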

You can then deploy the cluster with the following values:

mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
  recoveryTarget:
    targetTime: "2024-09-17 00:00:00.000000+0000"
backups:
  enabled: false

Once the cluster is deployed and running, redeploy it with all of the above values commented out or deleted. This restarts the cluster in normal mode and re-enables backups (and a backup is immediately triggered for good measure).

If, however, you want to perform a PITR to a timestamp that does not align with when a backup was performed, you can simply specify the timestamp to which you'd like the database to be restored. The operator selects the closest backup completed before that target, restores it, and then replays the WAL until it reaches the specified timestamp.
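The backup-selection behaviour described above can be pictured with a short Python sketch. This is purely illustrative (the names and data shapes are made up; the real logic lives inside the CNPG operator):

```python
from datetime import datetime


def pick_base_backup(backups: list, target_time: str) -> str:
    """Return the name of the most recent completed backup taken at or
    before target_time (ISO 8601 timestamps assumed). WALs would then be
    replayed from that backup up to the target."""
    target = datetime.fromisoformat(target_time)
    candidates = [
        b for b in backups
        if b["phase"] == "completed"
        and datetime.fromisoformat(b["stoppedAt"]) <= target
    ]
    if not candidates:
        raise ValueError("no completed backup precedes the target time")
    return max(candidates, key=lambda b: b["stoppedAt"])["name"]


# Two completed daily backups, as in the listing above (illustrative data).
backups = [
    {"name": "daily-20240916", "stoppedAt": "2024-09-16T07:35:00+00:00", "phase": "completed"},
    {"name": "daily-20240917", "stoppedAt": "2024-09-17T00:05:00+00:00", "phase": "completed"},
]

# A target in the middle of the 17th falls after the second backup,
# so that backup is restored and WALs are replayed up to the target.
print(pick_base_backup(backups, "2024-09-17T12:00:00+00:00"))
# daily-20240917
```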

Increasing the storage size of a cluster

[3] As we're storing the data on Ceph, we can resize the volume in which PG stores its data on the fly. To do this, simply adjust the cluster.storage.size configuration value.

Assuming the cluster was deployed with the following configuration

cluster:
  storage:
    size: 10Gi

then change it to

cluster:
  storage:
    size: 15Gi

and redeploy the cluster. The volumes should get automatically upsized within minutes.

Bear in mind that volumes cannot be downsized.
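Because volumes can only grow, a quick sanity check before redeploying can catch an accidental shrink. A minimal sketch (hypothetical helper, not part of the chart or operator):

```python
# Byte multipliers for the Kubernetes binary-suffix quantities used here.
UNITS = {"Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_size(size: str) -> int:
    """Convert a Kubernetes quantity like '10Gi' to bytes."""
    return int(size[:-2]) * UNITS[size[-2:]]


def check_resize(current: str, requested: str) -> None:
    """Raise if the requested size would shrink the volume."""
    if parse_size(requested) < parse_size(current):
        raise ValueError(f"cannot shrink volume from {current} to {requested}")


check_resize("10Gi", "15Gi")   # fine: growing is allowed
# check_resize("15Gi", "10Gi") would raise ValueError
```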

Increasing the WAL storage size of a cluster

Follow the same instructions as in Data Platform/Systems/CloudnativePG/Clusters#Increasing the storage size of a cluster, but adjust the cluster.walStorage.size parameter instead.

Increasing the max number of connections on PostgreSQL

PostgreSQL server configuration tunables can be set and tweaked via the cluster.postgresql.parameters YAML dictionary (except these ones). For example, if you'd like to increase the max_connections setting to 300 in a cluster using all default configuration values, change the cluster configuration to

cluster:
  postgresql:
    parameters:
      max_connections: 300

As each connection spawns a UNIX process on the PG server, keep an eye on resource consumption and don't hesitate to increase the cluster's CPU and RAM allocations if needed.

Increasing the max number of client connections on PGBouncer

A subset of PGBouncer configuration options can be tweaked from the release YAML values themselves. For example, if PGBouncer is getting near its client connection limit, you can increase it by tweaking the pooler.parameters dictionary entries:

pooler:
  parameters:
    max_client_conn: 200

All PGBouncer configuration options are available here: https://www.pgbouncer.org/config.html

  1. https://phabricator.wikimedia.org/T372282
  2. https://phabricator.wikimedia.org/T372283
  3. https://phabricator.wikimedia.org/T372276