Data Platform/Systems/CloudnativePG/Clusters
The CloudnativePG operator makes it straightforward to deploy PostgreSQL clusters in Kubernetes.
We have created a custom cloudnative-pg-cluster chart that allows users to create a cluster from a very simple YAML values file, such as this one:
fullnameOverride: postgres-airflow-analytics
By default, each cluster uploads base backups and WAL files to a dedicated S3 bucket, and performs a daily backup at midnight UTC.
Observability
Dashboard
All cloudnative-pg clusters are instrumented using Prometheus and can be observed via this Grafana dashboard.
Monitoring
Operations
Performing a manual backup
On the deployment server, run the following commands.
brouberol@deploy1003:~$ cat <<EOF >manual-backup.yaml
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: postgresql-test-on-demand-backup-$(date "+%Y%m%d%H%M%S")
spec:
  method: barmanObjectStore
  cluster:
    name: postgresql-test
EOF
brouberol@deploy1003:/home/brouberol# kubectl create -f manual-backup.yaml -n postgresql-test
backup.postgresql.cnpg.io/postgresql-test-on-demand-backup-20240917141528 created
You can then check the state of the backup with kubectl.
brouberol@deploy1003:/home/brouberol# kubectl get backups -n postgresql-test --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-on-demand-backup-20240917141528 72s postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240917000000 14h postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240916073251 30h postgresql-test barmanObjectStore completed
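The manifest-generation step above can be wrapped in a small helper. This is only a sketch: the backup_manifest function name is made up here, and postgresql-test is just the example cluster from the transcript above.

```shell
# Sketch: generate a timestamped on-demand Backup manifest for a CNPG
# cluster. The function name is illustrative; pipe the output to
# `kubectl create -n <namespace> -f -` to actually submit it.
backup_manifest() {
  cluster="$1"
  cat <<EOF
apiVersion: postgresql.cnpg.io/v1
kind: Backup
metadata:
  name: ${cluster}-on-demand-backup-$(date "+%Y%m%d%H%M%S")
spec:
  method: barmanObjectStore
  cluster:
    name: ${cluster}
EOF
}

backup_manifest postgresql-test
```

For example: `backup_manifest postgresql-test | kubectl create -n postgresql-test -f -`.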
Restoring a backup by replaying all WAL files
The easiest way to restore the latest archived state of a pre-existing database is to deploy it with the following values:
mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
backups:
  enabled: false
When the cluster is deployed, this will start a full-recovery pod that replays all WAL files, after which the PG pods themselves are deployed. Once everything is deployed successfully, redeploy the cluster with all of these values removed or commented out. This starts the cluster back in normal mode and re-enables backups (and a backup is immediately triggered for good measure).
Performing a Point In Time Recovery (PITR)
A PITR allows you to restore the state of the cluster as it was backed up at a given point in time.
You can list the available base backups, sorted by descending creation date, by running
root@deploy1003:~# kubectl get backups -n postgresql-test --sort-by='{metadata.creationTimestamp}' --no-headers | tac
postgresql-test-daily-backup-20240917000000 13h postgresql-test barmanObjectStore completed
postgresql-test-daily-backup-20240916073251 30h postgresql-test barmanObjectStore completed
Say we'd like to restore postgresql-test-daily-backup-20240917000000. To do this, first extract the creation timestamp of the backup resource:
root@deploy1003:~# kubectl get backups -n postgresql-test -o jsonpath='{.metadata.creationTimestamp}{"\n"}' postgresql-test-daily-backup-20240917000000
2024-09-17T00:00:00Z
You then need to convert this timestamp to the format understood by the barman (https://pgbarman.org/) tools used to restore the backup.
$ python3 -c'import datetime as dt;print(dt.datetime.fromisoformat("2024-09-17T00:00:00Z").strftime("%Y-%m-%d %H:%M:%S.%f%z"))'
2024-09-17 00:00:00.000000+0000
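If you do this regularly, the conversion can live in a tiny shell helper. A sketch: the to_barman_ts name is made up here, and the .replace() call is only there so the snippet also works on Python versions older than 3.11, where datetime.fromisoformat() rejects the trailing "Z".

```shell
# Sketch: convert a Kubernetes creationTimestamp (RFC 3339) into the
# timestamp format expected by the barman recovery tooling. The helper
# name is illustrative. The .replace() keeps it working on Python < 3.11,
# where datetime.fromisoformat() does not accept a trailing "Z".
to_barman_ts() {
  python3 -c 'import datetime as dt, sys
print(dt.datetime.fromisoformat(sys.argv[1].replace("Z", "+00:00")).strftime("%Y-%m-%d %H:%M:%S.%f%z"))' "$1"
}

to_barman_ts 2024-09-17T00:00:00Z
# → 2024-09-17 00:00:00.000000+0000
```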
You can then deploy the cluster with the following values:
mode: recovery
recovery:
  method: object_store
  clusterName: postgresql-test
  recoveryTarget:
    targetTime: "2024-09-17 00:00:00.000000+0000"
backups:
  enabled: false
Once the cluster is deployed and running, redeploy it with all of these values commented out or deleted. This starts the cluster back in normal mode and re-enables backups (and a backup is immediately triggered for good measure).
Increasing the storage size of a cluster
As we're storing the data on Ceph, we can resize the volume in which PG stores its data on the fly. To do this, simply adjust the cluster.storage.size
cluster configuration value.
Assuming the cluster was deployed with the following configuration
cluster:
  storage:
    size: 10Gi
then change it to
cluster:
  storage:
    size: 15Gi
and redeploy the cluster. The volumes should get automatically upsized within minutes.
Increasing the WAL storage size of a cluster
Follow the same instructions as in Data Platform/Systems/CloudnativePG/Clusters#Increasing the storage size of a cluster, but adjust the cluster.walStorage.size
parameter instead.
Increasing the max number of connections on PostgreSQL
PostgreSQL server configuration tunables can be set and tweaked via the cluster.postgresql.parameters
YAML dictionary (except these ones). For example, if you'd like to increase the max_connections
setting to 300 in a cluster using all default configuration values, change the cluster configuration to
cluster:
  postgresql:
    parameters:
      max_connections: 300
Increasing the max number of client connections on PGBouncer
A subset of PGBouncer configuration options can be tweaked from the release YAML values themselves. For example, if PGBouncer is getting close to its client connection limit, you can increase it by tweaking the pooler.parameters
dictionary entries:
pooler:
  parameters:
    max_client_conn: 200
All PGBouncer configuration options are available here: https://www.pgbouncer.org/config.html