Jump to content

Postgres

From Wikitech

PostgreSQL is used for specific services in the foundation's infrastructure. Currently Maps, Netbox, Airflow and PuppetDB use Postgres to store data.

Monitoring

Syncing Postgres replica

After a postgres slave replica has been re-imaged or it has fallen behind in replication due to network issues or other problems, it will first need to sync all data before it can be placed in production. The sre.postgresql.postgres-init cookbook can be used to resync a host

$ sudo cookbook sre.postgresql.postgres-init --replica ${replica_server} --reason "Resync after planet sync" --depool --pgversion 11

POSTGRES_HOT_STANDBY_DELAY

This alert indicates that a replica has fallen behind the master database in replication. This is not immediately a cause for panic, as a host can recover assuming it is not left out of replication for long. However, the postgres logs should be examined and connectivity between individual replicas and the master should also be checked.

If many or all hosts in the maps cluster are issuing this alert, it could be the case that the OSM dataset is being refreshed - during this operation it is likely that hosts will fall behind as large numbers of changes happen on the master during import. Check the cluster masters for import processes - generally this is a passing problem and will self-resolve over time. Follow the restart process if hosts need to be restarted.

requested WAL segment XXXXXXXXXXXXXXXX has already been removed

In this case the WAL segments (roughly the postgres' journal) have been deleted on the master, that rightfully tells the replica about their absence causing the replication to break. If this happens, you need to resync the replica from the master, using the sre.postgresql.postgres-init cookbook. https://phabricator.wikimedia.org/T383114 shows an example of an occurrence of the issue on puppetdb2003.

Dashboards

We have a number of Grafana dashboards that can be helpful: