Postgres
PostgreSQL is used for specific services in the foundation's infrastructure. Currently Maps, Netbox, Airflow and PuppetDB use Postgres to store data.
Monitoring
Syncing Postgres replica
After a postgres slave replica has been re-imaged or it has fallen behind in replication due to network issues or other problems, it will first need to sync all data before it can be placed in production. The sre.postgresql.postgres-init
cookbook can be used to resync a host
$ sudo cookbook sre.postgresql.postgres-init --replica ${replica_server} --reason "Resync after planet sync" --depool --pgversion 11
POSTGRES_HOT_STANDBY_DELAY
This alert indicates that a replica has fallen behind the master database in replication. This is not immediately a cause for panic, as a host can recover assuming it is not left out of replication for long. However, the postgres logs should be examined and connectivity between individual replicas and the master should also be checked.
If many or all hosts in the maps cluster are issuing this alert, it could be the case that the OSM dataset is being refreshed - during this operation it is likely that hosts will fall behind as large numbers of changes happen on the master during import. Check the cluster masters for import processes - generally this is a passing problem and will self-resolve over time. Follow the restart process if hosts need to be restarted.
requested WAL segment XXXXXXXXXXXXXXXX has already been removed
In this case the WAL segments (roughly the postgres' journal) have been deleted on the master, that rightfully tells the replica about their absence causing the replication to break. If this happens, you need to resync the replica from the master, using the sre.postgresql.postgres-init
cookbook. https://phabricator.wikimedia.org/T383114 shows an example of an occurrence of the issue on puppetdb2003
.
Dashboards
We have a number of Grafana dashboards that can be helpful:
- https://grafana-rw.wikimedia.org/d/000000828/postgresql-database
- https://grafana-rw.wikimedia.org/d/000000469/postgres?orgId=1
- https://grafana-rw.wikimedia.org/d/wGgaPlciz/postgresql-overview?orgId=1
Related pages
- Puppet (← links)
- Service restarts (← links)
- Postgresql (redirect page) (← links)
- PostgreSQL (redirect page) (← links)
- Netbox (← links)
- Maps/OSM Database Legacy (← links)
- Data Platform/Systems/Ceph (← links)
- Data Platform/Systems/CloudnativePG (← links)
- User:MSantos (WMF)/Maps/OSM Database (← links)