User:MSantos/Maps/OSM Database
OSM Database
The Maps OSM database is an important piece of the infrastructure, it's a PostgreSQL database that hosts the data source for tile generation and the Geoshapes service.
The OSM Database needs two processes that are fundamental in order to keep it available with the most updated data from OSM: initial import and replication.
Initial Importing of OSM data into PostgreSQL database
This process needs to be done in all clusters available, one at a time.
When you need to do initial import?
The following scenarios require a fresh install of the OSM database:
- First setup of the cluster
- Any upgrades or maintenance that require deleting the PostgreSQL data folder, for example:
- OS migration T198622
- Disk space in critical condition because of OSM replication
- OSM replication is disabled for a long time range and the database is outdated
- Depending on the server condition, is safer to re-import OSM data instead of roll replication, because it can overload Tilerator and increase significantly the CPU usage
Importing
Checklist
- Create a Phab task to track this work. Example.
- Depool maps2004.
- Isolate maps2004 from the rest of the cluster. Example.
- Disable Tilerator in puppet. Example.
- Delete postgres data, recreate the empty DB.
- Run `osm-initial-import` script to reimport data from OSM, needs root access.
- Re-enable replication
- Repool maps2004
- Re-init all slaves (cookbook)
- Re-enable tilerator
Re-creating Postgres DB
- Clear the Postgres data directory and init the database from backup (replace
maps2001.codfw.wmnet
by the postgres master):
rm -rf /srv/postgresql/9.4/main/* && sudo -u postgres pg_basebackup -X stream -D /srv/postgresql/9.4/main/ -h maps2001.codfw.wmnet -U replication -W
The initial-import-script
Initial data load of OSM into postgresql is done by running /usr/local/bin/osm-initial-import
on the postgresql master node
osm-initial-import \
-d <date_of_import> \
-s <state_file_url> \
-x webproxy.eqiad.wmnet:8080
- date_of_import: find the latest dump at https://planet.osm.org/pbf/. Example:
160530
. Do NOT use "latest", as that might change at any moment. - state_file_url: The URL to the state file corresponding to the dump, find the correct one at http://planet.openstreetmap.org/replication/ (the state file must be older than the dump). Example:
http://planet.openstreetmap.org/replication/day/000/001/355.state.txt
.
Troubleshouting
- If the slave initialization timeouts in puppet, it needs to be run manually:
service postgresql@9.4-main stop
rm -rf /srv/postgresql/9.4/main
sudo -u postgres /usr/bin/pg_basebackup -X stream -D /srv/postgresql/9.6/main -h <maps_master fqdn> -U replication -w
# Run puppet to make sure recovery.conf file is created
service postgresql@9.4-main start
OSM replication
In order to keep the OSM Database updated, we use the following tools: osmosis and osm2pgsql. These tools fetch data from OSM servers and apply the requested changesets in the PostgreSQL database. When it's complete, the replication script will trigger a request to Tilerator, sending a list of tiles that needs to be regenerated.
Monitoring
You can monitor PostgreSQL activities of the Maps clusters in the following dashboards: