MariaDB/Sanitarium and clouddb instances
Sanitarium is the name used for the MySQL/MariaDB instances and configuration that redact private information from the production wiki databases before replicating to the Wiki Replicas instances (clouddbXXXX) which are accessible from the Cloud Services network.
This page contains admin information specific to the Sanitarium hosts.
The full Wiki Replicas architecture is described in the page Admin/Wiki Replicas.
Sanitarium
role::mariadb::sanitarium_multiinstance
role::mariadb::sanitarium_multiinstance As of Dec 2023 we have 2 sanitarium hosts per DC (db1154, db1155, db2186, db2187) each of the running 4 instances of mariadb each of one port, so they are split across both hosts.
The running sections are specified on the MOTD of each host
2022-12-14:
DB section s1 (alias: mysql.s1) DB section s3 (alias: mysql.s3) DB section s5 (alias: mysql.s5) DB section s8 (alias: mysql.s8)
datadir
is in /srv/sqldata.s[1358]
and tmpdir
in /srv/tmp.s[1358]
All but certain filtered tables are replicated.
Why separate instances
Before having multi-instance sanitarium hosts, we had multi-source hosts (one mariadb process with multiple replication threads in it), but we migrated to multi-instance for the following reasons:
- Easier to handle per-section maintenance
- Corruption on one instance doesn't mean corruption on all of them
- Cloning a section doesn't mean bringing up all the sections
- Possibility to enable GTID (GTID didn't work on multi instance see: https://jira.mariadb.org/browse/MDEV-12012)
Triggers
The filters (triggers) used to redact the data were created by Redactatron (old repo, unmaintained), specifically redact_standard_output.sh. The following triggers are defined for each wiki's schema:
root@db1154.eqiad.wmnet[enwiki]> select TRIGGER_NAME,EVENT_OBJECT_TABLE,ACTION_TIMING from information_schema.triggers; +----------------------------+--------------------+---------------+ | TRIGGER_NAME | EVENT_OBJECT_TABLE | ACTION_TIMING | +----------------------------+--------------------+---------------+ | sys_config_insert_set_user | sys_config | BEFORE | | sys_config_update_set_user | sys_config | BEFORE | | abuse_filter_log_insert | abuse_filter_log | BEFORE | | abuse_filter_log_update | abuse_filter_log | BEFORE | | recentchanges_insert | recentchanges | BEFORE | | recentchanges_update | recentchanges | BEFORE | | user_insert | user | BEFORE | | user_update | user | BEFORE | | archive_insert | archive | BEFORE | | archive_update | archive | BEFORE | +----------------------------+--------------------+---------------+ 10 rows in set (0.005 sec)
Each trigger includes several tasks, e.g. SET NEW.user_password = ''
- Monitor that all schemas have their filters in place—produce an error on icinga otherwise. There were some wikis that didn't have the filter.
SELECT SCHEMATA.SCHEMA_NAME FROM SCHEMATA WHERE SCHEMATA.SCHEMA_NAME NOT IN (SELECT EVENT_OBJECT_SCHEMA FROM TRIGGERS)
Private data checks
Run daily by check_private_data.py. Check check_private_data.pp for deployment details.
Wiki Replica views
The Wiki Replica hosts filter again by using custom views which set certain fields to NULL and preventing normal users from seeing the underlying tables directly. For example:
customviews:
abuse_filter:
source: abuse_filter
view: >
select af_id, if(af_hidden,null,af_pattern) as af_pattern,
af_user, af_user_text, af_timestamp, af_enabled,
if(af_hidden,null,af_comments) as af_comments, af_public_comments,
af_hidden, af_hit_count, af_throttled, af_deleted, af_actions, af_global, af_group
CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `abuse_filter` AS select `enwiki`.`abuse_filter`.`af_id` AS `af_id`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_pattern`) AS `af_pattern`,`enwiki`.`abuse_filter`.`af_user` AS `af_user`,`enwiki`.`abuse_filter`.`af_user_text` AS `af_user_text`,`enwiki`.`abuse_filter`.`af_timestamp` AS `af_timestamp`,`enwiki`.`abuse_filter`.`af_enabled` AS `af_enabled`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_comments`) AS `af_comments`,`enwiki`.`abuse_filter`.`af_public_comments` AS `af_public_comments`,`enwiki`.`abuse_filter`.`af_hidden` AS `af_hidden`,`enwiki`.`abuse_filter`.`af_hit_count` AS `af_hit_count`,`enwiki`.`abuse_filter`.`af_throttled` AS `af_throttled`,`enwiki`.`abuse_filter`.`af_deleted` AS `af_deleted`,`enwiki`.`abuse_filter`.`af_actions` AS `af_actions`,`enwiki`.`abuse_filter`.`af_global` AS `af_global`,`enwiki`.`abuse_filter`.`af_group` AS `af_group` from `enwiki`.`abuse_filter`
When a clouddb host has to be rebuilt from zero or if a new wiki is created, maintain-views is used to create/update the views.
Sanitarium's primary failover
This is an example of failing over a server that is a replication source for a Sanitarium host.
You can use Orchestrator to check the replication tree and the replication status, e.g. for s1
https://orchestrator.wikimedia.org/web/cluster/alias/s1
Example:
- Sanitarium host: db1154
- Current primary: db1196
- New primary: db1206
- Downtime db1154:
cumin1002:~$ sudo cookbook sre.hosts.downtime --hours 1 -r "Changing replication source for dbXXXX (TXXXXXX)" '<fqdn>'
- Change the replication source:
db1154:~$ sudo mysql.s1 STOP SLAVE; CHANGE MASTER TO MASTER_HOST='db1206.eqiad.wmnet'; START SLAVE;
See also the production docs at MariaDB#Manipulating_the_Replication_Tree.
Other
Note: operations/software/redactron.git and operations/software/labsdb-auditor.git contain historical software which is no longer used.