Jump to content

MariaDB/Sanitarium and clouddb instances

From Wikitech

Sanitarium is the name used for the MySQL/MariaDB instances and configuration that redact private information from the production wiki databases before replicating to the Wiki Replicas instances (clouddbXXXX) which are accessible from the Cloud Services network.

This page contains admin information specific to the Sanitarium hosts.

The full Wiki Replicas architecture is described in the page Admin/Wiki Replicas.

Sanitarium

role::mariadb::sanitarium_multiinstance

role::mariadb::sanitarium_multiinstance As of Dec 2023 we have 2 sanitarium hosts per DC (db1154, db1155, db2186, db2187) each of the running 4 instances of mariadb each of one port, so they are split across both hosts.

The running sections are specified on the MOTD of each host

2022-12-14:

DB section s1 (alias: mysql.s1)
DB section s3 (alias: mysql.s3)
DB section s5 (alias: mysql.s5)
DB section s8 (alias: mysql.s8)

datadir is in /srv/sqldata.s[1358] and tmpdir in /srv/tmp.s[1358]

All but certain filtered tables are replicated.

Why separate instances

Before having multi-instance sanitarium hosts, we had multi-source hosts (one mariadb process with multiple replication threads in it), but we migrated to multi-instance for the following reasons:

  • Easier to handle per-section maintenance
  • Corruption on one instance doesn't mean corruption on all of them
  • Cloning a section doesn't mean bringing up all the sections
  • Possibility to enable GTID (GTID didn't work on multi instance see: https://jira.mariadb.org/browse/MDEV-12012)

Triggers

The filters (triggers) used to redact the data were created by Redactatron (old repo, unmaintained), specifically redact_standard_output.sh. The following triggers are defined for each wiki's schema:

root@db1154.eqiad.wmnet[enwiki]> select TRIGGER_NAME,EVENT_OBJECT_TABLE,ACTION_TIMING from information_schema.triggers;
+----------------------------+--------------------+---------------+
| TRIGGER_NAME               | EVENT_OBJECT_TABLE | ACTION_TIMING |
+----------------------------+--------------------+---------------+
| sys_config_insert_set_user | sys_config         | BEFORE        |
| sys_config_update_set_user | sys_config         | BEFORE        |
| abuse_filter_log_insert    | abuse_filter_log   | BEFORE        |
| abuse_filter_log_update    | abuse_filter_log   | BEFORE        |
| recentchanges_insert       | recentchanges      | BEFORE        |
| recentchanges_update       | recentchanges      | BEFORE        |
| user_insert                | user               | BEFORE        |
| user_update                | user               | BEFORE        |
| archive_insert             | archive            | BEFORE        |
| archive_update             | archive            | BEFORE        |
+----------------------------+--------------------+---------------+
10 rows in set (0.005 sec)

Each trigger includes several tasks, e.g. SET NEW.user_password = ''

  • Monitor that all schemas have their filters in place—produce an error on icinga otherwise. There were some wikis that didn't have the filter.
SELECT SCHEMATA.SCHEMA_NAME FROM SCHEMATA WHERE SCHEMATA.SCHEMA_NAME NOT IN (SELECT EVENT_OBJECT_SCHEMA FROM TRIGGERS)

Private data checks

Run daily by check_private_data.py. Check check_private_data.pp for deployment details.

Wiki Replica views

The Wiki Replica hosts filter again by using custom views which set certain fields to NULL and preventing normal users from seeing the underlying tables directly. For example:

customviews:
  abuse_filter:
    source: abuse_filter
    view: >
      select af_id, if(af_hidden,null,af_pattern) as af_pattern,
      af_user, af_user_text, af_timestamp, af_enabled,
      if(af_hidden,null,af_comments) as af_comments, af_public_comments,
      af_hidden, af_hit_count, af_throttled, af_deleted, af_actions, af_global, af_group
CREATE ALGORITHM=UNDEFINED DEFINER=`viewmaster`@`%` SQL SECURITY DEFINER VIEW `abuse_filter` AS select `enwiki`.`abuse_filter`.`af_id` AS `af_id`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_pattern`) AS `af_pattern`,`enwiki`.`abuse_filter`.`af_user` AS `af_user`,`enwiki`.`abuse_filter`.`af_user_text` AS `af_user_text`,`enwiki`.`abuse_filter`.`af_timestamp` AS `af_timestamp`,`enwiki`.`abuse_filter`.`af_enabled` AS `af_enabled`,if(`enwiki`.`abuse_filter`.`af_hidden`,NULL,`enwiki`.`abuse_filter`.`af_comments`) AS `af_comments`,`enwiki`.`abuse_filter`.`af_public_comments` AS `af_public_comments`,`enwiki`.`abuse_filter`.`af_hidden` AS `af_hidden`,`enwiki`.`abuse_filter`.`af_hit_count` AS `af_hit_count`,`enwiki`.`abuse_filter`.`af_throttled` AS `af_throttled`,`enwiki`.`abuse_filter`.`af_deleted` AS `af_deleted`,`enwiki`.`abuse_filter`.`af_actions` AS `af_actions`,`enwiki`.`abuse_filter`.`af_global` AS `af_global`,`enwiki`.`abuse_filter`.`af_group` AS `af_group` from `enwiki`.`abuse_filter`

When a clouddb host has to be rebuilt from zero or if a new wiki is created, maintain-views is used to create/update the views.

Sanitarium's primary failover

This is an example of failing over a server that is a replication source for a Sanitarium host.

You can use Orchestrator to check the replication tree and the replication status, e.g. for s1 https://orchestrator.wikimedia.org/web/cluster/alias/s1

Example:

  • Sanitarium host: db1154
  • Current primary: db1196
  • New primary: db1206
The procedure below has never been tested and it's likely to fail. We should create a script to handle this in a safe and reliable way, some work is in progress at phab:T196366.
  • Downtime db1154:
    cumin1002:~$ sudo cookbook sre.hosts.downtime --hours 1 -r "Changing replication source for dbXXXX (TXXXXXX)" '<fqdn>'
    
  • Change the replication source:
    db1154:~$ sudo mysql.s1
    
    STOP SLAVE;
    CHANGE MASTER TO MASTER_HOST='db1206.eqiad.wmnet';
    START SLAVE;
    

See also the production docs at MariaDB#Manipulating_the_Replication_Tree.

Other

Note: operations/software/redactron.git and operations/software/labsdb-auditor.git contain historical software which is no longer used.