Jump to content

Help:Toolforge/ToolsDB

From Wikitech

Toolforge tools have access to a shared MariaDB server at tools.db.svc.wikimedia.cloud for tool-created databases. Database names must start with the name of the credential user followed by two underscores and then the name of the database: <credentialUser>__<DBName> (e.g. "s51234__mydb").

The credential user is not your user name. It can be found in your $HOME/replica.my.cnf file. The name of the credential user looks something like 'u1234' for a user and 's51234' for a tool account. You can also find the name of the credential user using a live database connection:

SELECT SUBSTRING_INDEX(CURRENT_USER(), '@', 1);
If your tool needs more than 25GB of storage, open connection limits that ToolsDB cannot support, or a Postgres runtime, Trove databases may be a better fit. ToolsDB is a shared resource that must impose connection and size limitations in exchange for zero administration requirements. Using Trove removes those limitations, but requires a small amount of administration. Tools can request the ability to create Trove databases via a Toolforge quota-request task.

Privileges on the database

Users have all privileges and have access to all grant options on their databases. Database names ending with _p are granted read access for everyone. Please create a ticket if you need more fine-grained permissions, like sharing a database only between 2 users, or other special permissions.

Public databases in ToolsDB (the ones with a name ending in _p) can also be accessed from Quarry and Superset.

Steps to create a user database

To create a database on tools.db.svc.wikimedia.cloud:

  1. Become your tool account.
    maintainer@tools-login:~$ become toolaccount
  2. Connect to tools.db.svc.wikimedia.cloud with the replica.my.cnf credentials:
    mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
    You could also just type:
    sql tools
  3. In the MariaDB console, create a new database (where CREDENTIALUSER is your credentials user, which can be found in your ~/replica.my.cnf file, and DBNAME the name you want to give to your database. Note that there are 2 underscores between CREDENTIALUSER and DBNAME):
    MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME;

You can then connect to your database using:

$ mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud CREDENTIALUSER__DBNAME

Or:

$ sql tools
MariaDB [(none)]> USE CREDENTIALUSER__DBNAME;

Example

Assuming that your tool account is called "mytool", this is what it would look like:

$ maintainer@tools-login:~$ become mytool
$ tools.mytool@tools-login:~$ mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
MariaDB [(none)]> select substring_index(current_user(), '@', 1) as uname;
+---------------+
| uname         |
+---------------+
| u123something |
+---------------+
1 row in set (0.00 sec)
MariaDB [(none)]> create database u123something__wiki;
Caution: The legacy tools-db service name was deprecated in September 2017 and removed in May 2019. Use tools.db.svc.wikimedia.cloud instead.

Note: Some projects like python-Django can throw an exception like MySQLdb._exceptions.OperationalError: (1709, 'Index column size too large. The maximum column size is 767 bytes.') when migrated using the setup above. This can be fixed by altering the database charset to utf-8in most cases. To avoid this, create the database using the following command instead to specify the charset:

MariaDB [(none)]> CREATE DATABASE CREDENTIALUSER__DBNAME CHARACTER SET utf8;

Read-only replica host

We maintain two copies of the ToolsDB database, using a MariaDB primary-replica setup.

The read-only replica host can be accessed using the same credentials and the following hostname: tools-readonly.db.svc.wikimedia.cloud

Using the read-only replica host is recommended if you have to run queries that take a long time to complete, as in this way you will reduce the load on the primary host.

Please note that the replica host can sometimes lag behind the primary host, but we are doing our best to keep this lag at a minimum.

Backups

We don't do offline backups of any of the databases in ToolsDB. ToolsDB users can backup their data using mariadb-dump (included in the mariadb image) if necessary:

:# use umask to make the dump private (use unless the database is public)
$ toolforge jobs run --command "umask o-r; ( mariadb-dump --defaults-file=~/replica.my.cnf --host=tools-readonly.db.svc.wikimedia.cloud credentialUser__DBName > ~/DBname-$(date -I).sql )" --image mariadb backup

Note that we don't recommend storing backups permanently on NFS (/data/project, /home, or /data/scratch on Toolforge) or on any other Cloud VPS hosted drive. True backups should be kept offsite.

Caveats

The Toolsforge team tries to keep ToolsDB configurations as close to MariaDB defaults as possible. This can lead to surprising behaviors, such as:

  1. Transactions not rolled back on query timeouts, which can be common during high load on a shared database (see this issue)

If you encounter an issue, feel free to add it above.

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)