Jump to content

Help:Toolforge/Database

From Wikitech
This page can be improved by breaking up its content into other docs. See phab:T232404. Contributions welcome!

Tools and Toolforge users have access to two sets of databases:

On the wiki replicas, private user data has been redacted (some rows are elided and/or some columns are made NULL depending on the table). For most practical purposes this is identical to the production databases and sharded into clusters in much the same way.

Database credentials are generated on account creation and placed in a file called replica.my.cnf in the home directory of both a Tool and a Tools user account. This file cannot be modified or removed by users. (If the file does not exist, ask for help. Do not try to create it yourself: it will not work.)

Symlinking the access file can be practical:

$ ln -s $HOME/replica.my.cnf $HOME/.my.cnf


Connecting to the database replicas

You can connect to the database replicas (and/or the cluster where a database replica is hosted) by specifying your access credentials and the alias of the cluster and replicated database. For example:

To connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.analytics.db.svc.wikimedia.cloud) and the alias of the database replica (enwiki_p):

$ mariadb --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p

To connect to the Wikidata cluster:

$ mariadb --defaults-file=$HOME/replica.my.cnf -h wikidatawiki.analytics.db.svc.wikimedia.cloud

To connect to Commons cluster:

$ mariadb --defaults-file=$HOME/replica.my.cnf -h commonswiki.analytics.db.svc.wikimedia.cloud

There is also a shortcut for connecting to the replicas: sql <dbname>[_p] The _p is optional, but implicit (i.e. the sql tool will add it if absent).

To connect to the English Wikipedia database replica using the shortcut, simply type:

$ sql enwiki

To connect to ToolsDB where you can create and write to tables, type:

$ sql tools

This sets server to "tools.db.svc.wikimedia.cloud" and db to "". It's equivalent to typing-

$ mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud

Databases

Replica databases

User databases

Toolforge tools have two options for creating databases:

  • ToolsDB is a shared MariaDB database server suitable for relatively small amounts of data
  • Trove databases support more use cases than ToolsDB, but take more resources (and so require a specific quota request to use)

Query Limits

One can use max_statement_time (unit is seconds, it allows decimals):

SET max_statement_time = 300;

And all subsequent queries on the same connection will be killed if they run for longer than the given time.

For example:

mariadb[(none)]> SET max_statement_time = 10;
Query OK, 0 rows affected (0.00 sec)

mariadb[(none)]> SELECT sleep(20);
+-----------+
| sleep(20) |
+-----------+
|         1 |
+-----------+
1 row in set (10.00 sec)

It works on Quarry, too!

You can also set limits with a single SQL query. For example:

SET STATEMENT max_statement_time = 300 FOR
SELECT COUNT(rev_id) FROM revision_userindex
INNER JOIN actor
 ON rev_actor = actor_id
WHERE actor_name = 'Jimbo Wales'

Code samples for common languages

Copied with edits from mw:Toolserver:Database access#Program access (not all tested, use with caution!)

In most programming languages, it will be sufficient to tell MariaDB to use the database credentials found in $HOME/.my.cnf assuming that you have created a symlink from $HOME/.my.cnf to $HOME/replica.my.cnf.

Below are various examples in a few common programming languages.

Bash

-- 2> /dev/null; date; echo '
/* Bash/SQL compatible test structure
 *
 * Run time: ? <SLOW_OK>
 */
SELECT 1
;-- ' | mariadb -ch tools.db.svc.wikimedia.cloud enwiki_p > ~/query_results-enwiki; date;

C

#include <my_global.h>
#include <mysql.h>

...

 char *host = "tools.db.svc.wikimedia.cloud";
 MYSQL *conn = mysql_init(NULL);

 mysql_options(conn, MYSQL_READ_DEFAULT_GROUP, "client");
 if (mysql_real_connect(conn, host, NULL, NULL, NULL, 0, NULL, 0) == NULL) {
    printf("Error %u: %s\n", mysql_errno(conn), mysql_error(conn));
    ...
 }

Perl

use User::pwent;
use DBI;

my $database = "enwiki_p";
my $host = "tools.db.svc.wikimedia.cloud";

my $dbh = DBI->connect(
    "DBI:mysql:database=$database;host=$host;"
    . "mysql_read_default_file=" . getpwuid($<)->dir . "/replica.my.cnf",
    undef, undef) or die "Error: $DBI::err, $DBI::errstr";

Python

Without installing the toolforge library, this will work:

import configparser
import pathlib
import pymysql
import pymysql.cursors

replica = pathlib.Path.home().joinpath("replica.my.cnf")
config = configparser.ConfigParser()
config.read_string(replica.read_text())
connection = pymysql.connections.Connection(
    host="commonswiki.analytics.db.svc.wikimedia.cloud",
    database="commonswiki_p",
    user=config.get("client", "user"),
    password=config.get("client", "password"),
    cursorclass=pymysql.cursors.DictCursor,
)

with connection.cursor() as cur:
    cur.execute(query)  # Or something....
connection.close()

Using User:Legoktm/toolforge library, however, is probably the easiest way. This wrapper library supports both Python 3 and legacy Python 2 applications and provides convenience functions for connecting to the Wiki Replica databases.

import toolforge
conn = toolforge.connect('enwiki') # You can also use "enwiki_p"
# conn is a pymysql.connection object.
with conn.cursor() as cur:
    cur.execute(query)  # Or something....

We used to recommend oursql as well, but as of 2019-02-20 it seems to be abandoned or at least not actively maintained and failing to compile against MariaDB client libraries.

Python: Django

If you are using Django, first install mysqlclient (inside your tool's virtual environment, accessed via a webservice shell):

export MYSQLCLIENT_CFLAGS="-I/usr/include/mariadb/"
export MYSQLCLIENT_LDFLAGS="-L/usr/lib/x86_64-linux-gnu/ -lmariadb"
pip install mysqlclient

Then insert the database in the settings.py file as following, with s12345 your user name:

import configparser
import os

HOME=os.environ.get('HOME') #get environment variable $HOME

replica_path=HOME + '/replica.my.cnf'
if os.path.exists(replica_path):          #check that the file is found
    config = configparser.ConfigParser()
    config.read(replica_path)
else:
    print('replica.my.cnf file not found')

DATABASES = {
    'default': {
         'ENGINE': 'django.db.backends.mysql',
         'NAME': 's12345__mydbname',                                 
         'USER': config['client']['user'],                          #for instance "s12345"
         'PASSWORD': config['client']['password'],
         'HOST': 'tools.db.svc.wikimedia.cloud',
         'PORT': '',
     }
}

PHP (using PDO)

<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$db = new PDO("mysql:host=enwiki.analytics.db.svc.wikimedia.cloud;dbname=enwiki_p", $ts_mycnf['user'], $ts_mycnf['password']);
unset($ts_mycnf, $ts_pw);

$q = $db->prepare('select * from page where page_id = :id');
$q->execute(array(':id' => 843020));
print_r($q->fetchAll());
?>

PHP (using MySQLi)

<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$mysqli = new mysqli('enwiki.analytics.db.svc.wikimedia.cloud', $ts_mycnf['user'], $ts_mycnf['password'], 'enwiki_p');
unset($ts_mycnf, $ts_pw);

$stmt = $mysqli->prepare('select * from page where page_id = ?');
$id = 843020;
$stmt->bind_param('i', $id);
$stmt->execute();
$result = $stmt->get_result();
print_r($result->fetch_all(MYSQLI_BOTH));
?>

Java

Class.forName("com.mysql.jdbc.Driver").newInstance();
Properties mycnf = new Properties();
mycnf.load(new FileInputStream(System.getProperty("user.home")+"/replica.my.cnf"));
String password = mycnf.getProperty("password");
password=password.substring((password.startsWith("\""))?1:0, password.length()-((password.startsWith("\""))?1:0));
mycnf.put("password", password);
mycnf.put("useOldUTF8Behavior", "true");
mycnf.put("useUnicode", "true");
mycnf.put("characterEncoding", "UTF-8");
mycnf.put("connectionCollation", "utf8_general_ci");
String url = "jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/enwiki_p";
Connection conn = DriverManager.getConnection(url, mycnf);

Node.js

mysql2 client provides a promise-based interface.

const mysql = require('mysql2/promise');
async function sample() {
  const connection = await mysql.createConnection({
    host: 'tools.db.svc.wikimedia.cloud', 
    port: 3306,
    database: 's12345__mydbname', 
    user: 's12345', 
    password: ''
  });
  const [rows, fields] = await connection.execute('SELECT * FROM table WHERE name = ? AND age > ?', ['Morty', 14]);
  for (let row in rows) console.log(row);
}


See also

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)