Help:Toolforge/Database
Tools and Toolforge users have access to two sets of databases:
- wiki replicas with read-only access to production MediaWiki data
- user databases (aka ToolsDB)
On the wiki replicas, private user data has been redacted (some rows are elided and/or some columns are made NULL depending on the table). For most practical purposes this is identical to the production databases and sharded into clusters in much the same way.
Database credentials are generated on account creation and placed in a file called replica.my.cnf
in the home directory of both a Tool and a Tools user account. This file cannot be modified or removed by users. (If the file does not exist, ask for help. Do not try to create it yourself: it will not work.)
Symlinking the access file can be practical:
$ ln -s $HOME/replica.my.cnf $HOME/.my.cnf
Connecting to the database replicas
You can connect to the database replicas (and/or the cluster where a database replica is hosted) by specifying your access credentials and the alias of the cluster and replicated database. For example:
To connect to the English Wikipedia replica, specify the alias of the hosting cluster (enwiki.analytics.db.svc.wikimedia.cloud) and the alias of the database replica (enwiki_p):
$ mariadb --defaults-file=$HOME/replica.my.cnf -h enwiki.analytics.db.svc.wikimedia.cloud enwiki_p
To connect to the Wikidata cluster:
$ mariadb --defaults-file=$HOME/replica.my.cnf -h wikidatawiki.analytics.db.svc.wikimedia.cloud
To connect to Commons cluster:
$ mariadb --defaults-file=$HOME/replica.my.cnf -h commonswiki.analytics.db.svc.wikimedia.cloud
There is also a shortcut for connecting to the replicas: sql <dbname>[_p] The _p is optional, but implicit (i.e. the sql tool will add it if absent).
To connect to the English Wikipedia database replica using the shortcut, simply type:
$ sql enwiki
To connect to ToolsDB where you can create and write to tables, type:
$ sql tools
This sets server to "tools.db.svc.wikimedia.cloud" and db to "". It's equivalent to typing-
$ mariadb --defaults-file=$HOME/replica.my.cnf -h tools.db.svc.wikimedia.cloud
Databases
Replica databases
User databases
Toolforge tools have two options for creating databases:
- ToolsDB is a shared MariaDB database server suitable for relatively small amounts of data
- Trove databases support more use cases than ToolsDB, but take more resources (and so require a specific quota request to use)
Query Limits
One can use max_statement_time (unit is seconds, it allows decimals):
SET max_statement_time = 300;
And all subsequent queries on the same connection will be killed if they run for longer than the given time.
For example:
mariadb[(none)]> SET max_statement_time = 10;
Query OK, 0 rows affected (0.00 sec)
mariadb[(none)]> SELECT sleep(20);
+-----------+
| sleep(20) |
+-----------+
| 1 |
+-----------+
1 row in set (10.00 sec)
It works on Quarry, too!
You can also set limits with a single SQL query. For example:
SET STATEMENT max_statement_time = 300 FOR
SELECT COUNT(rev_id) FROM revision_userindex
INNER JOIN actor
ON rev_actor = actor_id
WHERE actor_name = 'Jimbo Wales'
Code samples for common languages
- Copied with edits from mw:Toolserver:Database access#Program access (not all tested, use with caution!)
In most programming languages, it will be sufficient to tell MariaDB to use the database credentials found in $HOME/.my.cnf assuming that you have created a symlink from $HOME/.my.cnf to $HOME/replica.my.cnf.
Below are various examples in a few common programming languages.
Bash
-- 2> /dev/null; date; echo '
/* Bash/SQL compatible test structure
*
* Run time: ? <SLOW_OK>
*/
SELECT 1
;-- ' | mariadb -ch tools.db.svc.wikimedia.cloud enwiki_p > ~/query_results-enwiki; date;
C
#include <my_global.h>
#include <mysql.h>
...
char *host = "tools.db.svc.wikimedia.cloud";
MYSQL *conn = mysql_init(NULL);
mysql_options(conn, MYSQL_READ_DEFAULT_GROUP, "client");
if (mysql_real_connect(conn, host, NULL, NULL, NULL, 0, NULL, 0) == NULL) {
printf("Error %u: %s\n", mysql_errno(conn), mysql_error(conn));
...
}
Perl
use User::pwent;
use DBI;
my $database = "enwiki_p";
my $host = "tools.db.svc.wikimedia.cloud";
my $dbh = DBI->connect(
"DBI:mysql:database=$database;host=$host;"
. "mysql_read_default_file=" . getpwuid($<)->dir . "/replica.my.cnf",
undef, undef) or die "Error: $DBI::err, $DBI::errstr";
Python
Without installing the toolforge library, this will work:
import configparser
import pathlib
import pymysql
import pymysql.cursors
replica = pathlib.Path.home().joinpath("replica.my.cnf")
config = configparser.ConfigParser()
config.read_string(replica.read_text())
connection = pymysql.connections.Connection(
host="commonswiki.analytics.db.svc.wikimedia.cloud",
database="commonswiki_p",
user=config.get("client", "user"),
password=config.get("client", "password"),
cursorclass=pymysql.cursors.DictCursor,
)
with connection.cursor() as cur:
cur.execute(query) # Or something....
connection.close()
Using User:Legoktm/toolforge library, however, is probably the easiest way. This wrapper library supports both Python 3 and legacy Python 2 applications and provides convenience functions for connecting to the Wiki Replica databases.
import toolforge
conn = toolforge.connect('enwiki') # You can also use "enwiki_p"
# conn is a pymysql.connection object.
with conn.cursor() as cur:
cur.execute(query) # Or something....
We used to recommend oursql as well, but as of 2019-02-20 it seems to be abandoned or at least not actively maintained and failing to compile against MariaDB client libraries.
Python: Django
If you are using Django, first install mysqlclient (inside your tool's virtual environment, accessed via a webservice shell
):
export MYSQLCLIENT_CFLAGS="-I/usr/include/mariadb/" export MYSQLCLIENT_LDFLAGS="-L/usr/lib/x86_64-linux-gnu/ -lmariadb" pip install mysqlclient
Then insert the database in the settings.py file as following, with s12345 your user name:
import configparser
import os
HOME=os.environ.get('HOME') #get environment variable $HOME
replica_path=HOME + '/replica.my.cnf'
if os.path.exists(replica_path): #check that the file is found
config = configparser.ConfigParser()
config.read(replica_path)
else:
print('replica.my.cnf file not found')
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
'NAME': 's12345__mydbname',
'USER': config['client']['user'], #for instance "s12345"
'PASSWORD': config['client']['password'],
'HOST': 'tools.db.svc.wikimedia.cloud',
'PORT': '',
}
}
PHP (using PDO)
<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$db = new PDO("mysql:host=enwiki.analytics.db.svc.wikimedia.cloud;dbname=enwiki_p", $ts_mycnf['user'], $ts_mycnf['password']);
unset($ts_mycnf, $ts_pw);
$q = $db->prepare('select * from page where page_id = :id');
$q->execute(array(':id' => 843020));
print_r($q->fetchAll());
?>
PHP (using MySQLi)
<?php
$ts_pw = posix_getpwuid(posix_getuid());
$ts_mycnf = parse_ini_file($ts_pw['dir'] . "/replica.my.cnf");
$mysqli = new mysqli('enwiki.analytics.db.svc.wikimedia.cloud', $ts_mycnf['user'], $ts_mycnf['password'], 'enwiki_p');
unset($ts_mycnf, $ts_pw);
$stmt = $mysqli->prepare('select * from page where page_id = ?');
$id = 843020;
$stmt->bind_param('i', $id);
$stmt->execute();
$result = $stmt->get_result();
print_r($result->fetch_all(MYSQLI_BOTH));
?>
Java
Class.forName("com.mysql.jdbc.Driver").newInstance();
Properties mycnf = new Properties();
mycnf.load(new FileInputStream(System.getProperty("user.home")+"/replica.my.cnf"));
String password = mycnf.getProperty("password");
password=password.substring((password.startsWith("\""))?1:0, password.length()-((password.startsWith("\""))?1:0));
mycnf.put("password", password);
mycnf.put("useOldUTF8Behavior", "true");
mycnf.put("useUnicode", "true");
mycnf.put("characterEncoding", "UTF-8");
mycnf.put("connectionCollation", "utf8_general_ci");
String url = "jdbc:mysql://tools.db.svc.wikimedia.cloud:3306/enwiki_p";
Connection conn = DriverManager.getConnection(url, mycnf);
Node.js
mysql2 client provides a promise-based interface.
const mysql = require('mysql2/promise');
async function sample() {
const connection = await mysql.createConnection({
host: 'tools.db.svc.wikimedia.cloud',
port: 3306,
database: 's12345__mydbname',
user: 's12345',
password: ''
});
const [rows, fields] = await connection.execute('SELECT * FROM table WHERE name = ? AND age > ?', ['Morty', 14]);
for (let row in rows) console.log(row);
}
See also
- Help:Trove database user guide
- Portal:Data Services/Admin/Wiki Replicas
- "New Wiki Replica servers ready for use" (announcement post, September 25, 2017)
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud connect or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)