WMDE/Analytics
Documentation for the WMDE analytics activities.
Puppetization
All of WMDE's puppetized analytics code can be found in statistics::wmde and its subclasses in the WMF puppet repo.
Currently all scripts run under the 'analytics-wmde' user on the stat boxes. Membership in the 'analytics-wmde-users' group gives you access to the relevant stat box and to the analytics-wmde user, which allows you to trigger scripts manually and to read logs for debugging.
Grafana
grafana.wikimedia.org is a frontend for creating queries and storing dashboards using data from Graphite. Docs for the WMF Grafana instance can be found at Grafana.wikimedia.org.
Most of our dashboards are linked from our 2 main dashboards:
- Wikidata: https://grafana.wikimedia.org/d/000000154/wikidata
- Technical Wishes: https://grafana.wikimedia.org/d/000000288/team-tcb
There are also some dashboards not connected to these 2 main dashboards.
analytics/wmde/scripts repo
We have a scripts repo on Gerrit. This repository contains all of the regularly run cron jobs that generate data sent to Graphite. Most of the code here is currently written in PHP. Efficiency of the code itself is not a real concern, as all of the scripts make web requests or db queries. PHP was chosen because it is the main language for WMDE developers.
This repository has 2 branches (generally kept closely in sync with each other):
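As an illustration of what "data sent to Graphite" can look like on the wire, here is a minimal sketch of Graphite's plaintext protocol, in which each data point is one line of the form `<metric path> <value> <unix timestamp>`. It is written in Python for brevity and is not taken from the scripts repo. The function names, the metric name and the host are hypothetical, and the real scripts may deliver their metrics by a different route.

```python
import socket
import time


def format_graphite_line(metric, value, timestamp=None):
    # One data point in Graphite's plaintext protocol:
    # "<metric path> <value> <unix timestamp>" terminated by a newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{metric} {value} {int(timestamp)}\n"


def send_to_graphite(line, host, port=2003):
    # Assumption: a Carbon plaintext listener is reachable at host:port
    # (2003 is Carbon's default plaintext port). The host is hypothetical.
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `format_graphite_line("daily.example.metric", 42, 1700000000)` produces the line `daily.example.metric 42 1700000000` followed by a newline, which `send_to_graphite` would then write to the listener.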
- master - Development code
- production - Deployed code. Merges here trigger a deploy by puppet, so only a few people have +2 access on this branch. You may need to request access.
These scripts currently run on stat1007 (dictated by puppet) using systemd timers and run as the user 'analytics-wmde'.
Code can be found in /srv/analytics-wmde/graphite/src/scripts.
In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.
Logs
By default, logs are kept only in journald, which stores them on tmpfs (in RAM, so they are wiped on reboot) unless puppet instructs otherwise (namely through the systemd::timer config, which sets up logging among other things).
For the moment, journald cannot be accessed by regular users; it requires sudo.
If you need access to the logs you can ping folks in #wikimedia-analytics
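For those who do have sudo rights, the following sketch shows one way to pull a timer's logs out of journald. It only wraps the standard `journalctl` options `--unit`, `--since` and `--no-pager`. The helper names are hypothetical, and so is any unit name you pass in, since the actual unit names are defined in puppet and are not listed on this page.

```python
import subprocess


def journalctl_command(unit, since="today"):
    # Build the argv for reading one systemd unit's journal.
    # "unit" is whatever service name puppet created (not documented here).
    return ["sudo", "journalctl", "--unit", unit, "--since", since, "--no-pager"]


def read_unit_logs(unit, since="today"):
    # Run journalctl and return its output as text.
    # Raises CalledProcessError if sudo or journalctl fails.
    result = subprocess.run(
        journalctl_command(unit, since),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Building the command as a list rather than a shell string avoids quoting problems with `--since` values that contain spaces, such as "2 hours ago".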
analytics/wmde/toolkit-analyzer repo and build
toolkit-analyzer
This repository contains Java code used to scan the weekly Wikidata JSON dumps and extract information to be fed into Graphite for dashboards. A few one-off dump processors and other useful things are also kept here.
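To give a feel for what scanning a dump involves, here is a minimal sketch in Python; the real analyzer is written in Java and this is not its implementation. It relies on the layout of the Wikidata JSON dumps, which are a single JSON array with one entity per line, so the file can be streamed line by line instead of being loaded whole. The function names and the choice of counting entities by type are illustrative assumptions.

```python
import json
from collections import Counter


def iter_entities(lines):
    # The dump is one large JSON array: "[" on the first line, "]" on the
    # last, and one entity object per line in between, each followed by a
    # comma except the final one.
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def count_entity_types(lines):
    # Example metric: number of entities per type (item, property, ...).
    return Counter(entity.get("type", "unknown") for entity in iter_entities(lines))
```

Any iterable of text lines works as input, for instance a file object returned by `gzip.open(path, "rt", encoding="utf-8")` for a gzip-compressed dump. The resulting counts are the kind of numbers that could then be sent to Graphite.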
toolkit-analyzer-build
This repository simply contains a build of the toolkit-analyzer to be deployed in production. This repository has 2 branches:
- master - Development code
- production - Deployed code. Merges here trigger a deploy by puppet, so only a few people have +2 access on this branch. You may need to request access.
The analyzer build runs on stat1007 (dictated by puppet) as the user 'analytics-wmde'. Code can be found in /srv/analytics-wmde. In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.
Ad-hoc Hive queries
Ad-hoc MediaWiki Logging
The WMDE log channel from MediaWiki will be rsynced across to stat boxes.
WDCM
The Wikidata Concepts Monitor (WDCM), a system to track and analyze the Wikidata usage across the Wikimedia projects, is documented on this Wikitech page.