WMDE/Analytics

From Wikitech

Documentation for the WMDE analytics activities.

Puppetization

All of WMDE's puppetized analytics code can be found in statistics::wmde and its subclasses in the WMF puppet repo.

Currently all scripts run under the 'analytics-wmde' user on the stat boxes. Membership in the 'analytics-wmde-users' group gives you access to the relevant stat box and to the analytics-wmde user, which lets you trigger scripts manually and read logs for debugging.

Grafana

grafana.wikimedia.org is a frontend for building queries and storing dashboards that use data from Graphite. Docs for the WMF Grafana instance can be found at Grafana.wikimedia.org.

Our dashboards can be found by looking at our 2 main dashboards:

There are also some dashboards not connected to these 2 main dashboards.

analytics/wmde/scripts repo

We have a scripts repo on Gerrit. This repository contains all of the regularly run cron jobs that generate data sent to Graphite. Most of the code here is currently written in PHP. Efficiency in the code itself isn't really needed, as all of the scripts make web requests or DB queries. PHP was chosen because it is the main language for WMDE developers.
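As an illustration of what "sending data to Graphite" involves, here is a minimal sketch in Python (not the repository's PHP code). It assumes the metric is delivered through Graphite's plaintext protocol, where each data point is one line of the form `<metric path> <value> <unix timestamp>`. The metric name and the host passed to `send_metric` are hypothetical, and the production scripts may use a different transport.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one line of Graphite's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host, port=2003, timestamp=None):
    """Send a single data point over TCP.

    `host` must be supplied by the caller (it is not taken from this page);
    2003 is the default port of Carbon's plaintext listener.
    """
    line = format_graphite_line(path, value, timestamp)
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(line.encode("ascii"))


if __name__ == "__main__":
    # Hypothetical metric name; the line is printed here rather than sent.
    print(format_graphite_line("daily.example.metric", 42, 1700000000), end="")
```

Keeping the formatting separate from the network call makes it possible to check the generated line without a reachable Graphite host.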

This repository has 2 branches (generally kept very up to date with each other):

  • master - Development code
  • production - Deployed code (merges here trigger a deploy by puppet; for that reason only a few people have +2 access on this branch, and you may need to request access.)

These scripts currently run on stat1007 (dictated by puppet) using systemd timers and run as the user 'analytics-wmde'. Code can be found in /srv/analytics-wmde/graphite/src/scripts. In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.

Logs

By default, logs are kept only in journald, which stores them on tmpfs (so basically in RAM; they are wiped on reboot) unless puppet instructs otherwise (namely through the systemd::timer config, which sets up logging and related options).

For the moment journald cannot be accessed by regular users; it needs sudo.

If you need access to the logs you can ping folks in #wikimedia-analytics.
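For someone who does have sudo rights, reading a timer's logs could look roughly like the sketch below. It only wraps the standard `journalctl` options `-u`, `--since` and `--no-pager`. The unit name used in the example is hypothetical, since the actual unit names are defined in puppet and are not listed on this page.

```python
import subprocess


def journalctl_command(unit, since=None):
    """Build the argv for reading one systemd unit's journal via sudo."""
    cmd = ["sudo", "journalctl", "--no-pager", "-u", unit]
    if since:
        # journalctl accepts values such as "yesterday" or "2024-01-01".
        cmd += ["--since", since]
    return cmd


def read_unit_logs(unit, since=None):
    """Run journalctl and return its output; raises CalledProcessError on failure."""
    result = subprocess.run(
        journalctl_command(unit, since),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical unit name, printed as a command line rather than executed.
    print(" ".join(journalctl_command("example-wmde-job.service", "yesterday")))
```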

analytics/wmde/toolkit-analyzer repo and build

toolkit-analyzer

This repository contains Java code used to scan the weekly Wikidata JSON dumps and extract information to be fed into Graphite for dashboards. A few one-off dump processors and other useful things are also kept here.
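To give an idea of what scanning a dump means, the following Python sketch streams through a dump and counts entities by type. It is not the toolkit-analyzer's Java code. It assumes the usual layout of the Wikidata JSON dumps: one large JSON array with a single entity per line and a trailing comma after each line. The function names and the choice of a gzip-compressed input are illustrative assumptions.

```python
import gzip
import io
import json
from collections import Counter


def iter_entities(lines):
    """Yield one decoded entity per dump line, skipping the array brackets."""
    for raw in lines:
        line = raw.strip()
        if line in ("[", "]", ""):
            continue
        # Every entity line except the last one ends with a comma.
        yield json.loads(line.rstrip(","))


def count_entity_types(lines):
    """Count entities by their 'type' field (for example 'item' or 'property')."""
    return Counter(entity.get("type", "unknown") for entity in iter_entities(lines))


def count_types_in_dump(path):
    """Stream a gzip-compressed dump from disk without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_entity_types(handle)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dump file.
    sample = io.StringIO(
        '[\n{"id":"Q1","type":"item"},\n{"id":"P1","type":"property"}\n]\n'
    )
    print(count_entity_types(sample))
```

Processing the file line by line keeps memory use flat, which matters because a full dump is far too large to parse as a single JSON document.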

toolkit-analyzer-build

This repository simply contains a build of the toolkit-analyzer to be deployed in production. This repository has 2 branches:

  • master - Development code
  • production - Deployed code (merges here trigger a deploy by puppet; for that reason only a few people have +2 access on this branch, and you may need to request access.)

The built analyzer runs on stat1007 (dictated by puppet) as the user 'analytics-wmde'. Code can be found in /srv/analytics-wmde. In order to sudo as this user you will need to be in the analytics-wmde-users LDAP group.

Ad-hoc Hive queries

Ad-hoc MediaWiki Logging

The WMDE log channel from MediaWiki will be rsynced across to stat boxes.

WDCM

The Wikidata Concepts Monitor (WDCM), a system to track and analyze Wikidata usage across the Wikimedia projects, is documented on this Wikitech page.

WDCM System Operation Workflow.