
Data Platform/Systems/Turnilo


Turnilo provides a user-friendly interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of the available data cubes as of April 2017, with update schedules etc.)

Access

To access Turnilo, you need wmf or nda LDAP access. For more details, see Analytics/Data access § LDAP access.

If you have that access, you can log in at turnilo.wikimedia.org with your Wikitech username and password.

Administration

As of 2020-02-26, Turnilo is hosted on an-tool1007.eqiad.wmnet. It is deployed by scap to /srv/deployment/analytics/turnilo/deploy. Puppet generates its configuration file, /etc/turnilo/config.yaml, from the template /modules/turnilo/templates/config.yaml.erb. If any of this is out of date when you read it, you can quickly find the current setup by searching the Puppet repository for "turnilo".

Restart

sudo systemctl restart turnilo
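After a restart it can help to confirm that the unit actually came back up. A minimal sketch, assuming the systemd unit is named turnilo as in the command above; wait_for and restart_turnilo are hypothetical helper names, not tools installed on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not installed on the Turnilo host.

# wait_for TRIES DELAY CMD [ARGS...]
# Run CMD up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
wait_for() {
  local tries=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= tries; i++)); do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if ((i < tries)); then
      sleep "$delay"
    fi
  done
  return 1
}

# Restart the unit, then poll systemd until it reports "active"
# (at most 10 checks, 2 seconds apart).
restart_turnilo() {
  sudo systemctl restart turnilo &&
    wait_for 10 2 systemctl is-active --quiet turnilo
}
```

If restart_turnilo returns non-zero, the unit did not reach the active state; check the logs as described in the Logs section. Note that "active" only means the process is running, not that the web UI is serving correctly.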

Logs

Everybody can read /var/log/turnilo/syslog.log.

The Analytics team can also use journalctl:

sudo journalctl -u turnilo -f

The -f flag keeps following the log as new lines arrive; drop it if you only want to view the existing entries.
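To skim only recent problems instead of tailing everything, a small filter over the log file can help. This is a sketch: it assumes that relevant lines in /var/log/turnilo/syslog.log contain the word "error" (case-insensitive), which may not catch every failure Turnilo logs; turnilo_recent_errors is a hypothetical helper.

```shell
#!/usr/bin/env bash
# turnilo_recent_errors [LOGFILE] [N]
# Hypothetical helper: print the last N lines (default 20) of LOGFILE that
# mention "error", ignoring case. LOGFILE defaults to the documented path.
turnilo_recent_errors() {
  local log=${1:-/var/log/turnilo/syslog.log}
  local n=${2:-20}
  # Without pipefail the pipeline's exit status is tail's, so "no matches"
  # is not reported as a failure.
  grep -i -- 'error' "$log" | tail -n "$n"
}
```

For example, turnilo_recent_errors shows the last 20 matching lines from the default log, and turnilo_recent_errors /var/log/turnilo/syslog.log 50 widens the window.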

Deploy

Deployment steps for both test and production:

ssh deployment.eqiad.wmnet

cd /srv/deployment/analytics/turnilo/deploy

git pull

For test: scap deploy --limit an-tool1011.eqiad.wmnet

For production: scap deploy
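The deployment steps can be wrapped in a small script so that test and production differ only in the scap target. A sketch: turnilo_deploy and run are hypothetical helpers meant to be sourced on deployment.eqiad.wmnet; setting DRY_RUN=1 makes them print the commands instead of executing them. The git and scap invocations are exactly the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented deployment steps.

# run CMD [ARGS...]: execute CMD, or just print it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# turnilo_deploy test|prod
turnilo_deploy() {
  local target=${1:-}
  case "$target" in
    test | prod) ;;
    *)
      echo "usage: turnilo_deploy test|prod" >&2
      return 2
      ;;
  esac
  run cd /srv/deployment/analytics/turnilo/deploy || return 1
  run git pull || return 1
  if [ "$target" = test ]; then
    # Deploy to the staging host only.
    run scap deploy --limit an-tool1011.eqiad.wmnet
  else
    run scap deploy
  fi
}
```

A cautious sequence is DRY_RUN=1 turnilo_deploy test to review the commands, then turnilo_deploy test, and turnilo_deploy prod once staging looks good.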

The code that renders https://turnilo.wikimedia.org is split into two parts:

Test Staging Turnilo

Run ssh -NL 9091:an-tool1011.eqiad.wmnet:9091 an-tool1011.eqiad.wmnet, then open http://localhost:9091 in a web browser.
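The tunnel command generalizes to any host and port. A minimal sketch, assuming Turnilo on the remote host listens on the same port you forward locally; turnilo_tunnel is a hypothetical helper, and with DRY_RUN=1 it only prints the ssh command it would run.

```shell
#!/usr/bin/env bash
# turnilo_tunnel [HOST] [PORT]
# Hypothetical helper: forward localhost:PORT to HOST:PORT over SSH, as in
# the command above. Defaults to the staging host and port 9091.
turnilo_tunnel() {
  local host=${1:-an-tool1011.eqiad.wmnet}
  local port=${2:-9091}
  local cmd=(ssh -N -L "${port}:${host}:${port}" "$host")
  echo "Open http://localhost:${port} once the tunnel is up (Ctrl-C to close)." >&2
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Building the command as an array keeps each argument intact when it is executed, while the dry-run branch prints the same arguments joined by spaces.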

Test config changes

NOTE: if you make config changes, you need to test them and restart Turnilo once the Puppet change is merged (see Restart above).

  • Make sure you can ssh to Turnilo's host.
  • Run ps auxfww on the host to find the command Turnilo is started with; it looks something like:
 /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config /etc/turnilo/config.yaml
  • Copy the YAML config file to your home directory and change the port Turnilo listens on (for example, to 9091).
  • Start a second Turnilo process on the host with the same command, passing your copy to --config.
  • Connect through an SSH tunnel: ssh -N an-tool1011.eqiad.wmnet -L 9091:localhost:9091, then open http://localhost:9091.
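The copy-and-edit steps can be scripted. A minimal sketch, assuming the generated config contains a single top-level "port:" line; make_test_config and start_test_turnilo are hypothetical helpers, and the Turnilo command line is the one shown in the ps output above.

```shell
#!/usr/bin/env bash
# make_test_config SRC DEST PORT
# Hypothetical helper: copy the Puppet-generated config to DEST and change
# its top-level "port:" line to PORT. Indented (nested) keys named port are
# left alone because the pattern is anchored at the start of the line.
make_test_config() {
  local src=$1 dest=$2 port=$3
  cp -- "$src" "$dest" || return 1
  if ! grep -q '^port:' "$dest"; then
    echo "no top-level port: line in $dest; edit it by hand" >&2
    return 1
  fi
  # GNU sed in-place edit (BSD/macOS sed would need -i '').
  sed -i "s/^port:.*/port: ${port}/" "$dest"
}

# start_test_turnilo CONFIG
# Run Turnilo in the foreground with the given config, using the same
# binaries as the production process.
start_test_turnilo() {
  /usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config "$1"
}
```

For example: make_test_config /etc/turnilo/config.yaml ~/turnilo-test.yaml 9091, then start_test_turnilo ~/turnilo-test.yaml, and open the tunnel from your own machine as in the last step.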

History

Druid is a very useful tool that lets us easily load large OLAP-shaped datasets and query them efficiently, much faster than through Hive, for example. The initial downside was that users would have needed to learn a new JSON-based query language to access the data. At the time, we had three options to solve this problem:

  • Pay the developers of Saiku to integrate it with Druid (this was never approved in the budget).
  • Use Caravel (we tried it, but it was buggy and much more complicated than Pivot, aimed more at analysts than at PMs). Caravel has since been renamed Superset and has received considerable development; we are starting to standardize on it for access to our heterogeneous data stores.
  • Use Pivot, at the time a new open-source tool from Imply.

We chose Pivot and gathered some feedback in Phabricator. Early impressions were very positive, and over time we added more datasets to Druid and Pivot, bringing a lot of value to product managers and executives. Meanwhile, Pivot's source was being closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license after November 2016. See the announcement for details.

In May 2018, we deployed Turnilo, a fork of Pivot. While it does not add any new features, it appears to be well maintained and is certainly faster.