Data Platform/Systems/Turnilo
Turnilo provides a friendly user interface to Druid and is used internally at the Wikimedia Foundation. As of 2017, most of the data available in Turnilo comes from Hadoop. (See also a snapshot of available data cubes as of April 2017, with update schedules etc.)
Access
To access Turnilo, you need wmf or nda LDAP access. For more details, see Analytics/Data access § LDAP access.
If you have that access, you can log in at turnilo.wikimedia.org with your Wikitech username and password.
Administration
Turnilo is currently (2020-02-26) hosted on an-tool1007.eqiad.wmnet. It is deployed to /srv/deployment/analytics/turnilo/deploy by scap. Puppet generates its configuration file in /etc/turnilo/config.yaml using this puppet template: /modules/turnilo/templates/config.yaml.erb. If any of this is wrong when you're reading it, you can update it fairly quickly by searching the puppet repository for "turnilo".
Restart
sudo systemctl restart turnilo
Logs
Everybody can read /var/log/turnilo/syslog.log
The Analytics team can also use journalctl:
sudo journalctl -u turnilo -f
The -f flag keeps tailing the logs as new entries arrive; omit it if you only want to view the existing entries.
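For a quick look at recent problems in the world-readable log file, a small filter can help. The following is a minimal sketch: recent_errors is a hypothetical helper, and it assumes that problem lines contain the word "error" in some capitalization, which may not match Turnilo's actual log format.

```shell
# Hypothetical helper: print the last N lines of a log file that
# mention "error" (case-insensitive).
# Assumption: failures are logged with the word "error" somewhere in the line.
recent_errors() {
  logfile="${1:-/var/log/turnilo/syslog.log}"  # default path taken from this page
  count="${2:-20}"                             # how many matching lines to show
  grep -i 'error' "$logfile" | tail -n "$count"
}
```

Calling recent_errors with no arguments reads /var/log/turnilo/syslog.log and shows up to 20 matching lines; pass a different file or count as the first and second arguments.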
Deploy
Deployment steps for both test and production:
ssh deployment.eqiad.wmnet
cd /srv/deployment/analytics/turnilo/deploy
git pull
For test:
scap deploy --limit an-tool1011.eqiad.wmnet
For production:
scap deploy
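The choice between the two scap invocations above could be wrapped in a small function to avoid deploying to the wrong target. This is only a sketch: deploy_turnilo is a hypothetical name, it assumes you are already on the deployment host in the deploy directory with the latest changes pulled, and the optional second argument is a dry-run convenience that is not part of scap.

```shell
# Hypothetical wrapper around the two scap commands listed on this page.
# Usage: deploy_turnilo test|production [echo]
# Passing "echo" as the second argument prints the command instead of running it.
deploy_turnilo() {
  target="$1"
  run="${2:-}"   # empty: run scap for real; "echo": dry run
  case "$target" in
    test)       $run scap deploy --limit an-tool1011.eqiad.wmnet ;;
    production) $run scap deploy ;;
    *)
      echo "usage: deploy_turnilo test|production [echo]" >&2
      return 2
      ;;
  esac
}
```

For example, deploy_turnilo test echo prints the command that would be run against the test host without executing it.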
The code that renders https://turnilo.wikimedia.org is split into two parts:
- an Apache httpd Virtual Host that takes care of Basic Authentication via LDAP Wikitech credentials check.
- a nodejs application deployed via scap and stored in the https://gerrit.wikimedia.org/r/#/admin/projects/analytics/turnilo/deploy repo.
Test Staging Turnilo
Run ssh -NL 9091:an-tool1011.eqiad.wmnet:9091 an-tool1011.eqiad.wmnet, then open http://localhost:9091 in a web browser.
Test config changes
NOTE: if you make config changes, you need to test them and then restart Turnilo once the puppet change is merged (see above).
- Make sure you can ssh to Turnilo's host.
- Run ps -auxfww on the host to find the command Turnilo was started with, something like:
/usr/bin/nodejs /srv/deployment/analytics/turnilo/deploy/node_modules/.bin/turnilo --config /etc/turnilo/config.yaml
- Copy the YAML config file to your home directory and change the port Turnilo runs on (say, to 9091).
- Start a process on the host using your local copy of the config.
- Connect via localhost:
ssh -N an-tool1011.eqiad.wmnet -L 9091:localhost:9091
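The copy-and-change-port step above can be sketched as a small helper. This is an illustration rather than an established procedure: make_test_config is a hypothetical name, and it assumes the generated config has a top-level line of the form "port: <number>"; check the real file before relying on it.

```shell
# Hypothetical helper: write a copy of a Turnilo config with a different port.
# Usage: make_test_config SOURCE_CONFIG TARGET_CONFIG PORT
# Assumption: the config contains a top-level "port: <number>" line.
make_test_config() {
  src="$1"
  dst="$2"
  port="$3"
  # Rewrite every line that starts with "port:"; all other lines are copied
  # unchanged. The source file itself is never modified.
  sed "s/^port:.*/port: ${port}/" "$src" > "$dst"
}
```

On the host, this might be used as make_test_config /etc/turnilo/config.yaml "$HOME/turnilo-test.yaml" 9091, followed by the command reported by ps with --config pointing at "$HOME/turnilo-test.yaml" instead of /etc/turnilo/config.yaml.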
History
Druid is a very useful tool that lets us easily load OLAP-shaped big data and query it efficiently. It's much faster than querying through Hive, for example. The initial downside was that users would have had to learn a new JSON query language to access the data. To solve this problem, at the time, we had three options:
- Pay the folks who develop Saiku to integrate it with Druid (this never got approved in the budget)
- Use Caravel (we tried it out, but it was buggy and much more complicated than Pivot, aimed more at analysts than PMs). Since then, Caravel was renamed Superset and received considerable development. We are starting to standardize on it for access to our heterogeneous data stores.
- Use Pivot, at the time a new open-source tool from Imply.
We chose Pivot; some feedback was gathered in Phabricator. The early impressions were very positive, and over time we added more datasets to Druid and Pivot, bringing a lot of value to product managers and execs. As we were doing that, Pivot's source was being closed for legal reasons. The dispute was resolved, but Pivot was no longer available under the Apache 2.0 license after November 2016. See the announcement for details.
In May 2018, we deployed a new fork of Pivot: Turnilo. While it does not add any new features, it seems well maintained and it is certainly faster.