Fundraising Analytics/Impression Stats
We now get stats via Kafka and a special fr-tech-ops script that reconstitutes files resembling web logs; a Django script then processes them.
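The fr-tech-ops script itself isn't documented here. Purely to illustrate the idea of rebuilding web-log-style files from Kafka, a minimal sketch might look like the following (topic name, broker address, and output path are all made up, not the real configuration):

from kafka import KafkaConsumer  # kafka-python

# Illustrative sketch only: the real script's topic, brokers, and output
# location are not documented on this page.
consumer = KafkaConsumer("webrequest_frontend",
                         bootstrap_servers=["kafka.example.org:9092"],
                         consumer_timeout_ms=10000)

with open("/tmp/reconstituted-weblog.log", "a") as out:
    for message in consumer:
        # Each Kafka message is assumed to already be one web-log-style line.
        out.write(message.value.decode("utf-8", errors="replace") + "\n")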
We want to switch over to FRUEC.
Everything below appears to be very outdated.
Banner impressions and landing page stats are collected from Squid logs via udp2log running on Locke. Every 15 minutes a cron job, running from file_mover@locke's crontab, rotates the log files into a local buffer directory where they're retained for 7 days. The script also copies the files over NFS to the local NetApp nas1-a.pmtpa.wmnet, which is mirrored offsite to nas1001-a.eqiad.wmnet. Finally, the NetApps are also NFS-mounted on grosley/aluminium, where the files are parsed by analytics scripts.
Counting banner impressions is not fun. Currently, we send a beacon to Special:RecordImpression (or /beacon/RecordImpression), which includes GET parameters identifying the banner and campaign, the selection criteria, and the outcome: whether the banner was hidden or shown.
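For illustration, such a beacon request might be constructed like this (the parameter names below are assumptions, not a confirmed list of what Special:RecordImpression accepts):

from urllib.parse import urlencode

# Hypothetical parameters identifying the banner, campaign, selection
# criteria, and outcome; the real parameter set may differ.
params = {
    "banner": "B14_example",
    "campaign": "C14_example",
    "country": "US",
    "result": "show",   # or "hide"
    "reason": "empty",
}
print("https://en.wikipedia.org/beacon/RecordImpression?" + urlencode(params))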
These Varnish hits are sampled using udp-filter, configured here:
udp2log proxy log collection
udp2log is configured via two entries in locke:/etc/udp2log/squid:
# Landing pages
pipe 1 /a/squid/fundraising/lp-filter >> /a/squid/fundraising/logs/landingpages.log
# Banner Impressions
pipe 100 /a/squid/fundraising/bi-filter >> /a/squid/fundraising/logs/bannerImpressions-sampled100.log
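The number after pipe is udp2log's sampling factor: the landing page filter sees every matching request, while the banner impression file keeps roughly 1 in every 100. A minimal sketch of scaling a raw line count back up (path and factor come from the config above; the counting itself is just an illustration, not the real loader):

import gzip

SAMPLE_FACTOR = 100  # from "pipe 100" above

def estimated_impressions(path):
    """Count lines in a (possibly gzipped) sampled log and scale up by the sample factor."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as f:
        return sum(1 for _ in f) * SAMPLE_FACTOR

# e.g. estimated_impressions("/a/squid/fundraising/logs/bannerImpressions-sampled100.log")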
To enable or disable collection, comment or uncomment the pipe entries in /etc/udp2log/squid and then HUP udp2log:
awjrichards@locke:~$ /home/file_mover/scripts/resetudp2log
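The contents of resetudp2log aren't reproduced here; conceptually, a reload comes down to sending udp2log a SIGHUP so it rereads /etc/udp2log/squid, roughly like this (illustrative only, not the actual script):

import subprocess

# Illustrative only: the real resetudp2log script may do more than this.
subprocess.run(["pkill", "-HUP", "-x", "udp2log"], check=True)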
proxy log rotation and archiving
Log rotation, compression, and copying to the NetApp are handled by a cron job running as user file_mover@locke:
# rotate and compress fundraising banner impression logs, and archive to netapp
*/15 * * * * /home/file_mover/scripts/rotate_fundraising_logs
analytics processing script
The sampled files are digested by the "Banner impressions loader" Jenkins job, using the following code from the DjangoBannerStats repo. The impression counts are aggregated into the "pgehres" database.
export PYTHONPATH=/etc/fundraising
python /srv/DjangoBannerStats/manage.py LoadLPImpressions --verbose --recent
python /srv/DjangoBannerStats/manage.py LoadBannerImpressions2Aggregate --verbose --top --recent
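The real loaders live in the DjangoBannerStats repo. As a rough illustration of what "aggregate into pgehres" involves (the field layout, parameter names, and output shape below are assumptions, not the actual schema or loader logic):

from collections import Counter
from urllib.parse import urlparse, parse_qs

def aggregate_banner_impressions(log_lines):
    """Toy aggregation: count sampled impressions per (banner, campaign) by
    pulling apart RecordImpression query strings found in the log lines."""
    counts = Counter()
    for line in log_lines:
        url = next((field for field in line.split() if "RecordImpression" in field), None)
        if url is None:
            continue
        qs = parse_qs(urlparse(url).query)
        counts[(qs.get("banner", ["?"])[0], qs.get("campaign", ["?"])[0])] += 1
    return counts

# with open("/a/squid/fundraising/logs/buffer/2012/bannerImpressions-sampled100-20120906-174501.log") as f:
#     print(aggregate_banner_impressions(f))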
All of this must change. The "banner history" feature and deterministic banner loading are meant to replace all of this, some time in 2015.
monitoring and debugging
The cron script logs verbosely, and locke:/var/log/syslog will show its actions and errors.
Here's what's what in the following examples:
/a/squid/fundraising/logs/*log               # active udp2log collection point
/a/squid/fundraising/logs/buffer/2012/*.log  # freshly rotated log, before compression
/a/squid/fundraising/logs/fr_archive         # netapp nfs mount (i.e. permanent archive location)
Under normal operation, you should see this sequence:
Sep 6 17:45:01 locke CRON[28592]: (file_mover) CMD (/home/file_mover/scripts/rotate_fundraising_logs)
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: move /a/squid/fundraising/logs/landingpages.log to /a/squid/fundraising/logs/buffer/2012/landingpages-20120906-174501.log
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: move /a/squid/fundraising/logs/bannerImpressions-sampled100.log to /a/squid/fundraising/logs/buffer/2012/bannerImpressions-sampled100-20120906-174501.log
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: reload udp2log
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: gzip /a/squid/fundraising/logs/buffer/2012/bannerImpressions-sampled100-20120906-174501.log
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: gzip /a/squid/fundraising/logs/buffer/2012/landingpages-20120906-174501.log
Sep 6 17:45:01 locke rotate_fundraising_logs[28594]: rsync -ar /a/squid/fundraising/logs/buffer/ /a/squid/fundraising/logs/fr_archive/
Sep 6 17:45:02 locke rotate_fundraising_logs[28594]: done!
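For orientation, the sequence above boils down to: move each active log into the buffer under a timestamped name, HUP udp2log, gzip the rotated files, then rsync the buffer to the fr_archive mount. A compressed sketch of that sequence (error handling, syslog logging, and the 7-day buffer cleanup are omitted; this is not the real script):

import os
import subprocess
from datetime import datetime

LOG_DIR = "/a/squid/fundraising/logs"
BUFFER = os.path.join(LOG_DIR, "buffer", datetime.now().strftime("%Y"))
ARCHIVE = os.path.join(LOG_DIR, "fr_archive")
stamp = datetime.now().strftime("%Y%m%d-%H%M%S")

os.makedirs(BUFFER, exist_ok=True)
rotated = []
for name in ("landingpages", "bannerImpressions-sampled100"):
    dst = os.path.join(BUFFER, "%s-%s.log" % (name, stamp))
    os.rename(os.path.join(LOG_DIR, name + ".log"), dst)   # move to buffer
    rotated.append(dst)

subprocess.run(["pkill", "-HUP", "-x", "udp2log"])          # reload udp2log
for path in rotated:
    subprocess.run(["gzip", path])                          # compress in place
subprocess.run(["rsync", "-ar", os.path.join(LOG_DIR, "buffer") + "/", ARCHIVE + "/"])  # copy to netapp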
Things to watch out for include the following (a sketch of simple checks follows the list):
- move/gzip errors due to local partition overrun, permissions snafu
- nfs mount inaccessible
- udp2log HUP fails
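Some of these failure modes can be checked mechanically. A minimal sketch of such checks (the free-space threshold is arbitrary, and the real monitoring may look nothing like this):

import os
import shutil
import subprocess

LOG_DIR = "/a/squid/fundraising/logs"
ARCHIVE = os.path.join(LOG_DIR, "fr_archive")   # netapp nfs mount

# Local partition overrun: warn below an arbitrary free-space threshold.
free_gb = shutil.disk_usage(LOG_DIR).free / 2**30
if free_gb < 5:
    print("WARNING: only %.1f GB free under %s" % (free_gb, LOG_DIR))

# NFS mount inaccessible: listing the archive will fail (or hang) if the mount is broken.
try:
    os.listdir(ARCHIVE)
except OSError as e:
    print("WARNING: archive mount unreachable: %s" % e)

# udp2log not running (e.g. after a failed HUP): pgrep exits non-zero if no process matches.
if subprocess.run(["pgrep", "-x", "udp2log"]).returncode != 0:
    print("WARNING: udp2log is not running")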