Analytics/Archive/Data/Zero webrequests
The Zero requests stream holds request logs from the mobile caches that were tagged as "zero".
This stream is owned by the Analytics Team.
Contained data
The data covers only requests to the mobile caches that come from an IP associated[1] with a zero carrier and that have the zero= marker set. It is therefore only a part of all mobile traffic, and it does not cover traffic to, for example, the api, bits, or upload subdomains.
Availability
stat1002.eqiad.wmnet /a/squid/archive/zero
The stream is available unsampled in Cache log format as gzipped files at /a/squid/archive/zero/zero.tsv.log-*.gz on stat1002.eqiad.wmnet.
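A minimal sketch of how the gzipped files could be read line by line without unpacking them to disk. The path pattern comes from the sentence above; the function names are hypothetical, and splitting on tabs is an assumption based on the `.tsv` file extension (the exact column layout is defined by the Cache log format and is not repeated here).

```python
import glob
import gzip
from typing import Iterator, List

# Path pattern from the text above; it only resolves on stat1002.eqiad.wmnet.
ZERO_GLOB = "/a/squid/archive/zero/zero.tsv.log-*.gz"


def iter_zero_lines(pattern: str = ZERO_GLOB) -> Iterator[str]:
    """Yield log lines, one at a time, from every gzipped file matching pattern."""
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the loop going if a line contains garbled bytes.
        with gzip.open(path, mode="rt", encoding="utf-8", errors="replace") as handle:
            for line in handle:
                yield line.rstrip("\n")


def split_fields(line: str) -> List[str]:
    """Split one log line into columns (assumes tab-separated fields)."""
    return line.split("\t")
```

Streaming the lines keeps memory use flat, which matters because a single uncompressed file is several hundred MiB.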
The date in the file name does not mean that the file holds all logs of that day. Instead, each file contains logs from ~06:30 of the previous day until ~06:30 of the day in the file name. For example, /a/squid/archive/zero/zero.tsv.log-20130930.gz contains data from ~2013-09-29T06:30:00 until ~2013-09-30T06:30:00.
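The file name rule can be sketched as a small helper that maps a file name to the approximate time window it covers. The function name is hypothetical, the 06:30 boundary is only approximate (as stated above), and the returned datetimes are naive because the text does not state a time zone.

```python
import re
from datetime import datetime, timedelta
from typing import Tuple

# Matches the date part of names such as zero.tsv.log-20130930.gz
_NAME_RE = re.compile(r"zero\.tsv\.log-(\d{8})\.gz$")


def approximate_window(path: str) -> Tuple[datetime, datetime]:
    """Return the approximate (start, end) of the logs held in a file of this archive."""
    match = _NAME_RE.search(path)
    if match is None:
        raise ValueError(f"not a zero log file name: {path!r}")
    # The window ends at ~06:30 on the date in the file name ...
    end = datetime.strptime(match.group(1), "%Y%m%d") + timedelta(hours=6, minutes=30)
    # ... and starts at ~06:30 on the previous day.
    return end - timedelta(days=1), end
```

A request made at 03:00 on 2013-09-30 would therefore be found in the file dated 20130930, while one made at 09:00 on the same day would be in the file dated 20131001.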
Metric | Value |
---|---|
Avg. size / gzipped file | 183 MiB |
Avg. size / uncompressed file | 543 MiB |
Avg. lines / uncompressed file | 2774 K |
Avg. lines / second | 32 |
Avg. requests / second | 32 |
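As a rough cross-check of the averages above, the per-second rate follows from the lines per file, since each file spans about one day. The sketch below assumes that "K" means thousands and that MiB means 2^20 bytes; the derived bytes-per-line and compression figures are not stated in the table and are only back-of-the-envelope values.

```python
SECONDS_PER_DAY = 24 * 60 * 60

lines_per_file = 2774 * 1000           # "2774 K", read as thousands (assumption)
uncompressed_bytes = 543 * 1024 * 1024  # 543 MiB
gzipped_bytes = 183 * 1024 * 1024       # 183 MiB

# Each file covers roughly 24 hours, so dividing by the seconds in a day
# gives the average line rate.
lines_per_second = lines_per_file / SECONDS_PER_DAY
bytes_per_line = uncompressed_bytes / lines_per_file
compression_ratio = uncompressed_bytes / gzipped_bytes

print(f"{lines_per_second:.1f} lines/s")
print(f"{bytes_per_line:.0f} bytes/line")
print(f"{compression_ratio:.2f}x compression")
```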
This stream is used for:
- ad hoc research
Events and known problems since 2013-09-01
Date from | Date until | Bug | Details |
---|---|---|---|
Inherent | | | Only covers data from mobile caches, not all mobile traffic. |
Inherent | | | The stream may suffer from packet drop on udp2log. This should be <5%. |
* | | | Lines that would be longer than ~8K get chopped off at that border (no newline gets added). (Affects <1 line/day on average) |
* | | | "zero" markers got set not only for wikipedia, but also for sister projects (wiktionary, ...) |
* | 2013-09-26 | bug 53806 | Until around 2013-09-26 ~22:57, the client ip might have been garbled. |
2013-09-26 | 2013-10-01 | bug 54779 | No "mf-m" markers in stream between 2013-09-26 ~22:56 and 2013-10-01 ~13:32. |
2013-11-13 | 2013-12-16 | bug 58764 | <40 lines/day have been concatenated due to puppet runs unnecessarily restarting the udp2log filtering. First occurrence on 2013-11-13T17:29:22. Last occurrence on 2013-12-16T16:29:20. |
2013-12-18 | n/a | bug 58889 | Increase in zero=470-01 (Grameenphone Bangladesh) tagged traffic, due to the advertisement by the carrier |
2014-01-05 | 2014-01-06 | bug 59722 | Udp2log relay went down. There are no log lines between 2014-01-05T03:39:25 and 2014-01-06T17:45:10. |
2014-03-21 | 2014-03-21 | bug 62922 | Sometimes zero tags are doubled like "zero=250-99;zero=250-99". The first occurrence is on 2014-03-21T00:15:41. Last occurrence is on 2014-03-21T17:18:02. |
2014-05-22 | 2014-06-24 | bug 66833 | Zero tags may carry trailing characters that were not stripped (like "zero=404-01b" instead of "zero=404-01"). Last occurrence is on 2014-06-24T15:29:30. |
2014-07-25 ~14:00 | 2014-07-25 ~17:00 | bug 69112 | A big part of carrier 250-99 requests were not properly zero tagged, and hence are missing from this stream. |
2014-07-08 19:00 | 2014-07-08 22:00 | bug 67694 | A 2014 FIFA World Cup (soccer) related traffic spike caused udp2log overload and led to up to ~10% packet loss during this period of time. |
2014-07-09 09:00 | 2014-07-10 09:00 | bug 68199 | Traffic has been rerouted from ulsfo to eqiad for ULSFO floor move. No data has been lost, but host column may show eqiad caches for traffic that could be expected to go to ulsfo. |
2014-07-13 19:00 | 2014-07-13 23:00 | bug 67694 | A 2014 FIFA World Cup (soccer) related traffic spike caused udp2log overload and led to up to ~25% packet loss during this period of time. |
2014-07-29 01:35 | 2014-07-29 01:42 | bug 68796 | cp3013, cp3014 (half of esams) missing between 2014-07-29T01:35:45 and 2014-07-29T01:42:00 due to flapping network link (~15% of total mobile traffic around that time) |
2014-07-30 ~00:54 | 2014-08-04 ~21:00 | bug 69112 | A big part of carrier 250-99 requests were not properly zero tagged, and hence are missing from this stream. |
2014-08-16 ~22:43 | 2014-08-16 ~22:49 | bug 69663 | The root mount on oxygen filled up, which caused services to panic; udp2log dropped requests during that time. |
2014-08-17 ~06:26 | 2014-08-17 ~06:30 | bug 69663 | The root mount on oxygen filled up again, which caused services to panic; udp2log dropped requests during that time. |
2014-10-08 ~22:00 | 2014-10-08 ~24:00 | bug 71879 | ULSFO having connectivity issues leading to partial message loss |
2014-10-20 13:06 | 2014-10-20 13:27 | bug 72306 | ULSFO connectivity issues causing packet loss between 6% and 47% for ulsfo caches. |
2014-10-21 ~10:30 | 2014-10-21 ~11:43 | bug 72355 | ULSFO connectivity issues causing packet loss for ulsfo caches. |
2014-11-30 ~03:50 | 2014-11-30 ~10:13 | task T76334 | No data while analytics infrastructure suffered eqiad network issues. |
2015-01-13 ~22:20 | 2015-01-13 ~23:18 | task T86973 | No data due to firewall problems |
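Some of the problems listed above can be handled defensively when processing the stream. The following is a sketch only: the function and constant names are hypothetical, and the carrier-code pattern (three digits, a hyphen, two or three digits) is an assumption based on the examples in the table (470-01, 250-99, 404-01). It collapses doubled zero tags (bug 62922), ignores unstripped trailing characters (bug 66833), and flags timestamps that fall into the periods for which the table reports no data at all.

```python
import re
from datetime import datetime
from typing import List, Optional, Tuple

# Assumed carrier-code shape, inferred from the examples in the table.
_CARRIER_RE = re.compile(r"zero=(\d{3}-\d{2,3})")


def extract_zero_carrier(tag_field: str) -> Optional[str]:
    """Return the carrier code from a tag field, or None if no zero tag is present.

    Doubled tags ("zero=250-99;zero=250-99") collapse to one code, and
    non-digit trailing characters ("zero=404-01b") are ignored.
    """
    codes = _CARRIER_RE.findall(tag_field)
    if not codes:
        return None
    if len(set(codes)) > 1:
        raise ValueError(f"conflicting zero tags in {tag_field!r}")
    return codes[0]


# Periods for which the table above reports no log lines at all.
NO_DATA_WINDOWS: List[Tuple[datetime, datetime]] = [
    (datetime(2014, 1, 5, 3, 39, 25), datetime(2014, 1, 6, 17, 45, 10)),  # bug 59722
    (datetime(2014, 11, 30, 3, 50), datetime(2014, 11, 30, 10, 13)),      # task T76334, approximate
    (datetime(2015, 1, 13, 22, 20), datetime(2015, 1, 13, 23, 18)),       # task T86973, approximate
]


def in_no_data_window(moment: datetime) -> bool:
    """True if the timestamp falls inside one of the known no-data periods."""
    return any(start <= moment <= end for start, end in NO_DATA_WINDOWS)
```

Periods with partial packet loss are deliberately left out of `NO_DATA_WINDOWS`, since data from those periods exists but is incomplete and needs a case-by-case decision.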
stat1002.eqiad.wmnet /a/log/webrequest/archive/zero
The stream is available unsampled in Cache log format as gzipped files at /a/log/webrequest/archive/zero/zero.tsv.log-*.gz on stat1002.eqiad.wmnet (using Kafka as backend).
Each file covers the full day of the date in the file name.
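Because each file in this archive covers exactly one calendar day, finding the file for a given day reduces to formatting the date. The helper below is a hypothetical sketch; it assumes the date in the file name uses the same YYYYMMDD form as the older archive, which the pattern `zero.tsv.log-*.gz` does not confirm by itself.

```python
from datetime import date

# Directory from the text above; it only exists on stat1002.eqiad.wmnet.
KAFKA_ARCHIVE_DIR = "/a/log/webrequest/archive/zero"


def kafka_archive_path(day: date) -> str:
    """Return the expected path of the file holding all requests of the given day."""
    # Assumes a YYYYMMDD date suffix, as in the older /a/squid/archive/zero files.
    return f"{KAFKA_ARCHIVE_DIR}/zero.tsv.log-{day:%Y%m%d}.gz"
```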
Events and known problems since 2015-01-01
Date from | Date until | Bug | Details |
---|---|---|---|
Note
- [1] See the Zero namespace of the zero wiki, for example Zero:404-01 for the carrier 404-01.