Analytics/Archive/Data/Webrequests sampled
The Requests stream holds request logs from all caches.
This stream is owned by the Analytics Team.
Availability
NOTES
- As of 2015-11, webrequest udp2log instances have been turned off. The /a/squid/archive data is no longer generated.
- By December 2016, all data in /a/squid/archive had been removed.
stat1002.eqiad.wmnet /a/squid/archive/sampled
The stream is available in Cache log format with a sampling rate of 1:1000 as gzipped files at /a/squid/archive/sampled/sampled-1000.tsv.log-*.gz on stat1002.eqiad.wmnet (using udp2log as backend).
The date in the file name does not mean that all logs of that day are in that file. Instead, each file contains logs from ~06:30 of the previous day until ~06:30 of the day in the file name. For example, /a/squid/archive/sampled/sampled-1000.tsv.log-20130930.gz contains data from ~2013-09-29T06:30:00 until ~2013-09-30T06:30:00.
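The mapping from file name to covered time span can be sketched as follows. This is a minimal illustration, not tooling that shipped with the archive: the function name `approximate_window` is hypothetical, and the rotation time is taken as exactly 06:30 although the text above only says "~06:30".

```python
import re
from datetime import datetime, timedelta

# Assumption: rotation is treated as exactly 06:30; the real cut-off was only approximate.
ROTATION_OFFSET = timedelta(hours=6, minutes=30)


def approximate_window(file_name):
    """Return the approximate (start, end) datetimes covered by a udp2log-era file."""
    match = re.search(r"\.log-(\d{8})\.gz$", file_name)
    if match is None:
        raise ValueError(f"no date stamp found in {file_name!r}")
    # The file ends at ~06:30 on the date in its name and starts 24 hours earlier.
    end = datetime.strptime(match.group(1), "%Y%m%d") + ROTATION_OFFSET
    return end - timedelta(days=1), end


start, end = approximate_window(
    "/a/squid/archive/sampled/sampled-1000.tsv.log-20130930.gz"
)
print(start.isoformat(), end.isoformat())
# prints: 2013-09-29T06:30:00 2013-09-30T06:30:00
```

To cover a full calendar day of udp2log-era data, two consecutive files would have to be read and filtered by timestamp.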
Avg. size / gzipped file | 714 MiB |
---|---|
Avg. size / uncompressed file | 3502 MiB |
Avg. lines / uncompressed file | 8782 K |
Avg. lines / second | 102 |
Avg. requests / second | 102 K |
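The per-second figures follow from the average line count and the 1:1000 sampling rate. The arithmetic below is a small sketch that assumes "K" means 1000 and that a file spans roughly 24 hours.

```python
SAMPLING_RATE = 1000             # one logged line per 1000 requests
SECONDS_PER_DAY = 24 * 60 * 60   # each file spans about one day

avg_lines_per_file = 8782 * 1000  # "8782 K" from the table, assuming K = 1000

# Sampled lines written per second, averaged over a day.
lines_per_second = avg_lines_per_file / SECONDS_PER_DAY

# Scale back up by the sampling rate to estimate real request volume.
requests_per_second = lines_per_second * SAMPLING_RATE

print(round(lines_per_second))            # 102
print(round(requests_per_second / 1000))  # 102 (thousand requests per second)
```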
This stream gets used for:
- adhoc research
Events and known problems since 2013-09-01
Date from | Date until | Bug | Details |
---|---|---|---|
Inherent | * | | The stream may suffer from packet drop on udp2log. This should be <5%. |
Inherent | * | | “zero” markers got set not only for wikipedia, but also for sister projects (wiktionary, ...) |
Inherent | * | | Lines that would be longer than ~8K get chopped off at that border (no newline gets added). (Affects <1 line/day on average) |
Inherent | * | bug 60315 | The stream does not contain the SSL requests that come to the SSL terminators, but only forwarded ones from the terminators. |
* | 2013-09-26 | bug 53806 | Until around 2013-09-26 ~22:57, traffic from the mobile varnishes might have been coming with a garbled client ip. |
2013-09-26 | 2013-10-01 | bug 54779 | No “mf-m” markers in stream between 2013-09-26 ~22:32 and 2013-10-01 ~14:30. |
2013-12-17 | n/a | | Squids got phased out (Last entry in Squid log format is on 2013-12-17T15:45:32.764) |
2013-12-18 | n/a | bug 58889 | Increase in zero=470-01 (Grameenphone Bangladesh) tagged traffic, due to the advertisement by the carrier |
2014-02-05 | 2014-02-25 | bug 60955 | The gz files with filenames 20140206 and 20140208 on stat1002 were missing/bad between 2014-02-06 and 2014-02-20. The file 20140207 had extra data from 2014-02-05 and 2014-02-06, which was removed on 2014-02-22. |
2014-03-21 | 2014-03-21 | bug 62922 | Sometimes zero tags are doubled like “zero=250-99;zero=250-99”. The first occurrence is on 2014-03-21T00:23:13. Last occurrence is on 2014-03-21T17:07:35. |
2014-05-22 | 2014-06-24 | bug 66833 | Zero tags may carry trailing characters that were not stripped (like “zero=404-01b” instead of “zero=404-01”). Last occurrence is on 2014-06-24T14:18:24. |
2014-07-09 09:00 | 2014-07-10 09:00 | bug 68199 | Traffic has been rerouted from ulsfo to eqiad for ULSFO floor move. No data has been lost, but host column may show eqiad caches for traffic that could be expected to go to ulsfo. |
2014-07-25 ~14:00 | 2014-07-25 ~17:00 | bug 69112 | Carrier 250-99 was not properly zero tagged. |
2014-07-29 01:35 | 2014-07-29 01:42 | bug 68796 | All of esams missing between 2014-07-29T01:35:45 and 2014-07-29T01:42:00 due to flapping network link (~11% of total zero traffic around that time) |
2014-07-30 ~00:54 | 2014-08-04 ~21:00 | bug 69112 | Carrier 250-99 was not properly zero tagged, and some of the carrier's requests came with zero=ON instead. |
2014-10-08 22:20 | 2014-10-08 23:20 | bug 71879 | ULSFO having connectivity issues leading to partial message loss |
2014-10-20 13:06 | 2014-10-20 13:27 | bug 72306 | ULSFO connectivity issues causing packet loss between 6% and 47% for ulsfo caches. |
2014-10-21 ~10:30 | 2014-10-21 ~11:43 | bug 72355 | ULSFO connectivity issues causing packet loss for ulsfo caches. |
2014-11-30 ~03:50 | 2014-11-30 ~10:13 | task T76334 | No data while analytics infrastructure suffered eqiad network issues. |
2015-01-13 ~22:20 | 2015-01-13 ~23:18 | task T86973 | No data due to firewall problems |
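Two of the problems listed above (doubled zero tags on 2014-03-21 and unstripped trailing characters between 2014-05-22 and 2014-06-24) can be compensated for when processing affected lines. The following is a hedged sketch only: the helper name `normalize_zero_tags` is hypothetical, the tag field is assumed to be a semicolon-separated string as the doubled example suggests, and carrier codes are assumed to always look like three digits, a hyphen, and two digits.

```python
import re

# Assumption: carrier codes have the shape "<3 digits>-<2 digits>", e.g. "250-99".
ZERO_TAG = re.compile(r"^zero=(\d{3}-\d{2})")


def normalize_zero_tags(field):
    """Trim stray characters after zero tags and drop exact duplicate entries."""
    seen = []
    for part in field.split(";"):
        match = ZERO_TAG.match(part)
        # Keep only the well-formed prefix of a zero tag; leave other entries as-is.
        cleaned = f"zero={match.group(1)}" if match else part
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    return ";".join(seen)


print(normalize_zero_tags("zero=250-99;zero=250-99"))  # zero=250-99
print(normalize_zero_tags("zero=404-01b"))             # zero=404-01
```

Values that do not match the assumed pattern, such as the zero=ON seen for carrier 250-99, pass through unchanged and would need separate handling.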
stat1002.eqiad.wmnet /a/log/webrequest/archive/sampled
The stream is available in Cache log format with a sampling rate of 1:1000 as gzipped files at /a/log/webrequest/archive/sampled/sampled-1000.tsv.log-*.gz on stat1002.eqiad.wmnet (using kafka as backend).
Each file covers the full day of the date in the file name.
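Because a kafka-era file holds one calendar day at 1:1000 sampling, a rough estimate of that day's total requests is the line count multiplied by 1000. The sketch below assumes one request per line; the function name `estimate_requests` is hypothetical.

```python
import gzip

SAMPLING_RATE = 1000  # one logged line per 1000 requests


def estimate_requests(path):
    """Estimate total requests for the day covered by one sampled gzip file."""
    # Stream the file line by line so the ~3.5 GiB of uncompressed data
    # never has to fit in memory; undecodable bytes are replaced, not fatal.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        sampled_lines = sum(1 for _ in handle)
    return sampled_lines * SAMPLING_RATE
```

The result is an estimate only, since sampling and any message loss both add uncertainty.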
Events and known problems since 2015-01-01
Date from | Date until | Bug | Details |
---|---|---|---|