HAProxyKafka
HAProxyKafka is a daemon running on all cp hosts to read logs produced by HAProxy and forward them to Kafka to be used in analytics pipelines.
HAProxyKafka replaced VarnishKafka to provide better observability of incoming requests, since HAProxy is our main entry point for all CDN requests.
The daemon is written in Golang and uses the librdkafka library installed on the cache hosts (and NOT the embedded one provided by the Golang package).
Useful links:
- Code repository on GitLab
- Grafana Dashboard
Building the package and deploying a new version
GitLab CI should build the package, using the usual branch naming to choose the target distribution. Upload the binary package to the internal APT repo and deploy it manually (or with a tool such as cumin).
HAProxy configuration
To let HAProxyKafka correctly parse and dispatch messages, HAProxy must be configured:
1. To log to a unix domain socket (created by HAProxyKafka in advance with the correct permissions)
2. To structure logs with the expected log-format. This includes the captured headers in the correct order, in RFC5424 format.
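The two requirements above can be illustrated with a minimal Go sketch. It is not taken from the HAProxyKafka code base: the function names `listenLogSocket` and `parsePriority`, the socket path, the `0660` mode, the sample hostname and the use of a datagram (`unixgram`) socket are all assumptions made for illustration. It creates a unix socket with explicit permissions, sends one RFC5424-style line to it, and decodes the leading `<PRI>` field into facility and severity.

```go
package main

import (
	"fmt"
	"log"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// listenLogSocket (hypothetical helper) creates a unix datagram socket at
// path and applies the given file mode so the log producer can write to it.
func listenLogSocket(path string, mode os.FileMode) (*net.UnixConn, error) {
	// Remove a stale socket left over from a previous run, if any.
	if err := os.Remove(path); err != nil && !os.IsNotExist(err) {
		return nil, err
	}
	conn, err := net.ListenUnixgram("unixgram", &net.UnixAddr{Name: path, Net: "unixgram"})
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, mode); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

// parsePriority (hypothetical helper) decodes the "<PRI>" prefix of an
// RFC5424 message: facility = PRI / 8, severity = PRI % 8.
// It returns the remainder of the message untouched.
func parsePriority(msg string) (facility, severity int, rest string, err error) {
	end := strings.IndexByte(msg, '>')
	if !strings.HasPrefix(msg, "<") || end < 2 || end > 4 {
		return 0, 0, "", fmt.Errorf("missing or malformed PRI in %q", msg)
	}
	pri, err := strconv.ParseUint(msg[1:end], 10, 8)
	if err != nil || pri > 191 {
		return 0, 0, "", fmt.Errorf("invalid PRI value in %q", msg)
	}
	return int(pri / 8), int(pri % 8), msg[end+1:], nil
}

func main() {
	// Assumption: a throwaway directory stands in for the real socket location.
	dir, err := os.MkdirTemp("", "haproxykafka-sketch")
	if err != nil {
		log.Fatal(err)
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "log.sock")

	conn, err := listenLogSocket(sock, 0o660)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Stand-in for HAProxy: send one made-up RFC5424-style line to the socket.
	client, err := net.DialUnix("unixgram", nil, &net.UnixAddr{Name: sock, Net: "unixgram"})
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
	sample := "<134>1 2024-01-01T00:00:00+00:00 cp-example haproxy 1234 - - example payload"
	if _, err := client.Write([]byte(sample)); err != nil {
		log.Fatal(err)
	}

	buf := make([]byte, 65536)
	n, _, err := conn.ReadFromUnix(buf)
	if err != nil {
		log.Fatal(err)
	}
	facility, severity, rest, err := parsePriority(string(buf[:n]))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("facility=%d severity=%d rest=%q\n", facility, severity, rest)
}
```

The sketch stops at the priority field on purpose: the rest of the message depends on the log-format that HAProxy is configured with, so the real field layout should be taken from the Puppet-managed configuration rather than from this example.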
Puppet configures this automatically, along with the HAProxyKafka configuration file and services. Refer to the existing Puppet hiera/code for details.
Procedures
Restarting
Restarting HAProxyKafka takes only a brief amount of time, so under normal conditions the HAProxy log buffer should be sufficient to avoid losing messages. If HAProxyKafka is unavailable for an extended period, however, HAProxy messages will be silently discarded and lost.