
HAProxyKafka

From Wikitech

HAProxyKafka is a daemon running on all cp hosts to read logs produced by HAProxy and forward them to Kafka to be used in analytics pipelines.

HAProxyKafka replaced VarnishKafka to give better observability of incoming requests, since HAProxy is our main entry point for all CDN requests.

The daemon is written in Golang and uses the librdkafka library installed on the cache hosts (and NOT the embedded one provided by the Golang package).


Building the package and deploying a new version

GitLab CI builds the package, using the usual branch naming to choose the target distribution. Upload the resulting binary package to the internal APT repo, then deploy it manually or with a tool such as cumin.

HAProxy configuration

To let HAProxyKafka correctly parse and dispatch messages, HAProxy must be configured:

1. To log to a unix domain socket (created by HAProxyKafka in advance with the correct permissions)

2. To structure logs with the expected log-format. This includes the captured headers in the correct order, in RFC5424 format.
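The two requirements above can be illustrated with a minimal Go sketch. It is not the HAProxyKafka implementation: the datagram socket type, the 0660 mode, the sample log line, and every name (listenUnixgram, parseRFC5424Header, syslogHeader) are assumptions made for illustration. It creates a unix domain socket with restricted permissions before any sender connects, then splits the RFC 5424 header from one received message.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// syslogHeader holds the RFC 5424 header fields that precede the log body.
type syslogHeader struct {
	Facility, Severity, Version int
	Timestamp, Hostname         string
	AppName, ProcID, MsgID      string
	Rest                        string // structured data and message, unparsed
}

// parseRFC5424Header splits
// "<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID REST".
func parseRFC5424Header(line string) (syslogHeader, error) {
	var h syslogHeader
	end := strings.IndexByte(line, '>')
	if !strings.HasPrefix(line, "<") || end < 2 {
		return h, errors.New("missing or malformed PRI")
	}
	pri, err := strconv.Atoi(line[1:end])
	if err != nil || pri < 0 || pri > 191 {
		return h, errors.New("invalid PRI value")
	}
	fields := strings.SplitN(line[end+1:], " ", 7)
	if len(fields) < 7 {
		return h, errors.New("truncated header")
	}
	version, err := strconv.Atoi(fields[0])
	if err != nil {
		return h, errors.New("invalid version")
	}
	h = syslogHeader{
		Facility: pri / 8, Severity: pri % 8, Version: version,
		Timestamp: fields[1], Hostname: fields[2], AppName: fields[3],
		ProcID: fields[4], MsgID: fields[5], Rest: fields[6],
	}
	return h, nil
}

// listenUnixgram creates a datagram socket at path and restricts its mode,
// so the socket exists with the right permissions before the sender starts.
func listenUnixgram(path string, mode os.FileMode) (*net.UnixConn, error) {
	_ = os.Remove(path) // clear a stale socket left by a previous run
	addr := &net.UnixAddr{Name: path, Net: "unixgram"}
	conn, err := net.ListenUnixgram("unixgram", addr)
	if err != nil {
		return nil, err
	}
	if err := os.Chmod(path, mode); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

func main() {
	dir, err := os.MkdirTemp("", "haproxykafka-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	sock := filepath.Join(dir, "log.sock") // hypothetical socket path
	conn, err := listenUnixgram(sock, 0o660)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Stand-in for HAProxy: send one hypothetical RFC 5424 line.
	client, err := net.Dial("unixgram", sock)
	if err != nil {
		panic(err)
	}
	defer client.Close()
	sample := "<134>1 2024-01-01T00:00:00Z cp1001 haproxy 1234 - - example body"
	if _, err := client.Write([]byte(sample)); err != nil {
		panic(err)
	}

	buf := make([]byte, 65536)
	n, _, err := conn.ReadFromUnix(buf)
	if err != nil {
		panic(err)
	}
	h, err := parseRFC5424Header(string(buf[:n]))
	if err != nil {
		panic(err)
	}
	fmt.Printf("app=%s host=%s severity=%d rest=%q\n",
		h.AppName, h.Hostname, h.Severity, h.Rest)
}
```

The sketch leaves the message body unparsed, because the actual log-format and the order of captured headers are defined by the Puppet-managed HAProxy configuration.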

Puppet configures this automatically, along with the HAProxyKafka configuration file and services. Refer to the existing Puppet hiera/code for details.

Procedures

Restarting

Restarting HAProxyKafka takes only a brief amount of time, so under normal conditions the HAProxy log buffer should be sufficient to avoid losing messages. If HAProxyKafka is unavailable for a longer period, however, HAProxy messages will be silently discarded and lost.

See also