Liberica
Liberica is the next-generation load balancer replacing PyBal. It is designed with a modular approach and written in Go. The source code is available on GitLab.
Daemons
Control Plane
The control plane (cp) daemon orchestrates the rest of the Liberica daemons. It uses a configuration file to define a list of services and relies on etcd for real server discovery. Communication with other Liberica daemons and gobgpd occurs over gRPC.
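The exact configuration schema is not documented here; purely as an illustration (every field name below is an assumption, not Liberica's actual schema), a service definition and etcd endpoint list might look something like:

```
# Hypothetical sketch only -- field names are illustrative, not Liberica's real schema.
services:
  - name: ncredirlb_80
    vip: 208.80.154.232
    port: 80
etcd:
  endpoints:
    - https://etcd.example.org:2379
```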
Forwarding Plane
The forwarding plane (fp) daemon exposes the following forwarding planes to Liberica’s control plane via its gRPC API:
- IPVS
- Katran
Healthcheck Forwarder
The healthcheck forwarder (hcforwarder) daemon forwards health check traffic to real servers the same way production traffic reaches them (using IPIP encapsulation). It consists of two components: a Go daemon that exposes a gRPC API and Prometheus metrics, and an eBPF program that handles the network traffic. The healthcheck daemon targets a specific real server by setting a socket mark (SO_MARK); the hcforwarder uses this mark to identify the real server's IP and perform the IPIP encapsulation.
Healthcheck
The Liberica healthcheck daemon performs health checks and tracks their results. It notifies subscribers (usually the control plane daemon) of any changes in the state of real servers. Additionally, it exposes health check results as Prometheus metrics.
Operating Liberica
Liberica provides a CLI tool called liberica for fetching the current state of its various daemons. This tool uses the same gRPC API employed by the Liberica control plane daemon to gather insights from the different components.
Pooling a liberica instance
The Liberica control plane automatically pools the load balancer when it starts. To pool an instance manually, start the control plane using systemctl:
systemctl start liberica-cp.service
On a successful start, Liberica should eventually configure the BGP paths for all the configured services, and the journal log should contain entries like these:
time=2025-01-28T14:03:20.036Z level=INFO msg="bootstrapping completed, configuring BGP path" service=ncredirlb6_80
time=2025-01-28T14:03:20.037Z level=INFO msg="bootstrapping completed, configuring BGP path" service=ncredir-httpslb_443
time=2025-01-28T14:03:20.037Z level=INFO msg="bootstrapping completed, configuring BGP path" service=ncredirlb_80
time=2025-01-28T14:03:20.037Z level=INFO msg="bootstrapping completed, configuring BGP path" service=ncredir-httpslb6_443
Additionally, the BGP status can be checked using the gobgp CLI:
$ gobgp neighbor # list BGP neighbors
Peer AS Up/Down State |#Received Accepted
10.64.130.1 64810 00:26:03 Establ | 0 0
$ gobgp neighbor 10.64.130.1 adj-out # check exported IPv4 prefixes
ID Network Next Hop AS_PATH Attrs
4 208.80.154.232/32 10.64.130.16 64600 [{Origin: i} {Communities: 14907:11}]
$ gobgp neighbor 10.64.130.1 adj-out -a ipv6 # check exported IPv6 prefixes
ID Network Next Hop AS_PATH Attrs
4 2620:0:861:ed1a::9/128 2620:0:861:109:10:64:130:16 64600 [{Origin: i} {Communities: 14907:11}]
Depooling a liberica instance
Depooling a Liberica-based load balancer should be as easy as stopping the control plane with the following command:
systemctl stop liberica-cp.service
After a successful depool, gobgp should show an empty list of neighbors; this can be verified using the gobgp CLI:
$ sudo -i gobgp neighbor
Peer AS Up/Down State |#Received Accepted
Alerts
LibericaDiffFPCheck
This alert is triggered when there is a mismatch between the real servers that should be pooled according to the control plane and the ones actually pooled on the forwarding plane. Both can be queried using the liberica CLI tool:
vgutierrez@lvs4009:~$ liberica cp services # control plane status
upload-httpslb6_443:
2620:0:863:101:10:128:0:12 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:14 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:35 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:21 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:36 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:24 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:10 1 healthy: true | pooled: yes
2620:0:863:101:10:128:0:37 1 healthy: true | pooled: yes
[...]
vgutierrez@lvs4009:~$ liberica fp services # forwarding plane status
[...]
2620:0:863:ed1a::2:b:443 mh
2620:0:863:101:10:128:0:37 1
2620:0:863:101:10:128:0:10 1
2620:0:863:101:10:128:0:24 1
2620:0:863:101:10:128:0:36 1
2620:0:863:101:10:128:0:21 1
2620:0:863:101:10:128:0:35 1
2620:0:863:101:10:128:0:14 1
2620:0:863:101:10:128:0:12 1
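Conceptually, the check compares the set of real servers the control plane considers pooled with the set present on the forwarding plane, in both directions. A simplified sketch of that comparison (illustrative only, not Liberica's actual implementation):

```go
package main

import "fmt"

// diff returns the elements of a that are missing from b.
func diff(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, ip := range b {
		seen[ip] = true
	}
	var missing []string
	for _, ip := range a {
		if !seen[ip] {
			missing = append(missing, ip)
		}
	}
	return missing
}

func main() {
	cp := []string{"2620:0:863:101:10:128:0:12", "2620:0:863:101:10:128:0:14"}
	fp := []string{"2620:0:863:101:10:128:0:12"}
	// Any entry in either direction would trigger LibericaDiffFPCheck.
	fmt.Println(diff(cp, fp)) // pooled in the control plane, missing on the forwarding plane
	fmt.Println(diff(fp, cp)) // present on the forwarding plane, unknown to the control plane
}
```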
This alert usually indicates a bug or a transient issue in Liberica. The easiest way to fix it is to restart Liberica using the sre.loadbalancer.upgrade cookbook from a cumin host:
$ sudo cookbook sre.loadbalancer.upgrade --query "P{lvs4009.ulsfo.wmnet}" --reason "clear LibericaDiffFPCheck alert" restart
LibericaStaleConfig
This alert is triggered if a configuration deployed by Puppet hasn't been loaded within the following hour. It can be fixed by reloading the Liberica configuration using the sre.loadbalancer.admin cookbook from a cumin host:
$ sudo cookbook sre.loadbalancer.admin --query "P{lvs4009.ulsfo.wmnet}" --reason "clear LibericaStaleConfig alert" config_reload