RIPE Atlas
RIPE Atlas is a distributed network monitoring project that measures reachability and latency. There are two device types: Probes and Anchors. Probes are small, USB-powered appliances, while Anchors are 1U rack-mounted equipment. Probes and Anchors test connectivity to remote Anchors and DNS root servers, and report their results to the Atlas website. WMF hosts 4 Anchors.
- https://atlas.ripe.net/
- https://atlas.ripe.net/results/maps/network-coverage/?filter=14907
- eqiad: https://atlas.ripe.net/probes/6092/
- codfw: https://atlas.ripe.net/probes/6093/
- esams: https://atlas.ripe.net/probes/7261/ (VM)
- eqsin: http://atlas.ripe.net/probes/6345/
The Atlas has been used to measure things like AAAA filtering, DNS root server reachability, and Internet routing response to hurricanes: https://atlas.ripe.net/results/analyses/
In addition to the stats you can get from RIPE's site, we track some statistics of our own: https://grafana.wikimedia.org/d/K1qm1j-Wz/ripe-atlas?orgId=1
Run tests from the command line
Atlas has a suite of command-line tools to interact with its API. SRE has the tools installed on the "cluster management" production hosts (cumin1002.eqiad.wmnet, cumin2002.codfw.wmnet), where they can be run as the 'atlas' user. All tools are also aliased with the correct sudo invocation for convenience. For example, to run an SSL certificate test from 99 Italian probes:
cumin1001:~$ source /etc/ripeatlas.alias # load sudo aliases
cumin1001:~$ asslcert --target text-lb.esams.wikimedia.org --from-country it --probes 99 --no-report
Looking good! Your measurement was created and details about it can be found here:
https://atlas.ripe.net/measurements/22900971/
cumin1001:~$
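Because the example above uses --no-report, the results are only linked, not printed. As a minimal sketch, the raw results of a finished measurement can also be fetched from RIPE's public REST API (v2). The helper name atlas_results_url is hypothetical and not part of the Atlas tools, and whether a proxy is needed to reach the API from a given host is an assumption this page does not cover.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Atlas tools): build the RIPE Atlas
# API v2 results URL for a given measurement ID.
atlas_results_url() {
    local msm_id="$1"
    # Accept only a plain decimal measurement ID.
    case "$msm_id" in
        ''|*[!0-9]*)
            echo "usage: atlas_results_url MEASUREMENT_ID" >&2
            return 1
            ;;
    esac
    printf 'https://atlas.ripe.net/api/v2/measurements/%s/results/?format=json\n' "$msm_id"
}

# Example (not executed here): fetch the results of the measurement above.
# curl -s "$(atlas_results_url 22900971)"
```

The function only builds the URL, so it can be checked without network access; the curl call is left commented out.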
Country latency measurement
latency-measurement can be used to automate measuring the latency from each country to the various WMF servers.
Anchor setup
RIPE NCC doc: https://atlas.ripe.net/docs/howtos/installing-vm-anchor.html
Tracked in: https://phabricator.wikimedia.org/T307021
- If not already present, add the sandbox vlan to Netbox, the switch/router, the hypervisors and Puppet (network/data/data.yaml). See other sites like esams for example config.
- On a Cumin host: create the VM with these parameters:
sudo cookbook sre.ganeti.makevm --vcpus 2 --memory 2 --disk 50 --network sandbox --os none --cluster XXX --group YYY atlasZZZZ
- In Netbox, edit the newly created VM and set its tenant to "RIPE NCC".
- On the primary hypervisor: Enable SPICE for that VM
- On the primary hypervisor: Start the VM (this first attempt is expected to fail with an error message naming a hypervisor): sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=/tmp/anchor.iso atlasZZZZ.wikimedia.org
- Connect to the hypervisor indicated by the error message (e.g. "Hypervisor parameter validation failed on node ganeti3007.esams.wmnet") and download the Atlas image:
https_proxy=http://webproxy:8080 wget https://ks.atlas.ripe.net/misc/anchor.iso -O /tmp/anchor.iso
- Set up the SPICE port forwarding
- Start the VM for real this time with the above command.
- Quickly (as there is a GRUB timeout), in the SPICE window, select the manual networking config
- Set up the IP config based on the IPs allocated to the VM in Netbox. NOTE THAT THE BACKSPACE KEY DOESN'T WORK SO GOOD LUCK
- Once the installer is running, set the boot order back to disk:
sudo gnt-instance modify --hypervisor-parameters=boot_order=disk atlasZZZZ.wikimedia.org
- Delete the previously downloaded image (rm /tmp/anchor.iso)
- Run the https://netbox.wikimedia.org/extras/scripts/capirca.GetHosts/ Netbox script
- Run Homer on the core routers
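The hypervisor-side commands in the steps above all share the same VM name and image path, so it can help to print them together before starting. The following is a dry-run sketch only: it prints the commands from this page and executes nothing. The function name anchor_hypervisor_cmds and the ISO variable are hypothetical, and the manual steps (SPICE, installer, Netbox, Homer) are not covered.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not execute) the hypervisor-side commands from
# the steps above for one VM. Names introduced here are hypothetical.
ISO=/tmp/anchor.iso

anchor_hypervisor_cmds() {
    local vm="$1"
    if [ -z "$vm" ]; then
        echo "usage: anchor_hypervisor_cmds VM_FQDN" >&2
        return 1
    fi
    # Order: download the image, boot from it, switch back to disk, clean up.
    cat <<EOF
https_proxy=http://webproxy:8080 wget https://ks.atlas.ripe.net/misc/anchor.iso -O $ISO
sudo gnt-instance start -H boot_order=cdrom,cdrom_image_path=$ISO $vm
sudo gnt-instance modify --hypervisor-parameters=boot_order=disk $vm
rm $ISO
EOF
}

# Placeholder VM name as used on this page; replace with the real one.
anchor_hypervisor_cmds atlasZZZZ.wikimedia.org
```

Printing rather than running keeps the manual steps in between under the operator's control.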