Portal:Cloud VPS/Admin/Runbooks/NovaFullstackStaleStats
Error / Incident
Some of the stats used for the novafullstack alerts were not reported for some time.
Debugging
Check that the Prometheus series exist. Search for the alert name in the alerts.git repo and look for the expr line, which will be something like:
expr: count(cloudvps_novafullstack_instances_count) == 0 or count(cloudvps_novafullstack_instances_max) == 0
Any of the stats wrapped in a count() call might be the one failing (or all of them!), so query each of them individually in Thanos or Grafana.
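To query each stat on its own, the series names first have to be pulled out of the expr line. The following is a minimal sketch of that step; the helper name `metric_names` and the list of PromQL words to skip are assumptions made for illustration, and the regex is naive (it would also pick up label names if the expr contained label matchers).

```python
import re

# PromQL functions and operators to skip when picking out series names.
# This list is an assumption and is not exhaustive.
PROMQL_WORDS = {"count", "sum", "absent", "or", "and", "unless", "by", "without"}


def metric_names(expr):
    """Return the distinct series names found in a PromQL expression, in order."""
    names = []
    for token in re.findall(r"[a-zA-Z_:][a-zA-Z0-9_:]*", expr):
        if token not in PROMQL_WORDS and token not in names:
            names.append(token)
    return names


expr = (
    "count(cloudvps_novafullstack_instances_count) == 0 "
    "or count(cloudvps_novafullstack_instances_max) == 0"
)
for name in metric_names(expr):
    print(name)
```

Each printed name can then be pasted as a query into Thanos or Grafana to see which series has stopped reporting.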
This might mean:
- That the novafullstack service is misbehaving
- That the stat names have changed
- That the service is down
- That the Prometheus stats are not being generated (usually under /var/lib/prometheus/node.d/novafullstack.prom on the cloudcontrol that is running the novafullstack service).
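The last cause in the list above can be checked directly on the cloudcontrol host. Here is a rough sketch that reports whether the stats file exists, how recently it was written, and whether the expected series are in it. The function name `prom_file_status`, the one-hour staleness threshold, and the exact file name are assumptions, not values taken from the service configuration.

```python
import os
import re
import time

# Series names taken from the example expr line in this runbook.
EXPECTED = [
    "cloudvps_novafullstack_instances_count",
    "cloudvps_novafullstack_instances_max",
]


def prom_file_status(path, expected=EXPECTED, max_age_seconds=3600, now=None):
    """Report whether a stats file exists, is fresh, and has the expected series.

    max_age_seconds defaults to one hour, which is an arbitrary assumption.
    """
    if not os.path.exists(path):
        return {"exists": False, "stale": True, "missing": list(expected)}
    now = time.time() if now is None else now
    age = now - os.path.getmtime(path)
    present = set()
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and HELP/TYPE comments
            # The series name ends at the first '{' (labels) or whitespace (value).
            present.add(re.split(r"[{\s]", line, maxsplit=1)[0])
    return {
        "exists": True,
        "stale": age > max_age_seconds,
        "missing": [name for name in expected if name not in present],
    }


# Assumed path; use whichever file is actually present under node.d on the host.
PROM_FILE = "/var/lib/prometheus/node.d/novafullstack.prom"
print(prom_file_status(PROM_FILE))
```

A missing file or a non-empty "missing" list points at the service not writing its stats, while a stale file suggests the service has stopped or is stuck.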
Common issues
Add any new issues you find here.
Related information
Support contacts
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
- Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
- Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)
Old incidents
Add any tasks for incidents related to this alert here.