Help:Cloud VPS managed monitoring
The Cloud VPS infrastructure contains a managed installation of Prometheus, Prometheus Alertmanager, and Grafana. The current installation primarily exists to support the needs of the Cloud Services infrastructure itself, but it may also be useful to WMCS users in some limited ways.
Available metrics
The managed Prometheus instance scrapes data from a prometheus-node-exporter instance running on all Puppetized Cloud VPS instances.
Scraping other Prometheus-compatible exporters is not properly supported, but it is technically possible for the Cloud VPS admin team to configure additional scrape targets.
The data is kept for 30 days and can be queried via https://prometheus.wmcloud.org (or via Grafana's Explore functionality).
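As an illustration of querying the stored data programmatically, the following is a minimal sketch that uses the standard Prometheus HTTP API (`/api/v1/query`). It assumes that this API is reachable directly under https://prometheus.wmcloud.org, which this page does not confirm; the actual path may differ. The helper names (`build_query_url`, `parse_vector`, `query`) are hypothetical, and `node_load1` is used only as an example of a metric that prometheus-node-exporter normally exposes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the standard Prometheus HTTP API is served under this base URL.
BASE_URL = "https://prometheus.wmcloud.org"


def build_query_url(promql, base_url=BASE_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload):
    """Map each series' 'instance' label to its sample value as a float."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in data["result"]
    }


def query(promql, base_url=BASE_URL, timeout=10):
    """Run an instant query over the network and return the parsed result."""
    with urlopen(build_query_url(promql, base_url), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Calling `query("node_load1")` would then return a dictionary keyed by instance, provided the endpoint assumption holds. Because data is kept for 30 days, queries that look further back than that cannot return samples.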
Dashboards
The Grafana instance at https://grafana.wmcloud.org has dashboards based on the scraped Prometheus data. In addition, that Grafana instance has access to self-managed Prometheus instances in some Cloud VPS projects.
Users in certain privileged developer account groups can create and edit dashboards. Please follow the instructions on the Grafana main page.
Alerts
Alerts for the internal Alertmanager instance can be seen at https://prometheus-alerts.wmcloud.org. Members and readers of a project can set silences for alerts in projects they have access to.
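Silences are normally set through the web interface. For readers curious about what a silence contains, the sketch below builds a request body in the shape accepted by the upstream Alertmanager v2 API (`POST /api/v2/silences`). Whether that API is exposed to users on this installation is an assumption this page does not confirm. The function name `build_silence` and the label names in the usage example are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def build_silence(matchers, created_by, comment, hours=2, now=None):
    """Build a JSON-serializable body for Alertmanager's POST /api/v2/silences.

    `matchers` maps label names to the exact values they must equal.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    start = now or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in sorted(matchers.items())
        ],
        # Timezone-aware ISO 8601 timestamps, as Alertmanager expects.
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }
```

For example, `build_silence({"alertname": "ExampleAlert", "project": "example-project"}, "example-user", "planned maintenance")` describes a two-hour silence; both label names and values here are placeholders rather than labels known to exist in this setup.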
A base set of alerting rules is defined for each project. Cloud VPS admins can define additional rules for a project, and can also route alerts for a specific project to a list of Libera.Chat IRC channels or email addresses.
The Grafana built-in alerting functionality is not used or supported, although Grafana dashboards can be used to visualize Alertmanager alerts.
See also
- Portal:Cloud VPS/Admin/Metricsinfra — Documentation for Cloud VPS admins for managing this service
- Nova Resource:Metricsinfra/Documentation — Technical documentation about the setup
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
- Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
- Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)