Help:Cloud VPS managed monitoring

From Wikitech

The Cloud VPS infrastructure contains a managed installation of Prometheus, Prometheus Alertmanager, and Grafana. The current installation primarily exists to support the needs of the Cloud Services infrastructure itself, but it might be useful in some limited ways to WMCS users as well.

Available metrics

The managed Prometheus instance scrapes data from the prometheus-node-exporter service that runs on every Puppetized Cloud VPS instance.

Scraping other Prometheus-compatible exporters is not officially supported, although the Cloud VPS admin team can technically configure additional scrape targets.

The data is kept for 30 days and can be queried via https://prometheus.wmcloud.org (or via Grafana's Explore functionality).
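As a rough sketch of querying this data from a script, the snippet below builds a request for the standard Prometheus HTTP API instant-query endpoint (`/api/v1/query`) and unpacks a response of the documented shape. It assumes that the API is served directly under https://prometheus.wmcloud.org, which this page does not confirm. `node_load1` is a standard node-exporter metric, but the `project` label and the `example-project` value are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the standard Prometheus HTTP API is exposed under this base URL.
BASE_URL = "https://prometheus.wmcloud.org"


def build_query_url(promql, base_url=BASE_URL):
    """Return the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def extract_samples(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is {"metric": {...labels...}, "value": [timestamp, "value"]}.
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(promql, base_url=BASE_URL):
    """Fetch and decode one instant query (performs a network request)."""
    with urlopen(build_query_url(promql, base_url), timeout=10) as response:
        return extract_samples(json.load(response))


if __name__ == "__main__":
    # Hypothetical label and project name, for illustration only.
    for labels, value in run_query('node_load1{project="example-project"}'):
        print(labels.get("instance", "?"), value)
```

Keeping URL construction and response parsing separate from the network call makes both easy to check offline before pointing the script at the real service.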

Dashboards

The Grafana instance at https://grafana.wmcloud.org has dashboards based on scraped Prometheus data. In addition, that Grafana instance has access to self-managed Prometheus instances in some Cloud VPS projects.

Users in certain privileged developer account groups can create and edit dashboards. Please follow the instructions on the Grafana main page.

Alerts

Alerts for the internal Alertmanager instance can be seen at https://prometheus-alerts.wmcloud.org. Members and readers of a project can set silences for alerts in projects they have access to.
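For readers unfamiliar with what a silence contains, the sketch below builds a payload in the shape accepted by the upstream Alertmanager v2 API (`POST /api/v2/silences`). It only constructs the data structure and sends nothing. This page describes setting silences through the web interface and does not say whether the API is reachable or how it is authenticated. The `project` label, the project name, and the user name are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_silence(matchers, hours, created_by, comment, start=None):
    """Build an Alertmanager v2 silence payload lasting `hours` hours.

    `matchers` maps label names to the exact values to match.
    """
    start = start or datetime.now(timezone.utc)
    end = start + timedelta(hours=hours)
    return {
        "matchers": [
            {"name": name, "value": value, "isRegex": False, "isEqual": True}
            for name, value in sorted(matchers.items())
        ],
        "startsAt": start.isoformat(),
        "endsAt": end.isoformat(),
        "createdBy": created_by,
        "comment": comment,
    }


if __name__ == "__main__":
    # Hypothetical label name and values, for illustration only.
    silence = build_silence(
        {"project": "example-project"},
        hours=2,
        created_by="example-user",
        comment="Planned maintenance",
    )
    print(json.dumps(silence, indent=2))
```

A silence is simply a set of label matchers plus a time window, so a narrow matcher set (for example, one project and one alert name) avoids muting unrelated alerts.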

A base set of alerting rules is defined for each project. In addition, Cloud VPS admins can define additional rules for each project. The Cloud VPS admins can also route alerts for a specific project to a list of Libera.Chat IRC channels or email addresses.

The Grafana built-in alerting functionality is not used or supported, although Grafana dashboards can be used to visualize Alertmanager alerts.

See also

Communication and support

Support and administration of the WMCS resources are provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)