GitLab/Runbook

From Wikitech

A runbook is a set of instructions telling a human what to do, specifically when a certain monitoring alert triggers. This page contains GitLab-related runbooks, which are linked from alertmanager rules via the runbook annotation.

GitLabCIJobErrors

More than 50% of GitLab CI jobs are failing.

Check the dashboard at https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview?orgId=1 to see which kind of job error is elevated.

Also check the past and currently running Jobs at https://gitlab.wikimedia.org/admin/jobs (admin privileges required).
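For a quick manual cross-check of the 50% threshold, the failure ratio can be computed from a list of job statuses. The following is only a sketch: the alert itself is evaluated by the alertmanager rules, and the function name `job_failure_ratio`, the choice to count only finished jobs (`success` and `failed`), and the sample data are assumptions made for illustration.

```python
def job_failure_ratio(statuses):
    """Return the share of failed jobs among finished jobs.

    Assumption: only jobs with status "success" or "failed" count as
    finished; running, pending, or canceled jobs are ignored.
    """
    finished = [s for s in statuses if s in ("success", "failed")]
    if not finished:
        return 0.0
    return finished.count("failed") / len(finished)


if __name__ == "__main__":
    # Hypothetical sample of job statuses, e.g. copied from the admin jobs page.
    sample = ["failed", "failed", "success", "failed", "running"]
    ratio = job_failure_ratio(sample)
    print(f"failure ratio: {ratio:.2f}, above threshold: {ratio > 0.5}")
```

A ratio above 0.5 in such a sample is consistent with the alert condition, but the dashboard remains the authoritative view.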

GitLabCIPipelineErrors

More than 50% of GitLab CI pipelines are failing.

Check the dashboard at https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview?orgId=1 to see which kind of pipeline error is elevated.

Also check the past and currently running Jobs at https://gitlab.wikimedia.org/admin/jobs (admin privileges required).

GitLabCIPipelineLatency

Pipeline creation takes more than 10 seconds.

Check the number of active pipelines at https://grafana.wikimedia.org/d/Chb-gC07k/gitlab-ci-overview?orgId=1.

Also check the past and currently running Jobs at https://gitlab.wikimedia.org/admin/jobs (admin privileges required).

Check if all runners are available and marked online at https://gitlab.wikimedia.org/admin/runners.

If needed, stop jobs that consume too many resources.
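The runner availability check can also be done against data from the GitLab Runners API instead of the admin page. The sketch below assumes the runner list has already been fetched (for example from `GET /api/v4/runners/all`, which requires an admin token) and is passed in as a list of dictionaries with the `status`, `paused`, `description`, and `id` fields that API reports. The function name `unavailable_runners` and the sample data are hypothetical.

```python
def unavailable_runners(runners):
    """Return labels of runners that are paused or not reported as online.

    Assumption: each runner is a dict shaped like a GitLab Runners API
    list entry, where "status" is "online" for a healthy runner.
    """
    unavailable = []
    for runner in runners:
        is_online = runner.get("status") == "online"
        is_paused = runner.get("paused", False)
        if not is_online or is_paused:
            # Fall back to the numeric ID when no description is set.
            label = runner.get("description") or str(runner.get("id"))
            unavailable.append(label)
    return unavailable


if __name__ == "__main__":
    # Hypothetical sample data, not real runner names.
    sample = [
        {"id": 1, "description": "runner-a", "status": "online", "paused": False},
        {"id": 2, "description": "runner-b", "status": "offline", "paused": False},
        {"id": 3, "description": "", "status": "online", "paused": True},
    ]
    print(unavailable_runners(sample))
```

Any runner listed in the result is worth inspecting on the admin runners page before stopping jobs.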

GitLabRunnerTrustedConfigMissing

The configuration for the Trusted runners is missing a required setting. The runners have to be "locked" and "protected", and must not run untagged jobs, to ensure proper separation of trusted/reviewed jobs from unreviewed jobs. Sometimes these settings get lost, either because administrators accidentally remove them or because of bugs or version upgrades.

These settings are managed by the gitlab-trusted-runner/ project. To re-apply the correct configuration, run a new pipeline; its runner-config job should re-apply the correct settings. (Note that the job runs every 30 minutes.) Alternatively, use the edit button in the admin interface to edit the settings directly. The correct settings are:

  • run untagged jobs: false/unchecked
  • protected: true/checked
  • lock to current project: true/checked
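The three settings above can also be verified programmatically. The sketch below is a minimal example, assuming the runner's details have already been fetched as a dictionary from the GitLab Runners API (`GET /api/v4/runners/:id`), which reports these settings as `run_untagged`, `access_level`, and `locked`. The function name `trusted_runner_violations` and the sample dictionary are hypothetical and are not part of the gitlab-trusted-runner project.

```python
# Expected settings for a Trusted runner, keyed by the field names used
# in the GitLab Runners API response (assumed to be fetched elsewhere).
EXPECTED_TRUSTED_SETTINGS = {
    "run_untagged": False,            # run untagged jobs: false/unchecked
    "access_level": "ref_protected",  # protected: true/checked
    "locked": True,                   # lock to current project: true/checked
}


def trusted_runner_violations(runner):
    """Return {field: (actual, expected)} for each setting that deviates."""
    violations = {}
    for field, expected in EXPECTED_TRUSTED_SETTINGS.items():
        actual = runner.get(field)
        if actual != expected:
            violations[field] = (actual, expected)
    return violations


if __name__ == "__main__":
    # Hypothetical runner that lost its "protected" setting.
    sample = {"id": 42, "run_untagged": False,
              "access_level": "not_protected", "locked": True}
    for field, (actual, expected) in trusted_runner_violations(sample).items():
        print(f"{field}: is {actual!r}, expected {expected!r}")
```

An empty result means the runner matches the expected configuration; any reported field can be corrected by re-running the runner-config job or through the edit button in the admin interface.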