
Portal:Cloud VPS/Admin/Runbooks/PuppetStaleCertificates

From Wikitech
The procedures in this runbook require project admin permissions to complete.

Error / Incident

There are stale certificates on the puppetmaster after some VMs were removed.

Common issues

This usually happens after a VM is removed manually in a project that has its own puppetmaster.

If the puppetserver runs the openstack_stale_puppet_certs Prometheus exporter (it most likely does), you can clean up all obsolete certificates with clean-stale-puppet-certs:

tools-puppetserver-01:~$ sudo /usr/local/sbin/clean-stale-puppet-certs --clean
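If you are not sure the exporter is in use, one quick check is whether its textfile exists and is non-empty. This is a minimal sketch that assumes the exporter writes to /var/lib/prometheus/node.d/openstack_stale_puppet_certs.prom (the path used by the manual loop on this page) and that a non-empty file means the exporter is active. The function name has_stale_certs_exporter is hypothetical, not part of any existing tooling.

```shell
# Assumed path of the exporter's textfile output (taken from the manual loop on this page).
STALE_CERTS_PROM=/var/lib/prometheus/node.d/openstack_stale_puppet_certs.prom

# Hypothetical helper: succeed if the exporter's textfile exists and is non-empty.
has_stale_certs_exporter() {
    local prom_file="${1:-$STALE_CERTS_PROM}"
    [ -s "$prom_file" ]
}

if has_stale_certs_exporter; then
    echo "exporter data found in $STALE_CERTS_PROM"
else
    echo "no exporter data at $STALE_CERTS_PROM" >&2
fi
```

An empty or missing file does not prove the exporter is absent (it may simply not have run yet), so treat this as a hint rather than a definitive answer.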


Or, if you're feeling verbose, do it by hand. This loop skips any host that still answers ping:

root@tools-puppetserver-01:~# for host in $(grep -o 'cert_name="[^"]*' /var/lib/prometheus/node.d/openstack_stale_puppet_certs.prom | cut -d'"' -f2); do
    # a host that still answers ping is not stale, leave its cert alone
    ping -c1 -w1 "$host" && { echo "SKIPPING: $host is alive"; continue; }
    puppetserver ca clean --certname "$host"
done
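Before cleaning anything, you may want a dry run. The sketch below reuses the same grep/cut parsing as the loop above but only prints what would happen. The helper names list_stale_certnames and preview_stale_cert_cleanup are hypothetical, and it assumes each stale cert appears as a cert_name="..." label in the .prom file; unlike the loop above, duplicates are dropped with sort -u.

```shell
# Hypothetical helpers for a dry run of the manual cleanup loop.
# Assumption: each stale cert appears as a cert_name="..." label in the .prom file.

# Print each stale certname once, using the same grep/cut parsing as the loop.
list_stale_certnames() {
    local prom_file="${1:-/var/lib/prometheus/node.d/openstack_stale_puppet_certs.prom}"
    grep -o 'cert_name="[^"]*' "$prom_file" | cut -d'"' -f2 | sort -u
}

# Print what the loop would do for each certname, without cleaning anything.
preview_stale_cert_cleanup() {
    local host
    for host in $(list_stale_certnames "$@"); do
        if ping -c1 -w1 "$host" >/dev/null 2>&1; then
            echo "would SKIP: $host is alive"
        else
            echo "would clean: $host"
        fi
    done
}
```

Run preview_stale_cert_cleanup with no arguments to read the default path, or pass another .prom file as the first argument.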

Another way to find what to clean up is to check which FQDNs were removed in the stale certificates Grafana dashboard (search for the relevant project): https://grafana-rw.wmcloud.org/d/SQM7MJZSz

Then, with the list of stale FQDNs, follow the node cleanup guideline: https://wikitech.wikimedia.org/wiki/Puppet#node_cleanup
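If you copied the FQDNs from the dashboard into a file, one per line, something like the sketch below can handle the certificate part of the cleanup. It only runs puppetserver ca clean (the same command used above) and skips hosts that still answer ping; it does not replace the other steps in the node cleanup guideline. The function name clean_certs_from_list and the file name stale-fqdns.txt are hypothetical.

```shell
# Hypothetical helper: clean the Puppet CA certificate for each FQDN read from stdin.
# Covers only the certificate step of node cleanup.
clean_certs_from_list() {
    local fqdn
    # default IFS trims surrounding whitespace; the -n test also handles
    # a last line without a trailing newline
    while read -r fqdn || [ -n "$fqdn" ]; do
        [ -n "$fqdn" ] || continue   # skip blank lines
        # same liveness check as the loop above: a host that answers ping is not stale
        if ping -c1 -w1 "$fqdn" >/dev/null 2>&1; then
            echo "SKIPPING: $fqdn is alive"
            continue
        fi
        # </dev/null keeps the command from consuming the rest of the list on stdin
        puppetserver ca clean --certname "$fqdn" </dev/null
    done
}
```

Run it as root on the puppetserver, for example: clean_certs_from_list < stale-fqdns.txt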

To avoid this in the first place, use the dedicated cookbook to remove instances:

$ sudo cookbook wmcs.vps.remove_instance --help
usage: cookbook [GLOBAL_ARGS] wmcs.vps.remove_instance [-h]
                                                       [--project PROJECT]
                                                       [--task-id TASK_ID]
                                                       [--no-dologmsg]
                                                       [--cluster-name {eqiad1,codfw1dev}]
                                                       --server-name
                                                       SERVER_NAME

WMCS Toolforge - Remove an instance from a project.

Usage example:
    cookbook wmcs.vps.remove_instance \
        --project toolsbeta \
        --server-name toolsbeta-k8s-test-etcd-08

optional arguments:
  -h, --help            show this help message and exit
  --project PROJECT     Relevant Cloud VPS openstack project (for operations,
                        dologmsg, etc). If this cookbook is for hardware, this
                        only affects dologmsg calls. Default is 'admin'.
  --task-id TASK_ID     Id of the task related to this operation (ex.
                        T123456). (default: None)
  --no-dologmsg         To disable dologmsg calls (no SAL messages on IRC).
                        (default: False)
  --cluster-name {eqiad1,codfw1dev}
                        Openstack cluster_name where the VM is hosted.
                        (default: eqiad1)
  --server-name SERVER_NAME
                        Name of the server to remove (without domain, ex.
                        toolsbeta-test-k8s-etcd-9). (default: None)
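When several VMs go at once, a small wrapper can call the cookbook once per server. This is a sketch that assumes you want to stop at the first failure; remove_instances is a hypothetical helper, and it passes only the --project, --task-id and --server-name flags shown in the help output above.

```shell
# Hypothetical wrapper: remove several instances from one project, one cookbook run each.
# Usage: remove_instances PROJECT TASK_ID SERVER_NAME...
remove_instances() {
    [ "$#" -ge 3 ] || { echo "usage: remove_instances PROJECT TASK_ID SERVER_NAME..." >&2; return 2; }
    local project="$1" task_id="$2"
    shift 2
    local server
    for server in "$@"; do
        # stop at the first failure so a problem is not repeated for every server
        sudo cookbook wmcs.vps.remove_instance \
            --project "$project" \
            --task-id "$task_id" \
            --server-name "$server" || return 1
    done
}
```

For example: remove_instances toolsbeta T123456 toolsbeta-test-01 toolsbeta-test-02 (the task ID and server names are placeholders).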


Support contacts

Communication and support

Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:

Discuss and receive general support
Stay aware of critical changes and plans
Track work tasks and report bugs

Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself

Read stories and WMCS blog posts

Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)

Old incidents

Add your incident here: