Acme-chief/Cloud VPS setup
This doc is based off https://phabricator.wikimedia.org/T235252#5567838
Introduction
Acme-chief is Wikimedia's tool to integrate Let's Encrypt certificates into our puppetised services. It was originally developed by Alex Monk at the Wikimedia Hackathon in Barcelona in May 2018 based on discussions with the Wikimedia Traffic team, and basically solves these problems:
- Issue our certificates centrally, distributing private key material and certificates as appropriate.
- Use DNS to respond to LE challenges, enabling use of wildcards.
- Integrate with gdnsd (production) and OpenStack Designate (Cloud VPS) to do this.
- Generate RSA and ECDSA variants of the same certificates.
- Expose new certificates after a staging time to allow for outdated client clocks.
- Probably other things more relevant to production than us.
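For readers unfamiliar with the DNS challenge mentioned above, here is a minimal sketch of how a dns-01 TXT record is derived under RFC 8555. It is not acme-chief's own code: the function names are hypothetical, and the token and thumbprint values are placeholders for illustration. It also shows why a wildcard name and its base domain are validated at the same record name.

```python
import base64
import hashlib
from typing import Tuple


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as ACME requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(identifier: str, token: str, account_thumbprint: str) -> Tuple[str, str]:
    """Return (record name, TXT value) for a dns-01 challenge.

    identifier: the DNS name being validated, e.g. '*.wikipedia.org'
    token: challenge token from the CA (placeholder here)
    account_thumbprint: base64url JWK thumbprint of the account key (placeholder here)
    """
    # A wildcard is validated at the base domain: the leading '*.' is dropped.
    base = identifier[2:] if identifier.startswith("*.") else identifier
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)


if __name__ == "__main__":
    name, value = dns01_record("*.wikipedia.org", "example-token", "example-thumbprint")
    print(name)   # _acme-challenge.wikipedia.org
    print(value)  # 43-character base64url string
```

Because the answer lives in DNS rather than on a web server, whoever can write to the zone (gdnsd in production, Designate in Cloud VPS) can satisfy the challenge for any name under it.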
In production there's a single, simple, central setup that gets used for everything from the *.wikipedia.org 'unified' cert exposed to some clients hitting wikipedia.org and co., down to miscellaneous uses such as developer services on gerrit and SMTP servers. In Cloud VPS, with our only-trustworthy-for-public-things central puppetmaster and multiple private Puppet setups, it gets more complicated.
Existing examples of acme-chief in Cloud VPS
This is known to be active within deployment-prep, traffic, and tools (with toolsbeta being set up).
Setting it up for your own Cloud VPS project
You will need
- a project-local puppetmaster (i.e. your own puppetmaster, not just relying on the central one at puppetmaster.cloudinfra.wmflabs.org)
- a friendly production root to give your DNS management service user special permissions in Keystone and safelist it for access from within the Cloud VPS address range
- the domain for which you want to issue certs as a zone in Designate, with delegation set up correctly
Full steps
- Create an instance named <project>-acme-chief-01 and do the usual dance with puppet to get it a signed puppet cert.
- Create a puppet prefix config in Horizon for <project>-acme-chief with the following hiera as a template (obviously, substitute <project>):
profile::acme_chief::accounts: {}
profile::acme_chief::active: <project>-acme-chief-01.<project>.eqiad1.wikimedia.cloud
profile::acme_chief::passive: ''
profile::acme_chief::certificates: {}
shared_acme_certificates: {}
profile::acme_chief::challenges:
  dns-01:
    issuing_ca: letsencrypt.org
    ns_records:
    - ns0.openstack.eqiad1.wikimediacloud.org.
    - ns1.openstack.eqiad1.wikimediacloud.org.
    resolver_port: 53
    sync_dns_servers:
    - ignored_for_designate.
    zone_update_cmd: /usr/local/bin/acme-chief-designate-sync.py
profile::acme_chief::cloud::designate_sync_auth_url: https://openstack.eqiad1.wikimediacloud.org:25000/v3
profile::acme_chief::cloud::designate_sync_project_names: [<project>]
profile::acme_chief::cloud::designate_sync_region_name: eqiad1-r
profile::acme_chief::cloud::designate_sync_tidyup_enabled: true
profile::acme_chief::cloud::designate_sync_username: <project>-dns-manager
- Insert <project>-dns-manager password into puppet through a cherry-pick on your puppetmaster by adding profile::acme_chief::cloud::designate_sync_password to hieradata/common.yaml in labs/private.
- Apply the role::acme_chief::cloud role on the instance individually (in my experience roles in prefix/project config can be problematic) and run puppet.
- Run the account creation script /usr/local/bin/create_acme_le_account.py
- Insert the new account into the profile::acme_chief::accounts dict in hiera. It should look something like this:
profile::acme_chief::accounts:
  {hash}:
    directory: https://acme-v02.api.letsencrypt.org/directory
    regr: '{"body": {}, "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/{number}"}'
Get the hash from the account ID and the number from the regr.json, both from the account creation step above. Be careful not to include the .body.key part of the regr.json.
- Insert the regr.json and private_key.pem into the specified locations in cherry-picks on your puppetmaster.
- Add your cert to the profile::acme_chief::certificates dict in hiera:
mycertificate:
  CN: wikipedia.org
  SNI:
  - wikipedia.org
  - '*.wikipedia.org'
  authorized_regexes:
  - ^cp-[0-9]+\.myproject\.eqiad1\.wikimedia\.cloud$
  challenge: dns-01
- Set acmechief_host: myproject-acme-chief-01.myproject.eqiad1.wikimedia.cloud on a project-wide basis (or at least on the instances which will be pulling certs from it).
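The accounts entry from the steps above can be derived mechanically from regr.json, which avoids accidentally pasting the .body.key part into hiera. The following is a minimal sketch, not part of acme-chief: the function names are hypothetical, the sample account number 12345 is made up, and it assumes regr.json is a JSON object with "body" and "uri" keys as in the template shown earlier.

```python
import json
import re


def hiera_account_entry(regr_json: str, directory: str) -> dict:
    """Build the value stored under {hash} in profile::acme_chief::accounts."""
    regr = json.loads(regr_json)
    # Keep only the account URI; the whole body (including body.key) is
    # replaced by an empty object so no key material ends up in hiera.
    trimmed = {"body": {}, "uri": regr["uri"]}
    return {"directory": directory, "regr": json.dumps(trimmed)}


def account_number(uri: str) -> str:
    """Extract the trailing account number from an account URI."""
    match = re.search(r"/acme/acct/(\d+)$", uri)
    if match is None:
        raise ValueError(f"unexpected account URI: {uri}")
    return match.group(1)


if __name__ == "__main__":
    # Made-up sample standing in for the real regr.json contents.
    sample = json.dumps({
        "body": {"key": {"kty": "RSA"}, "status": "valid"},
        "uri": "https://acme-v02.api.letsencrypt.org/acme/acct/12345",
    })
    entry = hiera_account_entry(sample, "https://acme-v02.api.letsencrypt.org/directory")
    print(entry["regr"])
    print(account_number(json.loads(sample)["uri"]))
```

The {hash} key itself still has to come from the account ID reported during account creation; this sketch does not attempt to compute it.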
You should now be able to use the acme_chief::cert resource on your TLS termination box(es) to get a certificate, with a name matching what you have in the hiera config.
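If a client cannot fetch its certificate, the authorized_regexes entry in the certificate config is worth checking first, since it appears to control which hosts are allowed to pull the cert. Here is a minimal sketch for testing a pattern against hostnames before committing it. It assumes Python-style regex semantics applied to the client's FQDN; the function name and the example hostnames are made up for illustration.

```python
import re
from typing import Iterable, List


def authorized_hosts(patterns: Iterable[str], hostnames: Iterable[str]) -> List[str]:
    """Return the hostnames matched by at least one pattern."""
    compiled = [re.compile(p) for p in patterns]
    return [h for h in hostnames if any(c.search(h) for c in compiled)]


if __name__ == "__main__":
    patterns = [r"^cp-[0-9]+\.myproject\.eqiad1\.wikimedia\.cloud$"]
    candidates = [
        "cp-01.myproject.eqiad1.wikimedia.cloud",     # intended client
        "cp-01.otherproject.eqiad1.wikimedia.cloud",  # wrong project
        "web-01.myproject.eqiad1.wikimedia.cloud",    # wrong host prefix
    ]
    for host in authorized_hosts(patterns, candidates):
        print(host)
```

Anchoring the pattern with ^ and $ and escaping the dots, as the example config does, keeps it from matching hosts in other projects.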
Summary
- if the project doesn't already have one, set up a DNS management user in Keystone with observer and designateadmin permissions. More info about that at Service accounts
- make acme-chief node(s)
- add some hiera
- add the designateadmin user's password as a secret
- add the acme-chief role to your instance(s)
- do LE account creation, commit results as a secret
- start using it - configure a cert and point other instances in the project at your acme-chief instance
Troubleshooting
- If a cert has strangely expired, you may have hit a known issue in acme-chief where it doesn't respond to HUP quite right. Restarting acme-chief should work. Otherwise, see Acme-chief#Monitoring.
- If client hosts complain about 'unable to get local issuer certificate' you may need to restart nginx on the acme-chief host, or restart the puppetserver.
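When chasing an unexpectedly expired cert, it can help to compute how many days remain from the certificate's notAfter value. The sketch below assumes you already have that value, for example from `openssl x509 -noout -enddate`, which prints a line of the form notAfter=Jan  5 09:34:43 2018 GMT. The function name is hypothetical and nothing here is part of acme-chief.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining until a certificate's notAfter time (negative if expired).

    not_after: a time such as 'Jan  5 09:34:43 2018 GMT', optionally
               prefixed with 'notAfter=' as printed by openssl.
    now: seconds since the epoch; defaults to the current time.
    """
    value = not_after.strip()
    if value.startswith("notAfter="):
        value = value[len("notAfter="):]
    expiry = ssl.cert_time_to_seconds(value)
    current = time.time() if now is None else now
    return (expiry - current) / 86400


if __name__ == "__main__":
    remaining = days_until_expiry("notAfter=Jan  5 09:34:43 2018 GMT")
    state = "expired" if remaining < 0 else "valid"
    print(f"{state}: {remaining:.1f} days")
```

A negative result on the acme-chief host itself points at the renewal problem described above; a healthy cert there combined with a stale one on the client suggests the client has not pulled the new cert yet.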