Deployments/Emergencies

From Wikitech
If you're looking for help with an emergency situation, please first try to contact Release Engineering & SRE on libera.chat in #wikimedia-operations. If that fails, it may be appropriate to use Klaxon.

Emergency deployments happen when things need fixing right now, even though deployments aren't happening right now.

How to

🚨 Step-by-step – to do an emergency release you must:

  • Join #wikimedia-operations on libera.chat
  • Get positive confirmation from SRE before deployment, and inform Release Engineering that you need to deploy (see the template below)
  • Have someone able to deploy your change

Ways to find a deployer:

IRC message Template

I need an emergency deploy for https://gerrit.wikimedia.org/r/1234 -- context is T1234, are SRE ok with a deployment? (cc: thcipriani [INSERT WEEKLY TRAIN CONDUCTOR NAME]). I (already have|need) someone to deploy.
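
As a rough illustration, the template above could be filled in programmatically, for example from a small helper script. This is only a sketch: the function name, its parameters, and the placeholder conductor nick are hypothetical and not part of any Wikimedia tooling. It only builds the message text; sending it to IRC is still up to you.

```python
def emergency_deploy_message(change_url, task, conductor, have_deployer):
    """Fill in the IRC template for an emergency deploy request.

    All parameter names are illustrative assumptions:
      change_url    -- Gerrit change to deploy
      task          -- Phabricator task giving the context
      conductor     -- IRC nick of this week's train conductor
      have_deployer -- True if someone is already lined up to deploy
    """
    deployer = "already have" if have_deployer else "need"
    return (
        f"I need an emergency deploy for {change_url} -- context is {task}, "
        f"are SRE ok with a deployment? (cc: thcipriani {conductor}). "
        f"I {deployer} someone to deploy."
    )


# "example_conductor" is a placeholder nick, not a real person.
print(emergency_deploy_message(
    "https://gerrit.wikimedia.org/r/1234", "T1234", "example_conductor", False
))
```

Keeping the wording identical to the template makes the request easy for SRE and Release Engineering to recognise at a glance.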

Reasons for an emergency deploy

  • Address security issues
    For example, a misconfiguration once meant that a private wiki and all of its content were accidentally made public.
  • Avoid data loss / corruption
    For example, a coding error meant that newly-painted pages were being cached in a corrupted form; the longer it went on, the more of the site was wrong.
  • Maintain availability
    For example, a new feature proved much more popular than planned, and the extra load it caused threatened to take down the site, so it was temporarily disabled over a holiday until people were back at work.
  • Prevent abuse
    For example, a massive content scraping run from a search engine wasn't responding to automated HTTP 429 speed bumps and so had to be manually blocked until they could adjust their code.
  • Major loss of functionality / appearance
    For example, a code efficiency change broke the visual appearance and usability of parts of the sites for a large number of logged-out users, and so the change was reverted out of production until it could be fixed.

For deployers

  • Rollback first, fix later; maintaining an overall service to our users is the most important focus.
  • Prioritise general availability over that of new features; we have a billion readers and only a few users of your new tool, no matter how cool.
  • Make on-wiki edits rarely, and only when you really have to; each wiki's editing community expects autonomy.