Periodic jobs
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.
This page documents the new Kubernetes setup for periodic MediaWiki Maintenance scripts. The old system on the Maintenance servers is still available as a fallback for now, but those servers will be going away.
We will be migrating maintenance scripts, tracked at task T341555, with subtasks for groups of related cronjobs. If you discover issues, please check the subtasks to see whether your cronjob was migrated, and comment there if so. SRE is targeting March 2025 to complete the migration of periodic execution of MediaWiki maintenance scripts from the maintenance servers to the mw-cron deployment of MediaWiki On Kubernetes on the WikiKube cluster.
Logs
Monitoring
TODO
Troubleshooting
If your job has failed, you can either look at logstash, or diagnose from the command line on the deployment server. A Phabricator task should have been opened for your team containing the correct team and cronjob selectors to use in the code below. The code below assumes eqiad is the current primary datacenter.
cgoubert@deploy1003:~$ kube-env mw-cron eqiad
cgoubert@deploy1003:~$ kubectl get jobs -l 'team=sre-serviceops, cronjob=serviceops-version' -A --field-selector status.successful=0
NAMESPACE NAME COMPLETIONS DURATION AGE
mw-cron mediawiki-main-serviceops-version-29050030 0/1 9s 28m
cgoubert@deploy1003:~$ POD=$(kubectl describe job mediawiki-main-serviceops-version-29050030 | grep 'Created pod' | cut -d: -f2)
cgoubert@deploy1003:~$ kubectl logs $POD mediawiki-main-app
Logs displayed here
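As an alternative to parsing the kubectl describe output, the pod created for a Job can also be found through the standard job-name label that Kubernetes adds automatically; a minimal sketch, reusing the job name from the example above (the pod name in the second command is whatever the first command returns):
cgoubert@deploy1003:~$ kubectl get pods -l job-name=mediawiki-main-serviceops-version-29050030
cgoubert@deploy1003:~$ kubectl logs <pod-name-from-previous-command> mediawiki-main-app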
Job Migration
General procedure
Code changes
The jobs are still defined in puppet, using the profile::mediawiki::periodic_job resource. Additional parameters are necessary to migrate a job to mw-cron and remove it from the maintenance servers.
- If the periodic jobs are defined in a subprofile of profile::mediawiki::maintenance, change the class definition to include the $helmfile_defaults_dir parameter:
class profile::mediawiki::maintenance::subprofile(
Stdlib::Unixpath $helmfile_defaults_dir = lookup('profile::kubernetes::deployment_server::global_config::general_dir', {default_value => '/etc/helmfile-defaults'}),
) {
- Include your subprofile in profile::kubernetes::deployment_server::mediawiki::periodic_jobs
- Add the following additional parameters:
cron_schedule => '*/10 * * * *', # The interval must be converted from systemd-calendar intervals to crontab syntax. Keep the interval parameter as well if your job is used on beta
kubernetes => true, # Create the CronJob resource in mw-cron, and remove the systemd-timer from the maintenance server
team => 'job-owner-team', # For easier monitoring, log dashboard, and alerting
script_label => 'scriptName-wikiName', # A label for monitoring, logging, and alerting, preferably the script name and its target
description => 'A longer form description of the periodic job',
helmfile_defaults_dir => $helmfile_defaults_dir, # Pass down the directory where the jobs will be defined
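Putting the pieces together, a minimal sketch of what a migrated definition in the subprofile might look like. The resource title, command, wiki and interval are purely illustrative and not taken from this page; keep your job's existing command and interval, and only add the parameters documented above.
# Illustrative job name and command -- keep your job's existing values.
profile::mediawiki::periodic_job { 'updateSpecialPages-aawiki':
    command               => '/usr/local/bin/mwscript updateSpecialPages.php --wiki=aawiki',
    interval              => '*-*-* *:00/10:00',  # existing systemd calendar interval, kept for beta
    cron_schedule         => '*/10 * * * *',      # the same interval converted to crontab syntax
    kubernetes            => true,                # create the CronJob in mw-cron, remove the systemd timer
    team                  => 'job-owner-team',
    script_label          => 'updateSpecialPages-aawiki',
    description           => 'Update special pages on aawiki every 10 minutes',
    helmfile_defaults_dir => $helmfile_defaults_dir,
}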
If you are not migrating all jobs in the subprofile at once, you will notice a lot of resources being created on the Kubernetes deployment server. This is normal and an artifact of how we define the jobs: they will be no-op resources until the additional parameters above have been defined. If in doubt, ask in #wikimedia-serviceops. Following the example below makes the actual diff easier to read.
Example
We recommend the following procedure:
- Create a first change adding the subprofile to the profile::kubernetes::deployment_server::mediawiki::periodic_jobs profile, like 1117234. This will have all the no-op changes.
- Migrate the jobs in follow-up patches like 1117862.
Deployment
- Disable puppet on the maintenance server if the job can't be interrupted easily
- Merge the puppet change
- Run puppet on the deployment server; this will create the job definition, but it won't be deployed to mw-cron yet
- Stop the job on the maintenance server if it is running
- Deploy the mw-cron change with helmfile on the deployment server. The job will start on its next scheduled trigger.
- Enable puppet on the maintenance server and run it. This will delete the systemd timer for the maintenance job.