
Periodic jobs

This page is currently a draft.
Material may not yet be complete, information may presently be omitted, and certain parts of the content may be subject to radical, rapid alteration. More information pertaining to this may be available on the talk page.

This page documents the new Kubernetes setup for periodic MediaWiki Maintenance scripts. The old system on the Maintenance servers is still available as a fallback for now, but those servers will be going away.

The migration of maintenance scripts is tracked in task T341555, with subtasks for groups of related cronjobs. If you discover an issue, check the subtasks to see whether your cronjob has been migrated, and if so, comment on the relevant subtask.

SRE is targeting March 2025 to complete the migration of periodic execution of MediaWiki maintenance scripts from the maintenance servers to the mw-cron deployment of MediaWiki On Kubernetes in WikiKube.

Logs

Logstash Dashboard

Monitoring

TODO

Troubleshooting

If your job has failed, you can either look at Logstash or diagnose the failure from the command line on the deployment server.

A Phabricator task should have been opened for your team. It contains the team and cronjob selectors to use in the commands below.

The code below assumes eqiad is the current primary datacenter.

cgoubert@deploy1003:~$ kube-env mw-cron eqiad
cgoubert@deploy1003:~$ kubectl get jobs -l 'team=sre-serviceops, cronjob=serviceops-version' -A --field-selector status.successful=0
NAMESPACE   NAME                                         COMPLETIONS   DURATION   AGE
mw-cron     mediawiki-main-serviceops-version-29050030   0/1           9s         28m
cgoubert@deploy1003:~$ POD=$(kubectl describe job mediawiki-main-serviceops-version-29050030 | grep 'Created pod' | cut -d: -f2)
cgoubert@deploy1003:~$ kubectl logs $POD mediawiki-main-app
Logs displayed here
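
The `grep | cut` pipeline above leaves a leading space in the pod name, which only works because `$POD` is used unquoted. As a minimal sketch, the extraction could be wrapped in a small helper that trims the name so the variable can be quoted safely. The function name `pod_from_describe` is hypothetical and not part of any Wikimedia tooling. It assumes the event line has the form `Created pod: <name>`.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl describe job` output on stdin and
# print the name of every pod mentioned in a "Created pod: <name>" event.
# The sed expression drops everything up to and including the colon and
# any following spaces, so the result carries no leading whitespace.
pod_from_describe() {
    sed -n 's/.*Created pod: *\([^ ]*\).*/\1/p'
}

# Intended usage on the deployment server (not executed here):
#   POD=$(kubectl describe job mediawiki-main-serviceops-version-29050030 | pod_from_describe)
#   kubectl logs "$POD" mediawiki-main-app
```

If the job was retried, several pods may be listed, one per line. In that case pick the one you want to inspect before passing it to `kubectl logs`.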

Job Migration

General procedure

Code changes

The jobs are still defined in puppet, using the profile::mediawiki::periodic_job resource.

Additional parameters are necessary to migrate a job to mw-cron and remove it from the maintenance servers.

  • If the periodic jobs are defined in a subprofile of profile::mediawiki::maintenance, change the class definition to include the $helmfile_defaults_dir parameter:
class profile::mediawiki::maintenance::subprofile(
    Stdlib::Unixpath $helmfile_defaults_dir = lookup('profile::kubernetes::deployment_server::global_config::general_dir', {default_value => '/etc/helmfile-defaults'}),
) {
  • Include your subprofile in profile::kubernetes::deployment_server::mediawiki::periodic_jobs
  • Add the following additional parameters:
      cron_schedule         => '*/10 * * * *', # The interval must be converted from systemd-calendar intervals to crontab syntax. Keep the interval parameter as well if your job is used on beta
      kubernetes            => true, # Create the CronJob resource in mw-cron, and remove the systemd-timer from the maintenance server
      team                  => 'job-owner-team', # For easier monitoring, log dashboard, and alerting
      script_label          => 'scriptName-wikiName', # A label for monitoring, logging, and alerting, preferably the script name and its target
      description           => 'A longer form description of the periodic job',
      helmfile_defaults_dir => $helmfile_defaults_dir, # Pass down the directory where the jobs will be defined
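
The cron_schedule value has to be written by hand from the existing systemd calendar expression. As a rough aid, the sketch below maps a few common systemd calendar shorthands to their five-field crontab equivalents. The function name `calendar_to_cron` is hypothetical, and only the listed forms are covered. Convert any other expression manually and check it against the systemd.time documentation.

```shell
#!/bin/sh
# Hypothetical helper: translate a handful of systemd calendar shorthands
# into five-field crontab syntax (minute hour day-of-month month day-of-week).
# Unknown expressions are rejected rather than guessed.
calendar_to_cron() {
    case "$1" in
        hourly)   echo '0 * * * *' ;;      # *-*-* *:00:00
        daily)    echo '0 0 * * *' ;;      # *-*-* 00:00:00
        weekly)   echo '0 0 * * 1' ;;      # Mon *-*-* 00:00:00
        monthly)  echo '0 0 1 * *' ;;      # *-*-01 00:00:00
        '*:0/10'|'*-*-* *:0/10:00')
                  echo '*/10 * * * *' ;;   # every 10 minutes
        *)
            echo "unsupported calendar expression: $1" >&2
            return 1
            ;;
    esac
}

calendar_to_cron '*:0/10'
```

Running the script prints the schedule used in the parameter list above. A non-zero exit status means the expression was not recognized and must be converted by hand.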

If you are not migrating all jobs in the subprofile at once, you will notice a lot of resources being created on the Kubernetes deployment server. This is normal and is an artifact of how the jobs are defined. These resources are no-ops until the additional parameters above have been set. If in doubt, ask in #wikimedia-serviceops. The procedure in the example below keeps the actual diff easy to read.

Example

We recommend the following procedure:

  • Create a first change adding the subprofile to the profile::kubernetes::deployment_server::mediawiki::periodic_jobs profile, like 1117234. This will have all the no-op changes.
  • Migrate the jobs in follow-up patches like 1117862

Deployment

  1. Disable puppet on the maintenance server if the job can't be interrupted easily
  2. Merge the puppet change
  3. Run puppet on the deployment server. This creates the job definition, but does not deploy it to mw-cron yet
  4. Stop the job on the maintenance server if it is running
  5. Deploy the mw-cron change with helmfile on the deployment server. The job will start on its next scheduled trigger.
  6. Enable puppet on the maintenance server and run it. This will delete the systemd timer for the maintenance job.