Maintenance scripts

From Wikitech

This page documents the new setup for maintenance scripts on Kubernetes. The old system on the maintenance servers is still available as a fallback for now, but those servers will be going away.

If this setup doesn't work for you, please report issues on task T341553 promptly, so that we can ensure the new system meets your needs before that happens.

As of September 2024, maintenance scripts should no longer be run on the maintenance servers (mwmaint*). Instead, they're launched as Kubernetes jobs on Wikikube, the same Kubernetes cluster (and using the same MediaWiki docker image) as our MediaWiki deployments serving production traffic.

Whenever you would previously have SSHed to an mwmaint host and used mwscript to run a maintenance script, follow these steps instead.

Starting a maintenance script

This requires production access, particularly membership in the deployment group.

SSH to either deployment server. Your job will automatically start in whichever data center is active, so you don't need to change deployment hosts when there's a datacenter switchover. You may use screen or tmux, but it's not required.

rzl@deploy2002:~$ mwscript-k8s --comment="T341553" -- Version.php --wiki=enwiki

Any options for the mwscript-k8s tool, as described below, go before the --. After the --, the first argument is the script name; everything else is passed to the script.

The --comment flag sets an optional (but encouraged) descriptive label, such as a task number.
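
As a sketch of this argument layout, the hypothetical Python helper below builds an mwscript-k8s command line as a list suitable for subprocess: the tool's own options (only the documented --comment, --follow and --attach are used here) come before --, followed by the script name and then the script's own arguments. The helper's name and parameters are illustrative, not part of mwscript-k8s.

```python
def mwscript_k8s_argv(script, script_args=(), comment=None, follow=False, attach=False):
    """Build an argv list for mwscript-k8s (hypothetical helper, not part of the tool).

    Options for mwscript-k8s itself go before "--"; everything after "--"
    is the script name followed by the script's own arguments.
    """
    argv = ["mwscript-k8s"]
    if comment is not None:
        argv.append(f"--comment={comment}")
    if follow:
        argv.append("--follow")
    if attach:
        argv.append("--attach")
    argv.append("--")
    argv.append(script)
    argv.extend(script_args)
    return argv


# Equivalent to: mwscript-k8s --comment="T341553" -- Version.php --wiki=enwiki
print(mwscript_k8s_argv("Version.php", ["--wiki=enwiki"], comment="T341553"))
```

Passing a list rather than a single string to subprocess avoids shell quoting problems with script arguments.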

Kubernetes saves the maintenance script's output for seven days after completion.

Tailing stdout

By default, mwscript-k8s prints a kubectl command that you (or anyone else) can paste and run to monitor the output or save it to a file.

As a convenience, you can pass -f (--follow) to mwscript-k8s to begin tailing the script output immediately. If you like, you can do this inside screen or tmux. Either way, you can safely disconnect and your script will continue running on Kubernetes.

rzl@deploy2002:~$ mwscript-k8s -f -- Version.php --wiki=testwiki
[...]
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)

Input on stdin

For scripts that take input on stdin, you can pass --attach to mwscript-k8s, either interactively or in a pipeline.

rzl@deploy2002:~$ mwscript-k8s --attach -- shell.php --wiki=testwiki
[...]
Psy Shell v0.12.3 (PHP 7.4.33 — cli) by Justin Hileman
> $wmgRealm
= "production"
>

(Note: for shell.php in particular, you can also use mw-debug-repl instead.)

rzl@deploy2002:~$ cat example_url.txt | mwscript-k8s --attach -- purgeList.php
[...]
Purging 1 urls
Done!

Attaching to the process connects to both its stdin and stdout, so you don't need to pass both --attach and --follow.
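
From Python, the equivalent of the cat ... | mwscript-k8s --attach pipeline is to hand the input to subprocess.run. This is a minimal sketch. The function names are hypothetical, and the runner is kept separate from the command so it works with any argv.

```python
import subprocess


def attached_argv(script, script_args=()):
    """argv for a script that reads stdin; --attach connects both stdin and stdout."""
    return ["mwscript-k8s", "--attach", "--", script, *script_args]


def run_with_stdin(argv, stdin_text):
    """Run argv with stdin_text on stdin; return (the command's exit status, its stdout)."""
    result = subprocess.run(argv, input=stdin_text, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout


# Roughly: cat example_url.txt | mwscript-k8s --attach -- purgeList.php
# with open("example_url.txt") as f:
#     status, output = run_with_stdin(attached_argv("purgeList.php"), f.read())
```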

Input from a file

Because the script runs in a Docker container on a Kubernetes worker machine, it can't read files on the deployment host. When the script needs to read from a file, such as a list of URLs, you can pass --file to mwscript-k8s to copy the file into the container.

Only text files are supported, and the maximum total size is 1 MiB. Files are always placed in /data inside the container; that's the maintenance script's working directory, so no path needs to be specified.
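
Since only text files up to 1 MiB in total are supported, it can help to check inputs before launching. The sketch below compares the combined size against the documented 1 MiB limit. Treating "text" as "valid UTF-8 with no NUL bytes" is our assumption, not necessarily the check mwscript-k8s performs, and the function name is hypothetical.

```python
import os

MAX_TOTAL_BYTES = 1024 * 1024  # documented limit: 1 MiB total across all --file inputs


def check_file_inputs(paths):
    """Return the total size in bytes, or raise ValueError if files look unsuitable for --file.

    Assumption: "text" is approximated as valid UTF-8 without NUL bytes;
    the real mwscript-k8s check may differ.
    """
    # Check the size limit first so we never read an oversized file into memory.
    total = sum(os.path.getsize(path) for path in paths)
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES} bytes")
    for path in paths:
        with open(path, "rb") as f:
            data = f.read()
        if b"\x00" in data:
            raise ValueError(f"{path}: contains NUL bytes, probably not a text file")
        try:
            data.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError(f"{path}: not valid UTF-8 text") from exc
    return total
```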

rzl@deploy2002:~$ ls
input.txt
rzl@deploy2002:~$ mwscript-k8s --file=input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt

You can pass --file repeatedly to copy multiple files.

rzl@deploy2002:~$ mwscript-k8s --file=/srv/example/input1.txt --file=/srv/example/input2.txt -- ReadFromTwoFiles.php --wiki=testwiki --urls=input1.txt --more-urls=input2.txt

Optionally, you can specify a different filename to use inside the container, using a colon as below. (But don't specify a directory after the colon; /data is the only supported destination.)

rzl@deploy2002:~$ ls
input_with_a_long_filename.txt
rzl@deploy2002:~$ mwscript-k8s --file=input_with_a_long_filename.txt:input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt
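
A small hypothetical helper for building --file options (one per file, repeated as needed) that enforces the rule above about not putting a directory after the colon:

```python
def file_flag(local_path, dest_name=None):
    """Build one --file option; dest_name optionally renames the copy inside /data.

    Only a bare filename is accepted after the colon, because /data is the
    only supported destination directory.
    """
    if dest_name is None:
        return f"--file={local_path}"
    if "/" in dest_name:
        raise ValueError("dest_name must be a bare filename; files always land in /data")
    return f"--file={local_path}:{dest_name}"


# Multiple files: one --file option each.
flags = [file_flag("/srv/example/input1.txt"), file_flag("/srv/example/input2.txt")]
```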

Shelling out to mwscript-k8s

If invoking mwscript-k8s from software, rather than in an interactive session, use -o json (--output=json) for machine-readable information about the job. Human-readable output still appears on stderr and can be suppressed by redirecting stderr, as in the example below.

rzl@deploy2002:~$ mwscript-k8s --comment="T341553" --output=json -- Version.php --wiki=enwiki 2>/dev/null
{
    "error": null,
    "mwscript": {
        "cluster": "codfw",
        "config": "/etc/kubernetes/mw-script-codfw.config",
        "deploy_config": "/etc/kubernetes/mw-script-deploy-codfw.config",
        "job": "mw-script.codfw.c60nd9x7",
        "mediawiki_container": "mediawiki-c60nd9x7-app",
        "namespace": "mw-script"
    }
}

The error and mwscript keys will always be present, and exactly one of them will be non-null.

If there was a problem launching the job, mwscript-k8s will exit with nonzero status. error will be a string containing a human-readable error message, and mwscript will be null.

If the job launched successfully, mwscript-k8s will exit with status 0. error will be null and mwscript will contain everything you need to check on your job using the Kubernetes API (either programmatically or by shelling out to kubectl), formatted like the above example.

(This doesn't indicate the exit status of the maintenance script, which may still crash later on—or might even immediately fail to start, e.g. if its command-line flags are wrong. Successful termination of mwscript-k8s indicates only that the job was successfully submitted to the Kubernetes cluster.)
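
A minimal Python sketch of launching a job from software and interpreting the result as described above. The function and exception names are hypothetical; only the documented --output=json and --comment flags are passed to mwscript-k8s.

```python
import json
import subprocess


class LaunchError(RuntimeError):
    """mwscript-k8s reported that the job could not be launched."""


def parse_launch_result(returncode, stdout):
    """Return the "mwscript" object from mwscript-k8s --output=json output.

    Exactly one of "error" and "mwscript" is non-null. A zero exit status only
    means the job was submitted, not that the maintenance script succeeded.
    """
    try:
        result = json.loads(stdout)
    except ValueError as exc:
        raise LaunchError(f"unparseable output from mwscript-k8s (exit status {returncode})") from exc
    if returncode != 0 or result["error"] is not None:
        raise LaunchError(result["error"] or f"mwscript-k8s exited with status {returncode}")
    return result["mwscript"]


def launch(script, script_args=(), comment=None):
    """Launch a job and return its "mwscript" object (run this on a deployment host)."""
    argv = ["mwscript-k8s", "--output=json"]
    if comment is not None:
        argv.append(f"--comment={comment}")
    argv += ["--", script, *script_args]
    # Human-readable output goes to stderr; discard it, as 2>/dev/null does.
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True, check=False)
    return parse_launch_result(proc.returncode, proc.stdout)
```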

Note that mwscript.config and mwscript.deploy_config are paths to Kubernetes config files on the deployment host with different levels of privilege; use mwscript.config whenever possible for read-only operations like checking job status, and mwscript.deploy_config when necessary for mutating operations like terminating your job early.
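
One way to honor that split when shelling out to kubectl is sketched below. The hypothetical helper chooses the config file by whether the operation mutates anything and passes it with kubectl's standard --kubeconfig and --namespace flags, using each field as an opaque string.

```python
def kubectl_argv(mwscript, *args, mutating=False):
    """Build a kubectl command for a job described by mwscript-k8s JSON output.

    Read-only operations use mwscript["config"]; pass mutating=True only for
    operations such as deleting the job, which need mwscript["deploy_config"].
    Fields are used as opaque strings, never parsed.
    """
    config = mwscript["deploy_config"] if mutating else mwscript["config"]
    return ["kubectl", f"--kubeconfig={config}", f"--namespace={mwscript['namespace']}", *args]


# Given m, the "mwscript" object from the JSON output:
# kubectl_argv(m, "get", "job", m["job"])                                     # check status
# kubectl_argv(m, "logs", f"job/{m['job']}", "-c", m["mediawiki_container"])  # read output
# kubectl_argv(m, "delete", "job", m["job"], mutating=True)                   # terminate early
```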

Some fields in the output look similar; for example, it looks as though you could deduce the value of mwscript.cluster by parsing mwscript.job. Don't do this. Instead, treat each entry as an opaque string whose structure is an implementation detail. This will ensure your automation keeps working when the naming conventions change with future updates to the maintenance scripts' Helm chart and helmfile.

Because the extra output would interfere with JSON parsing, the flags --attach, --follow, and --verbose are incompatible with --output=json.
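
Automation can catch this conflict before calling mwscript-k8s. A sketch, with hypothetical function names; it only recognizes the documented spellings (--output=json, -o json, and -f as the short form of --follow), and it expects just the options that precede --.

```python
# Flags documented as incompatible with JSON output (-f is the short form of --follow).
JSON_INCOMPATIBLE = {"--attach", "--follow", "-f", "--verbose"}


def wants_json(options):
    """True if the mwscript-k8s options (those before "--") request JSON output."""
    for i, opt in enumerate(options):
        if opt == "--output=json":
            return True
        if opt == "-o" and i + 1 < len(options) and options[i + 1] == "json":
            return True
    return False


def check_json_compatible(options):
    """Raise ValueError if options combine JSON output with a flag that adds extra output."""
    if wants_json(options):
        conflicts = sorted(set(options) & JSON_INCOMPATIBLE)
        if conflicts:
            raise ValueError(f"{', '.join(conflicts)} cannot be combined with --output=json")
```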

Without --attach or --follow, mwscript-k8s terminates (returning your JSON) immediately after launching the job, without waiting for the job to complete. Invoking mwscript-k8s in a loop therefore launches the jobs in parallel, multiplying the impact on shared resources like the databases.

Until task T376795 is resolved, do not launch many jobs in a tight loop, even if they run one at a time. Currently, jobs create duplicate resources that linger after they finish, which can put dangerous load on the Kubernetes infrastructure. Automatically launching a modest number of jobs (dozens a week, not thousands) is fine.
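
If you do need to automate a handful of launches, one cautious pattern is to wait for each job to finish, pause before starting the next, and enforce a hard cap on the batch size. This is a sketch: the helper is hypothetical, launch and wait_for_completion are callables you supply (for example, wrappers around mwscript-k8s --output=json and kubectl), and the default cap and pause are arbitrary illustrative values, not documented thresholds. Even sequential launches leave lingering resources, so keep the total small.

```python
import time


def launch_one_at_a_time(wikis, launch, wait_for_completion, max_jobs=20, pause_seconds=60.0):
    """Launch one job per wiki, waiting for each to finish before starting the next.

    launch(wiki) returns a job handle; wait_for_completion(job) blocks until it
    is done. Both are caller-supplied. max_jobs and pause_seconds are arbitrary
    illustrative limits, not official ones.
    """
    wikis = list(wikis)
    if len(wikis) > max_jobs:
        raise ValueError(f"refusing to launch {len(wikis)} jobs; limit is {max_jobs}")
    jobs = []
    for i, wiki in enumerate(wikis):
        if i > 0:
            time.sleep(pause_seconds)  # avoid a tight loop between launches
        job = launch(wiki)
        wait_for_completion(job)
        jobs.append(job)
    return jobs
```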

Interacting with jobs

Use standard kubectl commands to check the status of running jobs and view their output. Some selected examples are below; refer to the kubectl documentation for detailed usage.

Job names are automatically generated, of the form mw-script.codfw.1234wxyz, with a random alphanumeric component at the end. mwscript-k8s prints the job name in its first line of output.

Scripts are always launched in the active data center (codfw in these examples), so that data center's name appears in the job name and is the one to pass to kube_env. Like mwscript-k8s, kubectl can be used from either deployment host.

Listing jobs

Use kubectl get job. Optionally, add -l username=$USER to list only the jobs started by a particular user, which makes your own easier to find, and -L script to add a column showing each job's script.

rzl@deploy1003:~$ kube_env mw-script codfw
rzl@deploy1003:~$ kubectl get job -l username=rzl -L script
NAME                       COMPLETIONS   DURATION   AGE   SCRIPT
mw-script.codfw.0aajirtz   1/1           5s         15m   Version.php
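
For scripted use, kubectl's -o json output is easier to consume than the table. A sketch, assuming the username and script labels used above and that it runs from a shell where kube_env mw-script codfw has already been run (so kubectl picks up that environment); the helper names are hypothetical.

```python
import json
import subprocess


def list_jobs_argv(username):
    """kubectl command listing one user's jobs as JSON."""
    return ["kubectl", "get", "job", "-l", f"username={username}", "-o", "json"]


def summarize_jobs(job_list):
    """Return (name, script label, succeeded count) for each job in kubectl's JSON list."""
    rows = []
    for item in job_list.get("items", []):
        meta = item.get("metadata", {})
        status = item.get("status", {})
        rows.append((meta.get("name"), meta.get("labels", {}).get("script"), status.get("succeeded", 0)))
    return rows


def fetch_jobs(username):
    """Run kubectl and summarize the result (requires a deployment host)."""
    out = subprocess.run(list_jobs_argv(username), capture_output=True, text=True, check=True).stdout
    return summarize_jobs(json.loads(out))
```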

Showing script output

Pass both the job name and the container name to kubectl logs. (Several containers run in each MediaWiki pod, but only one is the application container we're interested in; its name ends in -app.) mwscript-k8s prints the appropriate command, but you can also reconstruct it yourself: if you don't remember the container name, omit it and the error message will list the available containers.

rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz
error: a container name must be specified for pod mw-script.codfw.0aajirtz-r69bf, choose one of: [mediawiki-0aajirtz-app mediawiki-0aajirtz-tls-proxy mediawiki-0aajirtz-rsyslog]

rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz mediawiki-0aajirtz-app
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)

In this example, the job is already completed. If it were still running, we could use kubectl logs -f (analogous to tail -f) to stream the output.
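
The same steps can be sketched in Python: pick the container whose name ends in -app, then build the kubectl logs command. The helper names are hypothetical. The example above passes the container name positionally; this sketch uses kubectl's explicit -c flag, which selects the container in the same way.

```python
def app_container(container_names):
    """Pick the MediaWiki application container (the one whose name ends in -app)."""
    matches = [name for name in container_names if name.endswith("-app")]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one -app container, got {matches}")
    return matches[0]


def logs_argv(job_name, container, follow=False):
    """kubectl logs command for a job; follow=True streams output like tail -f."""
    argv = ["kubectl", "logs"]
    if follow:
        argv.append("-f")
    return argv + [f"job/{job_name}", "-c", container]
```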

Finished jobs are saved for up to a week, including their logs, then cleaned up.

Terminating a job

Deleting a Kubernetes job sends a SIGTERM to the running script. You'll need to act as the deploy user to delete the job; use caution, as this gives you elevated privileges over all maintenance scripts, not just your own.

This not only terminates the job but also deletes it from the Kubernetes cluster, including its saved logs. If you need to keep the logs, capture them first.

rzl@deploy1003:~$ kube_env mw-script-deploy codfw  # Act as the deploy user to get delete privileges; use caution
rzl@deploy1003:~$ kubectl delete job mw-script.codfw.0aajirtz
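
For automation, the capture-then-delete order might look like the sketch below, assuming you have the mwscript object from mwscript-k8s --output=json (interactively, the kube_env commands above serve the same purpose). The helper name is hypothetical; run is injectable so the command sequence can be checked without a cluster.

```python
import subprocess


def save_logs_then_delete(mwscript, log_path, run=subprocess.run):
    """Save a job's logs to log_path, then delete the job (which sends SIGTERM).

    mwscript is the "mwscript" object from mwscript-k8s --output=json. Logs are
    read with the read-only config; only the delete uses the privileged
    deploy_config. For a still-running job, only the output so far is saved.
    """
    ns = f"--namespace={mwscript['namespace']}"
    with open(log_path, "w") as out:
        run(["kubectl", f"--kubeconfig={mwscript['config']}", ns,
             "logs", f"job/{mwscript['job']}", "-c", mwscript["mediawiki_container"]],
            stdout=out, check=True)
    # Deleting the job also deletes its saved logs, so this must come last.
    run(["kubectl", f"--kubeconfig={mwscript['deploy_config']}", ns,
         "delete", "job", mwscript["job"]], check=True)
```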

Not yet supported

For now, fall back to running mwscript directly on the bare-metal maintenance servers if you need any of the following:

  • Helpers that run a maintenance script on multiple wikis: mwscriptwikiset, foreachwiki, foreachwikiindblist. (Of course, it's fine to manually use mwscript-k8s multiple times to run a script on several wikis. Remember that by default, mwscript-k8s exits immediately without waiting for job completion; if you wrap it in a shell for-loop, the jobs will run in parallel.)
  • Jobs that need to save persistent files to disk. On Kubernetes, your maintenance script runs in a Docker container which will not outlive it. Scripts should log their important output to stdout, or persist it in a database or other remote storage.
  • The sql command (i.e., the mysql.php maintenance script). The mysql client is not installed in our production MediaWiki images. The replacement probably won't be a maintenance script, but a wrapper for mysql using dbconfig data. (task T375910)

Kubernetes is capable of automatically moving an interrupted job (e.g. one hit by hardware problems) to another machine and restarting it, babysitting it until it completes. However, because not all maintenance scripts were written to be safely restarted, mwscript-k8s jobs are not restarted automatically: if your job is interrupted, it stays stopped until you intervene manually.