Maintenance scripts
This page documents the new setup for maintenance scripts on Kubernetes. The old system on the maintenance servers is still available as a fallback for now, but those servers will be going away.
If this setup doesn't work for you, please report issues on task T341553 promptly, so that we can ensure the new system meets your needs before that happens.
As of September 2024, maintenance scripts should no longer be run on the maintenance servers (mwmaint*). Instead, they're launched as Kubernetes jobs on Wikikube, the same Kubernetes cluster (using the same MediaWiki Docker image) as our MediaWiki deployments that serve production traffic.
Whenever you would previously have SSHed to an mwmaint host and run mwscript to run a maintenance script, follow these steps instead.
Starting a maintenance script
This requires production access, specifically membership in the deployment group.
SSH to any deployment server. Either deployment server will work. Your job automatically starts in whichever data center is active, so you don't need to change deployment hosts after a data center switchover. You may use screen or tmux, but it's not required.
rzl@deploy2002:~$ mwscript-k8s --comment="T341553" -- Version.php --wiki=enwiki
Any options for the mwscript-k8s tool, as described below, go before the --. After the --, the first argument is the script name; everything else is passed to the script.
The --comment flag sets an optional (but encouraged) descriptive label, such as a task number.
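If you launch scripts from your own tooling, the argument order above can be captured in a small helper. This is a minimal sketch, not part of mwscript-k8s itself. The function name `build_mwscript_k8s_argv` is hypothetical, and the helper only knows about the --comment and --follow options documented on this page.

```python
from typing import List, Optional, Sequence


def build_mwscript_k8s_argv(
    script: str,
    script_args: Sequence[str] = (),
    comment: Optional[str] = None,
    follow: bool = False,
) -> List[str]:
    """Build an mwscript-k8s command line as a list (hypothetical helper).

    Options for mwscript-k8s itself go before "--". After "--", the first
    argument is the maintenance script name; the rest are passed to the script.
    """
    argv = ["mwscript-k8s"]
    if comment:
        # Optional but encouraged descriptive label, such as a task number.
        argv.append(f"--comment={comment}")
    if follow:
        # Begin tailing the script's output immediately after launch.
        argv.append("--follow")
    argv.append("--")
    argv.append(script)
    argv.extend(script_args)
    return argv
```

Passing the resulting list straight to subprocess.run, rather than joining it into a shell string, means a comment containing spaces needs no extra quoting.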
Kubernetes saves the maintenance script's output for seven days after completion.
Tailing stdout
By default, mwscript-k8s prints a kubectl command that you (or anyone else) can paste and run to monitor the output or save it to a file.
As a convenience, you can pass -f (--follow) to mwscript-k8s to begin tailing the script output immediately. If you like, you can do this inside screen or tmux. Either way, you can safely disconnect, and your script will keep running on Kubernetes.
rzl@deploy2002:~$ mwscript-k8s -f -- Version.php --wiki=testwiki
[...]
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)
Input on stdin
For scripts that take input on stdin, you can pass --attach to mwscript-k8s, either interactively or in a pipeline.
rzl@deploy2002:~$ mwscript-k8s --attach -- shell.php --wiki=testwiki
[...]
Psy Shell v0.12.3 (PHP 7.4.33 — cli) by Justin Hileman
> $wmgRealm
= "production"
>
(Note: for shell.php in particular, you can also use mw-debug-repl instead.)
rzl@deploy2002:~$ cat example_url.txt | mwscript-k8s --attach -- purgeList.php
[...]
Purging 1 urls
Done!
Attaching to the process connects to both its stdin and its stdout, so you don't need to pass both --attach and --follow.
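The pipeline above can also be driven from Python. The input is handed to subprocess.run, which writes it to the child's stdin. This is a minimal sketch. The helper name `run_attached` is hypothetical, and its `runner` parameter exists only so the function can be exercised without a deployment host.

```python
import subprocess
from typing import Callable, Sequence


def run_attached(
    script: str,
    script_args: Sequence[str],
    stdin_text: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run a script via `mwscript-k8s --attach`, sending stdin_text to its stdin.

    --attach connects both stdin and stdout, so --follow isn't needed; the
    output is captured in the returned CompletedProcess.
    """
    argv = ["mwscript-k8s", "--attach", "--", script, *script_args]
    # `input` is written to the child's stdin, which is then closed, like the
    # `cat example_url.txt | mwscript-k8s --attach -- purgeList.php` pipeline.
    return runner(argv, input=stdin_text, text=True, capture_output=True, check=False)
```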
Input from a file
Because the script runs in a Docker container on a Kubernetes worker machine, it can't read files on the deployment host. When the script needs to read from a file, such as a list of URLs, you can pass --file to mwscript-k8s to copy the file into the container.
Only text files are supported, and the maximum total size is 1 MiB. Files are always placed in /data inside the container; that's the maintenance script's working directory, so no path needs to be specified.
rzl@deploy2002:~$ ls
input.txt
rzl@deploy2002:~$ mwscript-k8s --file=input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt
You can pass --file repeatedly to copy multiple files.
rzl@deploy2002:~$ mwscript-k8s --file=/srv/example/input1.txt --file=/srv/example/input2.txt -- ReadFromTwoFiles.php --wiki=testwiki --urls=input1.txt --more-urls=input2.txt
Optionally, you can specify a different filename to use inside the container, using a colon as below. (But don't specify a directory after the colon; /data is the only supported destination.)
rzl@deploy2002:~$ ls
input_with_a_long_filename.txt
rzl@deploy2002:~$ mwscript-k8s --file=input_with_a_long_filename.txt:input.txt -- ReadFromAFile.php --wiki=testwiki --filename=input.txt
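A tool that builds --file arguments can check the documented limits locally before launching: text files only, at most 1 MiB in total, and a bare filename after the colon. This is a minimal sketch. The helper name is hypothetical. So is its test for "text": valid UTF-8 with no NUL bytes, which may differ from what mwscript-k8s itself checks.

```python
from typing import List, Optional, Sequence, Tuple

MAX_TOTAL_BYTES = 1024 * 1024  # 1 MiB, summed over all --file arguments


def file_args(files: Sequence[Tuple[str, Optional[str]]]) -> List[str]:
    """Pre-check local files and build --file arguments (hypothetical helper).

    Each entry is (local_path, name_in_container). A name of None keeps the
    file's own basename. Either way the file lands in /data, the script's
    working directory, so the script can refer to it by bare filename.
    """
    args: List[str] = []
    total = 0
    for path, name in files:
        if ":" in path:
            # Assumption: a colon in the local path would be read as the rename separator.
            raise ValueError(f"{path}: local path contains ':'")
        if name is not None and "/" in name:
            raise ValueError(f"{name}: give a bare filename; /data is the only destination")
        with open(path, "rb") as f:
            data = f.read()
        # Assumption: "text" means valid UTF-8 with no NUL bytes; mwscript-k8s
        # may apply a different test.
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            raise ValueError(f"{path}: not a UTF-8 text file") from None
        if b"\0" in data:
            raise ValueError(f"{path}: contains NUL bytes")
        total += len(data)
        if total > MAX_TOTAL_BYTES:
            raise ValueError(f"files total {total} bytes, over the 1 MiB limit")
        args.append(f"--file={path}:{name}" if name else f"--file={path}")
    return args
```

For example, if input_with_a_long_filename.txt is a small text file, file_args([("input_with_a_long_filename.txt", "input.txt")]) yields the --file argument from the last example above.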
Shelling out to mwscript-k8s
If you're invoking mwscript-k8s from software rather than in an interactive session, use -o json (--output=json) for machine-readable information about the job. Human-readable output still appears on stderr and can be suppressed, for example by redirecting stderr to /dev/null as below.
rzl@deploy2002:~$ mwscript-k8s --comment="T341553" --output=json -- Version.php --wiki=enwiki 2>/dev/null
{
"error": null,
"mwscript": {
"cluster": "codfw",
"config": "/etc/kubernetes/mw-script-codfw.config",
"deploy_config": "/etc/kubernetes/mw-script-deploy-codfw.config",
"job": "mw-script.codfw.c60nd9x7",
"mediawiki_container": "mediawiki-c60nd9x7-app",
"namespace": "mw-script"
}
}
The error and mwscript keys will always be present, and exactly one of them will be non-null.
If there was a problem launching the job, mwscript-k8s will exit with nonzero status. Its error value will be a string containing a human-readable error message, and mwscript will be null.
If the job launched successfully, mwscript-k8s will exit with status 0. Its error value will be null, and mwscript will contain everything you need to check on your job using the Kubernetes API (either programmatically or by shelling out to kubectl), formatted like the example above.
(This doesn't indicate the exit status of the maintenance script, which may still crash later on—or might even immediately fail to start, e.g. if its command-line flags are wrong. Successful termination of mwscript-k8s indicates only that the job was successfully submitted to the Kubernetes cluster.)
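The rules above translate fairly directly into a wrapper. This is a minimal sketch. The names `LaunchError`, `parse_launch_result`, and `launch` are hypothetical. The wrapper relies only on the error and mwscript keys described on this page, and it treats success as "submitted", not "finished".

```python
import json
import subprocess
from typing import Any, Dict, Sequence


class LaunchError(RuntimeError):
    """mwscript-k8s reported that the job could not be launched."""


def parse_launch_result(stdout: str) -> Dict[str, Any]:
    """Return the "mwscript" object from `mwscript-k8s --output=json` output."""
    result = json.loads(stdout)
    error, mwscript = result["error"], result["mwscript"]
    # Both keys are always present, and exactly one of them is non-null.
    if (error is None) == (mwscript is None):
        raise ValueError("expected exactly one of 'error' and 'mwscript' to be non-null")
    if error is not None:
        raise LaunchError(error)
    return mwscript


def launch(script: str, script_args: Sequence[str], comment: str) -> Dict[str, Any]:
    """Submit a job. Success means only that Kubernetes accepted it, not that it ran."""
    proc = subprocess.run(
        ["mwscript-k8s", f"--comment={comment}", "--output=json", "--", script, *script_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # discard the human-readable progress messages
        text=True,
    )
    try:
        return parse_launch_result(proc.stdout)
    except json.JSONDecodeError:
        raise LaunchError(f"mwscript-k8s exited with status {proc.returncode} and no JSON") from None
```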
Note that mwscript.config and mwscript.deploy_config are paths to Kubernetes config files on the deployment host with different levels of privilege. Use mwscript.config whenever possible for read-only operations like checking job status, and mwscript.deploy_config when necessary for mutating operations like terminating your job early.
Some fields in the output look similar. For example, it looks as though you could deduce the value of mwscript.cluster by parsing mwscript.job. Don't do this. Instead, treat each entry as an opaque string whose structure is an implementation detail. This ensures your automation keeps working when the naming conventions change with future updates to the maintenance scripts' Helm chart and helmfile.
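Together, the last two points mean a tool can build its kubectl invocations directly from the fields. It passes mwscript.config through kubectl's --kubeconfig flag for read-only calls and uses each value as-is instead of deriving one from another. This is a minimal sketch with hypothetical helper names.

```python
from typing import Dict, List


def _kubectl(mw: Dict[str, str]) -> List[str]:
    # Read-only operations use the lower-privilege config. Mutating ones, such as
    # deleting the job, would use mw["deploy_config"] instead.
    return ["kubectl", f"--kubeconfig={mw['config']}", f"--namespace={mw['namespace']}"]


def status_cmd(mw: Dict[str, str]) -> List[str]:
    """Fetch the job object as JSON."""
    return _kubectl(mw) + ["get", f"job/{mw['job']}", "--output=json"]


def logs_cmd(mw: Dict[str, str], follow: bool = False) -> List[str]:
    """Print the application container's output; with follow=True, keep streaming it."""
    cmd = _kubectl(mw) + ["logs", f"job/{mw['job']}", mw["mediawiki_container"]]
    if follow:
        cmd.append("--follow")
    return cmd
```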
Because the extra output would interfere with JSON parsing, the flags --attach, --follow, and --verbose are incompatible with --output=json.
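A wrapper can catch these combinations before it launches anything. This is a minimal sketch with hypothetical names. It recognizes only the spellings documented on this page (-o json, --output=json, -f for --follow). It moves the failure earlier, but it is no substitute for checking mwscript-k8s's own exit status.

```python
from typing import Sequence

# Flags that print extra output, so cannot be combined with --output=json.
# "-f" is the documented short form of --follow.
JSON_INCOMPATIBLE = {"--attach", "--follow", "-f", "--verbose"}


def check_options(options: Sequence[str]) -> None:
    """Raise ValueError if JSON output is requested alongside an incompatible flag.

    `options` is everything before "--" on the mwscript-k8s command line.
    """
    opts = list(options)
    wants_json = "--output=json" in opts or any(
        a == "-o" and b == "json" for a, b in zip(opts, opts[1:])
    )
    clashes = sorted(JSON_INCOMPATIBLE.intersection(opts))
    if wants_json and clashes:
        raise ValueError(f"{', '.join(clashes)} cannot be combined with --output=json")
```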
Without --attach or --follow (which is always the case with --output=json), mwscript-k8s terminates (returning your JSON) immediately after launching the job, without waiting for it to complete. If you invoke mwscript-k8s in a loop, you can therefore launch many jobs in parallel, multiplying the impact on shared resources like the databases.
Interacting with jobs
Use standard kubectl commands to check the status, and view the output, of running jobs. Some selected examples are below, but refer to the kubectl documentation for detailed usage.
Job names are generated automatically, in the form mw-script.codfw.1234wxyz, with a random alphanumeric component at the end. mwscript-k8s prints the job name in its first line of output.
Scripts are always launched in the active data center (codfw in these examples). Its name appears in the job name, and it's the data center to pass to kube_env. Like mwscript-k8s, kubectl can be used from either deployment host.
Listing jobs
Use kubectl get job. Optionally, use -l username=$USER to filter the list to jobs started by a particular user, which makes it easier to find your own. Add -L script to show each job's script in an extra column.
rzl@deploy1003:~$ kube_env mw-script codfw
rzl@deploy1003:~$ kubectl get job -l username=rzl -L script
NAME                       COMPLETIONS   DURATION   AGE   SCRIPT
mw-script.codfw.0aajirtz   1/1           5s         15m   Version.php
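For scripted use, the same query can return JSON (kubectl get job -l username=$USER -o json). A short function can then summarize it. This is a minimal sketch. It relies on the script label shown above and on the standard Kubernetes Job conditions "Complete" and "Failed". The helper name is hypothetical.

```python
import json
from typing import List, Tuple


def summarize_jobs(kubectl_json: str) -> List[Tuple[str, str, str]]:
    """Return (job name, script, state) for each job in `kubectl get job -o json` output."""
    rows = []
    for item in json.loads(kubectl_json)["items"]:
        metadata = item["metadata"]
        # The "script" label is the one displayed by `-L script`.
        script = metadata.get("labels", {}).get("script", "?")
        # Standard Job conditions: "Complete" or "Failed" with status "True" once finished.
        conditions = {c["type"]: c["status"] for c in item.get("status", {}).get("conditions", [])}
        if conditions.get("Complete") == "True":
            state = "complete"
        elif conditions.get("Failed") == "True":
            state = "failed"
        else:
            state = "unfinished"
        rows.append((metadata["name"], script, state))
    return rows
```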
Showing script output
Pass both the job name and the container name to kubectl logs. (Several containers run in each MediaWiki pod, but only one is the application container we're interested in.) mwscript-k8s prints the appropriate command, but you can reconstruct it yourself. If you don't remember the name of the right container, omit it, and the error message will list the available ones. The application container's name ends in -app.
rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz
error: a container name must be specified for pod mw-script.codfw.0aajirtz-r69bf, choose one of: [mediawiki-0aajirtz-app mediawiki-0aajirtz-tls-proxy mediawiki-0aajirtz-rsyslog]
rzl@deploy1003:~$ kubectl logs job/mw-script.codfw.0aajirtz mediawiki-0aajirtz-app
MediaWiki version: 1.43.0-wmf.24 LTS (built: 22:35, 23 September 2024)
In this example, the job has already completed. If it were still running, we could use kubectl logs -f (analogous to tail -f) to stream the output.
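If you have only a job name (for example, from kubectl get job), you can read the container names from the job's pod template instead of relying on the error message. kubectl get job/NAME -o jsonpath='{.spec.template.spec.containers[*].name}' prints them space-separated. This is a minimal sketch with hypothetical helper names that picks the -app container from that output. It assumes kubectl already targets the right cluster and namespace, for example because the settings from kube_env mw-script codfw are inherited by the child process.

```python
import subprocess
from typing import List


def app_container(container_names: str) -> str:
    """Pick the application container (name ending in -app) from space-separated names."""
    matches = [n for n in container_names.split() if n.endswith("-app")]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one -app container, found {matches}")
    return matches[0]


def logs_cmd_for_job(job: str) -> List[str]:
    """Build `kubectl logs job/<job> <app container>` from just the job name.

    Assumes kubectl already points at the right cluster and namespace.
    """
    names = subprocess.run(
        ["kubectl", "get", f"job/{job}", "-o",
         "jsonpath={.spec.template.spec.containers[*].name}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return ["kubectl", "logs", f"job/{job}", app_container(names)]
```

If you launched the job yourself with --output=json, prefer its mwscript.mediawiki_container value over this suffix match, since those fields are meant to be treated as opaque.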
Finished jobs are saved for up to a week, including their logs, then cleaned up.
Terminating a job
Deleting a Kubernetes job sends a SIGTERM to the running script. You'll need to act as the deploy user to delete the job. Use caution, as this gives you elevated privileges over all maintenance scripts, not just your own.
This terminates the job, but also deletes it from the Kubernetes cluster, including deleting its saved logs. Capture those first, if you need to keep them.
rzl@deploy1003:~$ kube_env mw-script-deploy codfw # Act as the deploy user to get delete privileges; use caution
rzl@deploy1003:~$ kubectl delete job mw-script.codfw.0aajirtz
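Deletion also removes the saved logs, so a tool that terminates jobs should capture the output first and delete only once that has succeeded. This is a minimal sketch. The helper name and its `runner` parameter are hypothetical. The config, deploy_config, and namespace arguments are meant to be the values mwscript-k8s --output=json reported for the job.

```python
import subprocess
from typing import Callable


def save_logs_then_delete(
    job: str,
    container: str,
    config: str,
    deploy_config: str,
    namespace: str,
    out_path: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> None:
    """Save the job's output to out_path, then delete the job (sending SIGTERM).

    With check=True, a failed `kubectl logs` raises CalledProcessError before
    the delete runs, so the job and its logs are left in place.
    """
    with open(out_path, "w") as out:
        # Read-only step: lower-privilege config.
        runner(
            ["kubectl", f"--kubeconfig={config}", f"--namespace={namespace}",
             "logs", f"job/{job}", container],
            stdout=out, check=True,
        )
    # Mutating step: the deploy config can delete any maintenance script's job,
    # not just yours, so be sure `job` is the one you mean.
    runner(
        ["kubectl", f"--kubeconfig={deploy_config}", f"--namespace={namespace}",
         "delete", f"job/{job}"],
        check=True,
    )
```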
Not yet supported
For now, fall back to running mwscript directly on the bare-metal maintenance servers if you need any of the following:
- Helpers that run a maintenance script on multiple wikis: mwscriptwikiset, foreachwiki, foreachwikiindblist. (Of course, it's fine to manually use mwscript-k8s multiple times to run a script on several wikis. Remember that by default, mwscript-k8s exits immediately without waiting for job completion; if you wrap it in a shell for-loop, the jobs will run in parallel.)
- Jobs that need to save persistent files to disk. On Kubernetes, your maintenance script runs in a Docker container which will not outlive it. Scripts should log their important output to stdout, or persist it in a database or other remote storage.
- The sql command (i.e., the mysql.php maintenance script). The mysql client is not installed in our production MediaWiki images. The replacement probably won't be a maintenance script, but a wrapper for mysql using dbconfig data. (task T375910)
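Each mwscript-k8s invocation returns as soon as the job is submitted, so a plain loop over wikis runs all the jobs at once. For a script that is heavy on the databases, one option is to run the jobs one at a time and wait for each to finish. This is a minimal sketch with hypothetical helper names. It assumes the mwscript fields from --output=json and polls the standard Kubernetes Job conditions (Complete or Failed) with kubectl. The wiki list and poll interval are up to you.

```python
import json
import subprocess
import time
from typing import Dict, Optional, Sequence


def job_outcome(job_json: str) -> Optional[str]:
    """Return "Complete" or "Failed" from `kubectl get job -o json`, or None if unfinished."""
    for cond in json.loads(job_json).get("status", {}).get("conditions", []):
        if cond["type"] in ("Complete", "Failed") and cond["status"] == "True":
            return cond["type"]
    return None


def run_on_wikis_sequentially(
    script: str, wikis: Sequence[str], comment: str, poll_seconds: float = 30.0
) -> Dict[str, str]:
    """Run `script` on each wiki in turn, waiting for each job before the next."""
    outcomes: Dict[str, str] = {}
    for wiki in wikis:
        launched = subprocess.run(
            ["mwscript-k8s", f"--comment={comment}", "--output=json", "--",
             script, f"--wiki={wiki}"],
            stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, text=True,
        )
        result = json.loads(launched.stdout)
        if result["error"] is not None:
            raise RuntimeError(f"{wiki}: {result['error']}")
        mw = result["mwscript"]
        # Poll with the read-only config until the job finishes. There is no
        # timeout, so a job stuck pending would block the loop.
        while True:
            job_json = subprocess.run(
                ["kubectl", f"--kubeconfig={mw['config']}", f"--namespace={mw['namespace']}",
                 "get", f"job/{mw['job']}", "--output=json"],
                stdout=subprocess.PIPE, check=True, text=True,
            ).stdout
            outcome = job_outcome(job_json)
            if outcome is not None:
                outcomes[wiki] = outcome
                break
            time.sleep(poll_seconds)
    return outcomes
```

This trades total wall-clock time for lower peak load. Whether a particular script needs that is a judgment call.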
Kubernetes is capable of automatically moving an interrupted job (for example, one interrupted by hardware problems) to another machine and restarting it, babysitting it until it completes. However, not all maintenance scripts were originally written to be safely restarted, so mwscript-k8s jobs are not restarted automatically. If your job is interrupted, it will stay stopped unless you manually intervene.