Wikimedia Cloud Services team/EnhancementProposals/Toolforge jobs
This page contains information on a potential design to support grid-like jobs on Toolforge kubernetes, the end goal being to help with the GridEngine deprecation.
Proposal
This proposal consists of introducing a framework called Toolforge Jobs Framework (or TJF). It is basically a new API to ease end user interaction with Toolforge jobs in the kubernetes cluster. The new API should abstract away most of the k8s gory details for configuring, removing, managing and reading the status of jobs. The abstraction approach is similar to what is done for Toolforge webservices (where we have the webservice command), but with a twist: the software is decoupled into 2 components, an API service and a command line interface.
The framework consists precisely of these two components.
The API is freely usable within Toolforge, both from bastion servers and from kubernetes pods. This means that a running job can interact with the Toolforge jobs API and CRUD other jobs.
There are no plans to introduce backwards support for GridEngine in TJF, given the ultimate goal is to deprecate the old grid.
The two components approach

The TJF is composed of 2 components:
- toolforge-jobs-api --- runs inside the k8s cluster as a webservice. It offers the REST API that in turn interacts with the native k8s API objects: CronJob, Job and ReplicationController.
- toolforge-jobs-cli --- command line interface to interact with the toolforge-jobs-api service. Typically used by end users on Toolforge bastions.
By splitting the software into two components, and introducing a stable API, we aim to reduce the maintenance burden: we no longer need to rebuild all Toolforge docker containers every time we change some internal mechanism (as is the case today with the tools-webservice package).
Also, the new REST API can be used by any user inside Toolforge. This opens the door to simplified programmatic usage of the new kubernetes jobs feature, which can be a nice incentive for users to migrate away from the grid.
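For illustration purposes only, programmatic usage could look something like this minimal Python sketch. It assumes the endpoints described in the API docs section below and the per-tool client certs described in the Auth section; the CA bundle path is a placeholder.

import requests

API = "https://jobs.svc.toolsbeta.eqiad1.wikimedia.cloud/api/v1"
CERT = ("/data/project/test/.toolskube/client.crt",   # per the Auth section
        "/data/project/test/.toolskube/client.key")
CA = "/path/to/k8s-ca.pem"  # placeholder: CA bundle that signs the ingress cert

# Launch a one-off job (POST /api/v1/run/).
requests.post(f"{API}/run/", cert=CERT, verify=CA, json={
    "name": "myjob",
    "cmd": "./myscript.py --once",
    "type": "toolforge-buster-sssd",
})

# Check its status (GET /api/v1/show/{name}/).
print(requests.get(f"{API}/show/myjob/", cert=CERT, verify=CA).json())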
k8s abstraction that matches GridEngine experience
We would like to support a similar experience to what users are used to in GridEngine. Given the feature mapping table below, it should be possible in Kubernetes by using the following mechanisms:
- Job. This object is the basic definition of a workload in the k8s cluster: it runs a given task and ensures it finished as expected.
- CronJob. This object supports cron-like scheduling of child Job objects.
- ReplicationController. This object is used to ensure a given Job is always present. It is used to control the execution of continuous tasks, a feature not natively supported by the Job object.
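As a sketch of how TJF could drive the first mechanism, this is what creating a one-off Job might look like with the official kubernetes Python client. The function, namespace and image names are illustrative, not the actual implementation.

from kubernetes import client, config

def create_one_off_job(name: str, cmd: str, image: str, namespace: str) -> None:
    """Create a k8s Job that runs cmd once, retrying up to 3 times on failure."""
    config.load_incluster_config()  # TJF runs inside the cluster
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(
            backoff_limit=3,
            template=client.V1PodTemplateSpec(
                spec=client.V1PodSpec(
                    restart_policy="Never",
                    containers=[client.V1Container(
                        name=name,
                        image=image,
                        # run via a shell so cmd can include arguments
                        command=["/bin/sh", "-c", cmd],
                    )],
                )
            ),
        ),
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)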
Auth

To ensure that Toolforge users only manage their own jobs, TJF will use kubernetes certificates for client authentication. These x509 certificates are automatically managed by maintain-kubeusers, and live in each user's home directory:
toolsbeta.test@toolsbeta-sgebastion-04:~$ egrep client-certificate\|client-key .kube/config
client-certificate: /data/project/test/.toolskube/client.crt
client-key: /data/project/test/.toolskube/client.key
toolsbeta.test@toolsbeta-sgebastion-04:~$ head -1 /data/project/test/.toolskube/client.crt
-----BEGIN CERTIFICATE-----
toolsbeta.test@toolsbeta-sgebastion-04:~$ head -1 /data/project/test/.toolskube/client.key
-----BEGIN RSA PRIVATE KEY-----
In the current Toolforge webservice setup, TLS termination is done at the nginx front proxy. The front proxy talks to the backends using plain HTTP, with no simple options for relaying or forwarding the original client TLS certs. We would need to introduce modifications to the front proxy to accept client TLS certificates, so instead we decided to run a parallel ingress.
The toolforge-jobs-api component needs to know the client certificate CommonName. With this information, toolforge-jobs-api can impersonate the user by reading the same x509 certificates from the user's home directory and using them to interact with the kubernetes API. This is effectively a TLS proxy that reuses the original certificate.
This results in two types of connections, as shown in the diagram above:
- connection type 1: a user contacts toolforge-jobs-api using the k8s client TLS certs from their home directory. The TLS connection is established to ingress-nginx-jobs, which handles the client-side TLS termination. This can happen from a Toolforge bastion, or from a job already running inside kubernetes. The connection can be made either using toolforge-jobs-cli or by contacting toolforge-jobs-api programmatically by other means.
- connection type 2: once the CommonName of the original request certificate is validated, toolforge-jobs-api loads the same k8s client TLS certificate from the user's home and impersonates the user to contact the k8s API. For this to be possible, the toolforge-jobs-api component needs read permissions on every user home directory, much like maintain-kubeusers has.
This setup is possible because the x509 certificates are maintained by the maintain-kubeusers component, and because toolforge-jobs-api runs inside the kubernetes cluster itself and can therefore be configured with enough permissions to read each user's home.
Additional or alternative authentication mechanisms can be introduced in the future as we detect new use cases.
The Toolforge front proxy exists today basically for webservices running in the grid. Once the grid is fully deprecated and we no longer need the front proxy, we could re-evaluate this whole situation and simplify it.
Not using the framework
Advanced Toolforge users who know how to interact with a Kubernetes API can still use it directly (as they already can for webservices). Using the new TJF is optional; it is provided just as a convenience for Toolforge users.
The containers problem
We have custom-built containers for Toolforge webservices: containers for the most common web development frameworks and language runtimes. For practical reasons, no container includes every language and framework in the universe.
However, users can currently schedule jobs in GridEngine using any language, library or framework installed in our Debian bastions. They can write a script that combines calls to Python, PHP and Perl. We would need to design and develop a container solution that provides job users with the appropriate runtimes.
For the first few iterations of this project it may suffice to make pywikibot available. We can work later on discovering more useful runtimes.
Implementation details
TODO: Arturo would like to use Python3 to build TJF, using flask-restful https://flask-restful.readthedocs.io
Checking client TLS certs: https://www.ajg.id.au/2018/01/01/mutual-tls-with-python-flask-and-werkzeug/
dcaro: Might be interesting to also use https://pypi.org/project/flask-swagger/ (especially if the API is open)
hieu: check https://fastapi.tiangolo.com/
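Building on these notes, a minimal Flask sketch of the client identification step could look like this. It assumes the ssl-client-subject-dn header injected by the jobs ingress (see the Ingress & TLS section); everything else is illustrative.

from flask import Flask, abort, jsonify, request

app = Flask(__name__)

def user_from_dn(dn: str) -> str:
    """Extract the CommonName from a subject DN like 'CN=tool-test,O=...'."""
    for field in dn.split(","):
        key, _, value = field.strip().partition("=")
        if key.upper() == "CN":
            return value
    abort(403, description="no CommonName in client certificate subject")

@app.route("/api/v1/list/")
def list_jobs():
    dn = request.headers.get("ssl-client-subject-dn")
    if dn is None:
        abort(403, description="client TLS certificate required")
    user = user_from_dn(dn)
    # ...load the user's certs from NFS and query the k8s API here...
    return jsonify([])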
timeline
Proposed timeline for implementation, development and feature rollout.
- FY20/21 Q3: Design & proposal. Basic TJF source code bootstrap.
- FY20/21 Q4: A minimal TJF is developed. Select a few beta testers and early adopters.
- FY21/22 Q1: Announce to the community new framework availability, work with users to migrate to it.
- FY21/22 Q2: Closer to grid deprecation? <3
about logs
If left unattended, logs produced by jobs can easily hammer and bring down our etcd clusters. We should come up with a solution that strictly restricts logging, and redirects logs to each user's NFS home directory.
Some potential ideas on how to do that at kubernetes level: https://kubernetes.io/docs/concepts/cluster-administration/logging/
To be clear, this means that logs produced by jobs should not be made available using kubectl logs, because that means the stderr/stdout of the pod is being read/written in the etcd cluster.
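A possible approach, sketched below: when building the Job object, TJF could wrap the user command so stdout/stderr land directly on NFS instead of flowing through the container runtime. The helper is purely illustrative, not the actual implementation.

def wrap_command(cmd: str, job_name: str, home: str) -> list:
    """Redirect the job's stdout/stderr to files in the user's NFS home."""
    out = f"{home}/{job_name}.out"
    err = f"{home}/{job_name}.err"
    return ["/bin/sh", "-c", f"exec {cmd} 1>>{out} 2>>{err}"]

# e.g. wrap_command("./myscript.py --once", "myjob", "/data/project/test")
# would run: exec ./myscript.py --once 1>>/data/project/test/myjob.out
#                                       2>>/data/project/test/myjob.err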
URLs
Relevant URLs in the Toolforge project:
- https://jobs.toolforge.org/ --- main page, with brief explanation and links to documentation
- https://jobs.toolforge.org/api/ --- API endpoints
Relevant URLs in the Toolsbeta project:
- TBD
Feature mapping
Each currently supported use case in the grid should have an equivalent feature in kubernetes. The table below should help map each one.
Additionally, the table shows how each feature would map to the TJF.
Feature | GridEngine | Kubernetes | toolforge-jobs-cli | toolforge-jobs-api
---|---|---|---|---
simple one-off job launch | jsub | native Job API support | toolforge-jobs run <cmd> --type <container> | POST /api/v1/run/
get single job status | qstat | kubectl describe job | toolforge-jobs show <id> | GET /api/v1/show/{id}/
get all jobs status | qstat | kubectl + some scripting | toolforge-jobs list | GET /api/v1/list/
delete job | jstop | kubectl delete | toolforge-jobs delete <id> | DELETE /api/v1/delete/{id}/
delete all jobs | some scripting | kubectl delete | toolforge-jobs flush | DELETE /api/v1/flush/
scheduled jobs | crontab + jsub | native CronJob API support | toolforge-jobs run <cmd> --type <container> --schedule <sched> | POST /api/v1/run/
continuous job launch (bot, daemon) | jstart | native ReplicationController API support | toolforge-jobs run <cmd> --type <container> --continuous | POST /api/v1/run/
concurrency limits | 16 running + 34 scheduled | TBD. several potential mechanisms | TBD | TBD
get stderr / stdout of a job | files in the NFS directory | files in the NFS directory | No initial support | No initial API support
request additional mem | jsub -mem | TBD. we may not need this | TBD | TBD
sync run | jsub -sync y | TBD. no native support | toolforge-jobs run <cmd> --type <container> --wait | POST /api/v1/run/ + GET /api/v1/show/{id}/
making sure a job only runs once | jsub -once | native Job API support | toolforge-jobs run <cmd> --type <container> | POST /api/v1/run/
listing available containers | No support / not required | Similar to what we do on tools-webservices | toolforge-jobs containers | GET /api/v1/containers/
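For illustration, a hypothetical session using only the syntax proposed in the table (tool name invented, command output omitted, flags subject to change):

tools.mytool@tools-sgebastion-08:~$ toolforge-jobs run ./myscript.py --type toolforge-buster-sssd --schedule "*/5 * * * *"
tools.mytool@tools-sgebastion-08:~$ toolforge-jobs list
tools.mytool@tools-sgebastion-08:~$ toolforge-jobs flush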
API docs
This section contains concrete details for the API that TJF introduces.
POST /api/v1/run/
Creates a new job in the kubernetes cluster.
Example request data:
{
    "name": "myjob",
    "cmd": "./myscript.py --once",
    "type": "toolforge-buster-sssd",
    "schedule": "*/1 * * * *"
}
GET /api/v1/show/{name}/
Shows information about a job in the kubernetes cluster.
Example response JSON data:
{
    "name": "myjob",
    "cmd": "./myscript.py --once",
    "type": "python",
    "continuous": false,
    "state": "finished"
}
DELETE /api/v1/delete/{name}
Deletes a job in the kubernetes cluster.
GET /api/v1/list/
Shows information about all user jobs in the kubernetes cluster.
Example response JSON data:
[
    {
        "name": "myjob",
        "cmd": "./myscript.py --once",
        "type": "toolforge-buster-sssd",
        "continuous": false,
        "state": "finished"
    },
    {
        "name": "myotherjob",
        "cmd": "./myotherscript.py --once",
        "type": "toolforge-buster-sssd",
        "schedule": "*/1 * * * *",
        "state": "running"
    }
]
DELETE /api/v1/flush/
Deletes all user jobs in the kubernetes cluster.
There are no request parameters.
GET /api/v1/containers/
Shows information about all containers available for jobs in the kubernetes cluster.
Example response JSON data:
[
    {
        "name": "tf-buster",
        "type": "docker-registry.tools.wmflabs.org/toolforge-buster-sssd:latest"
    },
    {
        "name": "tf-buster-std",
        "type": "docker-registry.tools.wmflabs.org/toolforge-buster-standalone:latest"
    }
]
Ingress & TLS

There are 2 nginx-ingress deployments running in parallel in the k8s cluster:
- the general one for all Toolforge tools, in the ingress-nginx namespace, untouched by this project
- the jobs-specific one, in the ingress-nginx-jobs namespace.
The jobs-specific one is able to read TLS client certificates and pass the ssl-client-subject-dn HTTP header to the pod running the toolforge-jobs-api webservice.
With this information toolforge-jobs-api can load the client certs again when talking to the k8s API on behalf of the original user.
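A hedged sketch of that step with the kubernetes Python client (the API host and CA bundle path are placeholders; cert paths as shown in the Auth section):

from kubernetes import client

def k8s_api_as_user(user: str) -> client.BatchV1Api:
    """Build a k8s API client that authenticates with the user's own certs."""
    cfg = client.Configuration()
    cfg.host = "https://kubernetes.default.svc"    # placeholder k8s API URL
    cfg.cert_file = f"/data/project/{user}/.toolskube/client.crt"
    cfg.key_file = f"/data/project/{user}/.toolskube/client.key"
    cfg.ssl_ca_cert = "/etc/ssl/certs/k8s-ca.pem"  # placeholder CA bundle
    return client.BatchV1Api(client.ApiClient(cfg))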
The way this whole ingress/TLS setup works is as follows:
- To reach the ingress-nginx-jobs ingress, there is a FQDN jobs.svc.toolsbeta.eqiad1.wikimedia.cloud that points to the k8s haproxy VIP address.
- The haproxy system listens on 30001/TCP for this jobs-specific ingress (and on 30000/TCP for the general one).
- The haproxy daemon reaches k8s ingress nodes on 30001/TCP (the ingress-nginx-jobs-svc service), so traffic both internal and external to the cluster can reach the nginx proxy.
- The ingress-nginx-jobs deployment is configured to only load 1 Ingress object, which is the one defined for toolforge-jobs-api.
- The Ingress object instructs ingress-nginx-jobs to enable client TLS by using the annotations nginx.ingress.kubernetes.io/auth-tls-verify-client: on and nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true".
- Client TLS certs are verified against the kubernetes CA, which should be configured in the nginx.ingress.kubernetes.io/auth-tls-secret: "default/ca-secret" annotation of the Ingress object.
- Once the TLS certs are verified, the proxy injects the HTTP header ssl-client-subject-dn to toolforge-jobs-api, which contains the CN= information of the original user.
- With the ssl-client-subject-dn header, toolforge-jobs-api can load the client certificate again from the original user's home on NFS and in turn contact the k8s API using it.
In order for the 2 nginx-ingress deployments on the cluster to ignore each other's Ingress objects, we need a few additional bits.
in kubernetes 1.17
Per upstream docs we need to pay attention to the --ingress-class nginx command line flag:
- the general ingress, using the default value (unset), which means it will process every Ingress with the kubernetes.io/ingress.class: "nginx" annotation or with no annotation at all.
- the jobs ingress, using --ingress-class=jobs, which will handle every Ingress with the kubernetes.io/ingress.class: "jobs" annotation (there should be just one). A sketch follows this list.
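Putting the above together, a minimal sketch of what the jobs Ingress object could look like in kubernetes 1.17 (object, service and port names are illustrative, not the deployed config):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: toolforge-jobs-api
  annotations:
    kubernetes.io/ingress.class: "jobs"
    nginx.ingress.kubernetes.io/auth-tls-verify-client: "on"
    nginx.ingress.kubernetes.io/auth-tls-secret: "default/ca-secret"
    nginx.ingress.kubernetes.io/auth-tls-pass-certificate-to-upstream: "true"
spec:
  rules:
  - host: jobs.svc.toolsbeta.eqiad1.wikimedia.cloud
    http:
      paths:
      - path: /
        backend:
          serviceName: toolforge-jobs-api
          servicePort: 80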
in kubernetes 1.18
Per the release notes, the kubernetes.io/ingress.class annotation is deprecated, and the IngressClass resource should be used instead. The Ingress object should use the new ingressClassName field.
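A minimal sketch of that 1.18 approach (names illustrative; IngressClass lives in networking.k8s.io/v1beta1 in 1.18):

apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: jobs
spec:
  controller: k8s.io/ingress-nginx

The jobs Ingress object would then drop the annotation and set, in its spec:

  ingressClassName: jobs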
Development
Some random stuff that Arturo has written here.
notes on k8s objects
Example workflow using the k8s API with a Job object:
tools.arturo-test-tool@tools-sgebastion-08:~$ kubectl delete job arturo-test-job ; kubectl apply -f job.yaml
job.batch "arturo-test-job" deleted
job.batch/arturo-test-job created
tools.arturo-test-tool@tools-sgebastion-08:~$ cat job.yaml
apiVersion: batch/v1
kind: Job
metadata:
name: arturo-test-job
spec:
backoffLimit: 3
template:
spec:
containers:
- name: arturo-test
image: docker-registry.tools.wmflabs.org/toolforge-buster-sssd:latest
workingDir: /data/project/arturo-test-tool/
command:
- ./arturo-test-script.sh
env:
- name: HOME
value: /data/project/arturo-test-tool
volumeMounts:
- mountPath: /data/project
name: home
restartPolicy: Never
volumes:
- hostPath:
path: /data/project
type: Directory
name: home
tools.arturo-test-tool@tools-sgebastion-08:~$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
arturo-test-job 1/1 9s 88s
tools.arturo-test-tool@tools-sgebastion-08:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
arturo-test-job-9lzkc 0/1 Completed 0 94s
tools.arturo-test-tool@tools-sgebastion-08:~$ kubectl logs job/arturo-test-job
arturo test script
Done sleeping
Example workflow using the k8s API with a CronJob object:
toolsbeta.test@toolsbeta-sgebastion-04:~$ cat cronjob.yaml
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: test-cronjob
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: test-job
image: docker-registry.tools.wmflabs.org/toolforge-buster-sssd:latest
workingDir: /data/project/test/
command:
- ./test-script.sh
env:
- name: HOME
value: /data/project/test
volumeMounts:
- mountPath: /data/project
name: home
restartPolicy: Never
volumes:
- hostPath:
path: /data/project
type: Directory
name: home
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl apply -f cronjob.yaml
cronjob.batch/test-cronjob created
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get cronjob
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
test-cronjob */1 * * * * False 0 <none> 5s
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl describe cronjob test-cronjob
Name: test-cronjob
Namespace: tool-test
[..]
Schedule: */1 * * * *
[..]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 4m29s cronjob-controller Created job test-cronjob-1611663900
Normal SawCompletedJob 4m19s cronjob-controller Saw completed job: test-cronjob-1611663900, status: Complete
Normal SuccessfulCreate 3m29s cronjob-controller Created job test-cronjob-1611663960
Normal SawCompletedJob 3m19s cronjob-controller Saw completed job: test-cronjob-1611663960, status: Complete
Normal SuccessfulCreate 2m29s cronjob-controller Created job test-cronjob-1611664020
Normal SawCompletedJob 2m19s cronjob-controller Saw completed job: test-cronjob-1611664020, status: Complete
Normal SuccessfulCreate 89s cronjob-controller Created job test-cronjob-1611664080
Normal SawCompletedJob 78s cronjob-controller Saw completed job: test-cronjob-1611664080, status: Complete
Normal SuccessfulDelete 78s cronjob-controller Deleted job test-cronjob-1611663900
Normal SuccessfulCreate 38s cronjob-controller Created job test-cronjob-1611664140
Normal SawCompletedJob 28s cronjob-controller Saw completed job: test-cronjob-1611664140, status: Complete
Normal SuccessfulDelete 28s cronjob-controller Deleted job test-cronjob-1611663960
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get jobs
NAME COMPLETIONS DURATION AGE
test-cronjob-1611664020 1/1 2s 2m33s
test-cronjob-1611664080 1/1 3s 93s
test-cronjob-1611664140 1/1 2s 42s
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl delete cronjob/test-cronjob
cronjob.batch "test-cronjob" deleted
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get jobs
No resources found in tool-test namespace.
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get cronjobs
No resources found in tool-test namespace.
Example workflow using the k8s API with a ReplicationController object:
toolsbeta.test@toolsbeta-sgebastion-04:~$ cat replicationcontroller.yaml
apiVersion: v1
kind: ReplicationController
metadata:
name: test-continous-job
spec:
replicas: 1
selector:
app: test-job
template:
metadata:
name: test-job
labels:
app: test-job
spec:
containers:
- name: test-job
image: docker-registry.tools.wmflabs.org/toolforge-buster-sssd:latest
workingDir: /data/project/test/
command:
- ./test-script-whiletrue.sh
env:
- name: HOME
value: /data/project/test
volumeMounts:
- mountPath: /data/project
name: home
restartPolicy: Always
volumes:
- hostPath:
path: /data/project
type: Directory
name: home
toolsbeta.test@toolsbeta-sgebastion-04:~$ cat test-script-whiletrue.sh
#!/bin/bash
while true ; do
echo "INFO: running test script"
date
sleep 5
done
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl apply -f replicationcontroller.yaml
replicationcontroller/test-continous-job created
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get replicationcontroller
NAME DESIRED CURRENT READY AGE
test-continous-job 1 1 1 12s
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-continous-job-m5sxh 1/1 Running 0 17s
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl logs test-continous-job-m5sxh
INFO: running test script
Tue 26 Jan 2021 12:38:02 PM UTC
INFO: running test script
Tue 26 Jan 2021 12:38:07 PM UTC
INFO: running test script
Tue 26 Jan 2021 12:38:12 PM UTC
INFO: running test script
Tue 26 Jan 2021 12:38:17 PM UTC
INFO: running test script
Tue 26 Jan 2021 12:38:22 PM UTC
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl delete pod test-continous-job-m5sxh
pod "test-continous-job-m5sxh" deleted
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-continous-job-95n98 1/1 Running 0 64s
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl describe replicationcontroller/test-continous-job
Name: test-continous-job
Namespace: tool-test
[..]
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 111s replication-controller Created pod: test-continous-job-m5sxh
Normal SuccessfulCreate 77s replication-controller Created pod: test-continous-job-95n98
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl delete replicationcontroller test-continous-job
replicationcontroller "test-continous-job" deleted
toolsbeta.test@toolsbeta-sgebastion-04:~$ kubectl get pods
NAME READY STATUS RESTARTS AGE
test-continous-job-95n98 1/1 Terminating 0 3m40s
development environment
The development environment is somewhat non-trivial to set up. Given that TJF operates in a way similar to maintain-kubeusers, you will need a local kubernetes cluster (using minikube) to be able to emulate the Toolforge environment.
TODO: add more information here.
source code
Gerrit repositories:
- https://gerrit.wikimedia.org/r/admin/repos/cloud/toolforge/jobs-framework-api
- https://gerrit.wikimedia.org/r/admin/repos/cloud/toolforge/jobs-framework-cli
See also
Internal documents:
Some upstream kubernetes documentation pointers:
- https://kubernetes.io/docs/concepts/workloads/controllers/job/
- https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/
- https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
- https://kubernetes.io/docs/tasks/job/
- https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
- https://kubernetes.io/docs/concepts/workloads/controllers/job/#ttl-mechanism-for-finished-jobs
Related components: