Tool:Deputy
![]() | |
---|---|
Website | https://deputy.toolforge.org |
Description | Bulk data processor for Deputy users |
Keywords | copyright, data processing, api, javascript, nodejs, typescript |
Author(s) | Chlod Alejandro |
Maintainer(s) | Chlod |
Source code | https://github.com/ChlodAlejandro/deputy-dispatch |
License | Apache License 2.0 |
Issues | https://github.com/ChlodAlejandro/deputy-dispatch/issues |
Dispatch (or Deputy Dispatch) is a Node.js + Express webserver that exposes API endpoints which process large masses of data from Wikimedia wikis for easier consumption by Deputy. It is meant to centralize and optimize the gathering and processing of bulk data so that the many users of Deputy do not individually make taxing requests on Wikimedia servers.
Dispatch makes its requests while logged in as a user, but does not make any edits. It purely reads data from the Wikimedia servers, and the logged-in status allows it to query more than an anonymous user would be able to.
Usage
Dispatch is primarily used through Deputy. Deputy has been built to work cross-wiki and integrate with Dispatch to support every single Wikimedia wiki, with an out-of-box configuration which can handle simple copyright management tasks on the wiki.
The Dispatch API can also be used directly. Documentation for the API is automatically generated, and can be found here.
Asynchronous jobs
Some tasks done by Dispatch may take a long time to run. Though these usually finish in under 3 minutes, timeouts or network issues can make it impractical to hold a single connection open for that long. For this reason, tasks which take a while to execute must be run through asynchronous job requests. An initial request is sent to Dispatch (using POST), which returns a job ID. The progress of the job can then be polled using a GET to the /{id}/progress sub-path of that endpoint. Lastly, once the job completes, its result can be accessed with a GET to the /{id} sub-path of that endpoint.
Note that attempting to access the result before the job has completed results in a 409 Conflict HTTP error. The data is usually cached for an hour before being discarded. Refer to the documentation for the task information schema.
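The POST, poll, and fetch sequence can be sketched in TypeScript roughly as follows. This is an illustration under stated assumptions, not code taken from Deputy or Dispatch: the function name runJob, the "id" field of the POST response, and the polling defaults are all hypothetical, and because the progress schema lives in the API documentation, the caller supplies an isDone predicate instead of the sketch guessing at field names. The fetch function is passed in so the sketch can be exercised against a stub; the global fetch in recent Node.js versions should fit the declared shape.

```typescript
// Minimal request/response shapes used by this sketch.
interface HttpInit {
  method?: string;
  body?: string;
  headers?: Record<string, string>;
}
interface HttpResponse {
  status: number;
  ok: boolean;
  json(): Promise<unknown>;
}
type HttpFetch = (url: string, init?: HttpInit) => Promise<HttpResponse>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Start a job on a Dispatch job endpoint, poll it, and return its result.
 * ASSUMPTIONS: the POST response is JSON with an "id" field, and `isDone`
 * knows how to read the progress payload (see the API documentation).
 */
async function runJob(
  doFetch: HttpFetch,
  endpoint: string,
  isDone: (progress: unknown) => boolean,
  postInit: Omit<HttpInit, "method"> = {},
  pollMs = 2000,
  maxPolls = 150,
): Promise<unknown> {
  // 1. Create the job; Dispatch answers with a job ID.
  const created = await doFetch(endpoint, { ...postInit, method: "POST" });
  if (!created.ok) {
    throw new Error(`Job creation failed: HTTP ${created.status}`);
  }
  const { id } = (await created.json()) as { id: string };

  for (let poll = 0; poll < maxPolls; poll++) {
    // 2. Ask for progress on the /{id}/progress sub-path.
    const progress = await doFetch(`${endpoint}/${id}/progress`);
    if (!progress.ok) {
      throw new Error(`Progress check failed: HTTP ${progress.status}`);
    }
    if (isDone(await progress.json())) {
      // 3. Fetch the result from the /{id} sub-path.
      const result = await doFetch(`${endpoint}/${id}`);
      // 409 Conflict means the result is not available yet; keep polling.
      if (result.status !== 409) {
        if (!result.ok) {
          throw new Error(`Result fetch failed: HTTP ${result.status}`);
        }
        return result.json();
      }
    }
    await sleep(pollMs);
  }
  throw new Error(`Job ${id} did not finish after ${maxPolls} polls`);
}
```

With the defaults, the sketch gives up after roughly five minutes of polling, which leaves some headroom over the usual run time of under 3 minutes. Since results are usually cached for only about an hour, a caller should fetch the result soon after the job completes.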
Deployment
The deputy tool uses a standard Node.js web service to operate. As of February 19, 2024, this tool is being deployed using the Toolforge Build Service.
Deployments are not automatic. As the Wikimedia GitLab instance develops, this may change in the future. For now, the following steps are used to deploy new versions of the tool.
- [me@tools-sgebastion-XX] become deputy
  - That's pretty obvious already.
- [tools.deputy@tools-sgebastion-XX] toolforge build start https://github.com/ChlodAlejandro/deputy-dispatch
  - Trigger a build on the Toolforge Build Service. This downloads the latest version of the repository (on main) and performs all necessary build steps.
- [tools.deputy@tools-sgebastion-XX] toolforge webservice restart
  - Restart the webservice. In the event that the service.manifest got deleted or the service must be restarted from scratch, use the following command:
    - [tools.deputy@tools-sgebastion-XX] toolforge webservice --backend=kubernetes buildservice start
- [tools.deputy@tools-sgebastion-XX] toolforge webservice logs -f
  - Verify that the tool is up and running.
For deployment issues, you can email wikichlod.net or use Special:EmailUser/Chlod Alejandro. If you both break and fix Dispatch (and you're not User:Chlod Alejandro), you get a complimentary chocolate chip cookie.
Required environment variables
- TOOLFORGE set to 1. This informs Dispatch that it's running on Toolforge.
- DISPATCH_SELF_OAUTH_ACCESS_TOKEN set to an owner-only Meta-Wiki OAuth application token.
- TOOL_TOOLSDB_USER and TOOL_TOOLSDB_PASSWORD (provided by Toolforge)
- TOOL_REPLICA_USER and TOOL_REPLICA_PASSWORD (provided by Toolforge)
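When troubleshooting a deployment, it can help to confirm that all of these variables are actually present. The following TypeScript sketch shows one way to do that; the helper name checkEnv is hypothetical and this is not how Dispatch itself validates its configuration.

```typescript
// Variables listed in this section as required.
const REQUIRED_ENV = [
  "TOOLFORGE",
  "DISPATCH_SELF_OAUTH_ACCESS_TOKEN",
  "TOOL_TOOLSDB_USER",
  "TOOL_TOOLSDB_PASSWORD",
  "TOOL_REPLICA_USER",
  "TOOL_REPLICA_PASSWORD",
] as const;

/**
 * Hypothetical helper: returns a list of human-readable problems,
 * or an empty list when the environment looks complete.
 */
function checkEnv(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_ENV) {
    // Unset and empty values are both treated as missing.
    if (!env[name]) problems.push(`${name} is not set`);
  }
  // TOOLFORGE must be exactly "1" when present.
  if (env.TOOLFORGE && env.TOOLFORGE !== "1") {
    problems.push('TOOLFORGE must be set to "1"');
  }
  return problems;
}
```

In a Node.js process this would be called as checkEnv(process.env), logging the returned problems and exiting early if the list is not empty.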
Debug logs
webservice logs provides human-readable logs, but only for log levels INFO and higher, and it doesn't provide extra data in a machine-readable way. Debug logs are available as a file in the tool's working directory. Dispatch can run with or without a Toolforge NFS mount, and where it places the log files depends on which is used. When Dispatch is NFS-mounted, logs are stored in $TOOL_DATA_DIR/.logs.
When Dispatch is not NFS-mounted, logs are created in the container and destroyed when the pod is terminated. toolforge webservice shell will create a new pod, which is not what you want. Instead, you want to access the webservice pod:
- [tools.deputy@tools-sgebastion-XX] kubectl get pods
  - Determine the name of the pod running the webservice. It'll have the pattern deputy-*.
- [tools.deputy@tools-sgebastion-XX] kubectl exec -ti <POD> -- tail .logs/dispatch.log -f
  - Print out the tail of the log with -f (follow).
You can adapt this based on your needs, such as dumping the entire log into a file if you need to take a closer look.
Every log file is in Bunyan JSONL format. You can use any bunyan-compatible log reader to get detailed information — significantly better than staring at JSON until your eyes burn out.
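If no Bunyan reader is at hand, the JSONL format is simple enough to filter by hand. The TypeScript sketch below relies only on the standard Bunyan record fields (level, time, msg) and Bunyan's numeric level values; the function names filterLog and formatRecord are hypothetical, and any extra fields Dispatch attaches to its records are passed through untouched.

```typescript
// Bunyan's numeric log levels.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 } as const;
type LevelName = keyof typeof LEVELS;

// Core fields of a Bunyan record; anything else is kept as extra data.
interface BunyanRecord {
  level: number;
  time: string;
  msg: string;
  [extra: string]: unknown;
}

/** Parse JSONL text and keep records at or above `minLevel`. */
function filterLog(jsonl: string, minLevel: LevelName): BunyanRecord[] {
  const records: BunyanRecord[] = [];
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed = JSON.parse(line) as BunyanRecord;
      if (typeof parsed.level === "number" && parsed.level >= LEVELS[minLevel]) {
        records.push(parsed);
      }
    } catch {
      // Skip lines that cannot be read as a record,
      // such as a line that was only partially written.
    }
  }
  return records;
}

/** Render one record as a short human-readable line. */
function formatRecord(record: BunyanRecord): string {
  const name =
    (Object.keys(LEVELS) as LevelName[]).find((k) => LEVELS[k] === record.level) ??
    String(record.level);
  return `[${record.time}] ${name.toUpperCase()}: ${record.msg}`;
}
```

For example, filterLog(text, "debug").map(formatRecord) lists everything at DEBUG and above, including the records that webservice logs leaves out.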
Development
Instructions on how to get a development set-up of Dispatch can be found at https://github.com/ChlodAlejandro/deputy-dispatch#contributing. Note that you will need a Toolforge account, because you'll need access to the Wiki Replicas. Attempting to run Dispatch without properly setting the database connection information up will cause any request or job requiring the databases to fail.