News/Toolforge Grid Engine deprecation
This page contains information about the deprecation and removal of the Toolforge Grid Engine platform.
What is changing?
The Grid Engine cluster is being decommissioned in accordance with the timeline on this page.
The Toolforge admins are asking tool maintainers to move tools off the grid and report any blocking issues. This work is being tracked on the Phabricator workboard.
Timeline
- Oct-Dec 2021: Done Release the Toolforge Jobs Framework. Continue working on Toolforge buildpacks. Migrate Son of Grid Engine to Debian Buster.
- Oct-Dec 2022: Done Ask community to begin migrating tools. Collect blocking issues.
- Jan-Mar 2023: Done Add features to support identified blocking issues. Explore Kubernetes as a service as a potential migration path. Tool migrations continue.
- Apr-Jun 2023: Done Toolforge buildpacks beta. See T267374. Tool migrations continue.
- Jul-Sep 2023: Done Toolforge buildpacks multipack support work. See T325799. Tool migrations continue.
- Oct-Dec 2023: Done All tools can now be migrated. Tool migrations continue.
- November 2023: Done Kick off the grid shutdown process. Notify individual maintainers who still have tools on the grid of shutdown timelines via email, the cloud-announce mailing list, and talk pages.
- 2023-12-14: Done Tools owned by unresponsive or unreachable maintainers will be stopped. See the unreached tools list for the tools that were stopped. The FAQ has more information under "What exactly is happening on 2023-12-14?"
- Jan-Mar 2024: Done Migrations complete, the grid is stopped, and finally the grid infrastructure is deleted.
- 2024-02-14: Done All tools still running anything on the grid will be stopped. Tools that have an active maintainer and a clear plan for migrating can request, in the tool-specific migration task, that the tool not be stopped or that it be re-enabled (although it will be shut down again if it misses the 2024-03-14 deadline).
- 2024-03-14: Done Grid infrastructure is shut down and deleted. Tools that were not migrated in time can no longer run, but their files will remain on the Toolforge servers.
FAQ
How can I track tool migrations?
grid-deprecation.toolforge.org shows the number of tools still running on Grid Engine, as well as specifics about the tools and the jobs they are running.
What exactly is happening on 2023-12-14?
Think of this as an intentional outage for tools whose maintainers haven't been reachable. Tools that have no plan or communication will be stopped on this date. The tool being down should alert users and maintainers to what is happening if prior communication has not yet reached them. The tools will only be stopped, not deleted, and can be restarted if contact is made.
Why is this happening?
Tools need to have a migration plan for when the grid shuts down on 2024-02-14. As all other methods have failed to reach the maintainers of these tools, the hope is that turning the tools off will raise awareness of what is happening while there is still enough time to make a plan and migrate before the grid shutdown date. It will also make users of these tools aware that the tool they are using will be shut down in the future if no action is taken. This will allow users to plan, ask for help, and get support. It will also provide time to help find new maintainers for these tools.
How can I know if a tool I maintain or use has been tagged as having an unreachable maintainer?
We are tracking these on the unreached tools list.
Any grid-disabled tools will have a TOOL_DISABLED file in the tool directory. Tools that were running web services on the grid prior to being shut down will also display a message that explains the shutdown when someone tries to load the web service.
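The marker file can be checked from a shell. This is a minimal sketch, assuming the "tool directory" is the tool's home directory (`$HOME` when logged in as the tool); `is_grid_disabled` is a hypothetical helper name, not a Toolforge command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given tool directory contains
# the TOOL_DISABLED marker file described above.
# Usage: is_grid_disabled /path/to/tool/directory
is_grid_disabled() {
    [ -e "$1/TOOL_DISABLED" ]
}

# Example: check the current tool's home directory
# (assumption: the marker lives directly in $HOME).
if is_grid_disabled "$HOME"; then
    echo "grid-disabled: TOOL_DISABLED found in $HOME"
else
    echo "no TOOL_DISABLED file in $HOME"
fi
```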
If I use a tool that has been shut down, what should I do?
Contact the maintainer if possible. Share what's happening on the Phabricator ticket associated with that tool (see the Phabricator workboard). If the tool is unmaintained and you'd like to take over maintenance, follow the abandoned tool policy. If that's not possible, make plans to stop using the tool by the grid shutdown date of 2024-02-14. The tool can be restarted to accommodate your needs; however, it will still be shut down on 2024-02-14 if it has not been migrated off the grid by that date.
If I maintain a tool that has been shut down, what should I do?
Reach out on the Phabricator ticket for your tool, detailing your plans for migration or deletion, or saying that you need further help or support to develop a plan. Grid access for the tool can be re-enabled upon request if you are planning to migrate and continue maintaining the tool.
What happens to crontabs for tools that have been shut down?
The crontab file will be archived to a file called crontab.grid_stopped in the tool home directory. If a tool is re-enabled, the crontab will be restored to the cron server.
Note that the Jobs framework's built-in scheduling functionality will replace crontab support entirely.
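Before moving archived entries to the Jobs framework, it can help to see what the archive contains. This is a minimal sketch, assuming the archive holds standard five-field crontab lines and sits at `$HOME/crontab.grid_stopped`; `list_archived_cron` is a hypothetical helper name, not a Toolforge command.

```shell
#!/bin/sh
# Hypothetical helper: print each active entry of an archived crontab,
# split into its schedule and its command.
# Usage: list_archived_cron /path/to/crontab.grid_stopped
list_archived_cron() {
    # Skip blank lines, comments, and VAR=value environment assignments.
    # Entries using shortcuts such as @hourly are not handled by this sketch.
    grep -Ev '^[[:space:]]*(#|$)|^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*[[:space:]]*=' "$1" |
    while read -r min hour dom mon dow cmd; do
        printf 'schedule: %s %s %s %s %s\ncommand:  %s\n' \
            "$min" "$hour" "$dom" "$mon" "$dow" "$cmd"
    done
}

# Example: inspect the archive in the tool home directory, if present.
archive="$HOME/crontab.grid_stopped"
if [ -r "$archive" ]; then
    list_archived_cron "$archive"
fi
```

Each printed command still contains the grid-specific `jsub` wrapper, which has to be removed by hand when the entry is recreated as a scheduled job.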
How can I help?
- Help with tool migrations. Some maintainers have specifically asked for help in migrating. See Phabricator for the list.
Are crontab and jlocal going away too?
Yes, both are grid-specific tools. The Toolforge Jobs framework has built-in scheduling capability, which makes crontab obsolete, and any jlocal use cases should be obsolete due to the increased reliability that Kubernetes brings.
What should I do?
You have a couple of options:
- Migrate your Toolforge tool to Toolforge Kubernetes:
  - migrating web services
  - migrating jobs
- Simply delete your tool if it is no longer used.
Use case continuity
The following table tracks use case continuity.
Feature | Grid Engine | Kubernetes | Comment |
---|---|---|---|
Job scheduling | `jsub` or `jstart` | Toolforge jobs: one-off jobs or continuous jobs | Example, from Grid Engine: `$ crontab -e` with the entry `5 * * * * jsub -once -N name-of-tool php $HOME/user/bot.php >/dev/null 2>&1`. To Kubernetes: `$ toolforge-jobs run name-of-tool --command "php ./user/bot.php" --image php8.2 --schedule "@hourly"` |
Web services | `webservice` | Specify an image and 'kubernetes' as the backend | Example, from Grid Engine: `$ webservice --backend=gridengine start`. To Kubernetes: `$ webservice stop`, then `$ webservice --backend=kubernetes php8.2 start` |
Multi-language tools | Native | Toolforge buildpacks | Some single-language tools will need updated or new images (like dotnet) |
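When several scheduled jobs need converting, it can be convenient to print the resulting commands for review before running any of them. This is a minimal sketch that only uses the `toolforge-jobs run` options shown in the table above; `print_scheduled_job` is a hypothetical helper name, and the sketch assumes the command string contains no double quotes.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not run) the toolforge-jobs
# command for a scheduled job, in the form used in the table above.
# Arguments: job name, command, image, schedule.
print_scheduled_job() {
    printf 'toolforge-jobs run %s --command "%s" --image %s --schedule "%s"\n' \
        "$1" "$2" "$3" "$4"
}

# Example: the job from the table above.
print_scheduled_job name-of-tool "php ./user/bot.php" php8.2 "@hourly"
```

Printing the command first leaves the decision to run it with the maintainer, who can check the image and schedule for each job.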
Why are we doing this?
As outlined in our series of blog posts, Toolforge is powered by two different backend engines, Kubernetes and Grid Engine. These two backends have traditionally offered different features for tool developers. But as time moves forward we've learnt that Kubernetes is the future.
See more for a detailed explanation.
Solutions to common problems
Rebuild virtualenv for Python users / python3: not found / ModuleNotFoundError: No module named '...'
Python virtual environments ("venvs") are tied to the underlying system where they are running. Because of that, you will need to delete and re-create your virtual environments using these instructions.
Tools needing multiple language runtimes
You can build an image for your tool with the dependencies required.
Mono container
Using Mono? See the discussion on a Mono-specific container at phab:T311466.
Requires a system library or tool to be present
You can build an image for your tool with the dependencies required.
Pywikibot scripts
- See Help:Toolforge/Running Pywikibot scripts on how to easily run default Pywikibot scripts using the jobs framework.
- The advanced Pywikibot tutorial explains how to create and use a virtual environment to run custom Pywikibot scripts and default scripts that need to be customized with python code (for example user-fixes.py).
Delete a tool
Some tools were experiments that are done, others were made obsolete by other tools, some are just things that the original maintainer is tired of caring for. Maintainers can mark their tools for deletion using the "Disable tool" button on the tool's detail page on https://toolsadmin.wikimedia.org/. Disabling a tool will immediately stop any running jobs including webservices and prevent maintainers from logging in as the tool. Disabled tools are archived and deleted after 40 days. Disabled tools can be re-enabled at any time prior to being archived and deleted.
"Your webservice is not running" from `webservice status` after migrating
If webservice status says "Your webservice is not running" after you have started it on the Kubernetes backend, you may have a $HOME/service.template file containing "backend: gridengine". Try removing your $HOME/service.template file or, possibly better, updating it to list the new backend and type that your tool needs to run on Kubernetes.
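A quick way to check for the stale setting is to search the template for the grid backend. This is a minimal sketch, assuming the setting appears on its own line as `backend: gridengine`; `has_grid_backend` is a hypothetical helper name, not a Toolforge command.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given service.template still
# selects the grid backend. A missing file counts as "no".
# Usage: has_grid_backend /path/to/service.template
has_grid_backend() {
    grep -Eq '^[[:space:]]*backend:[[:space:]]*gridengine' "$1" 2>/dev/null
}

# Example: check the template in the tool home directory.
template="$HOME/service.template"
if has_grid_backend "$template"; then
    echo "$template still contains 'backend: gridengine'"
else
    echo "no grid backend setting found in $template"
fi
```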
See also
- Wikimedia Techblog: Toolforge and Grid Engine
- Wikimedia Techblog: Toolforge GridEngine Debian 10 Buster migration
- Wikimedia Techblog: Toolforge Jobs Framework
- List of tools still running on the grid engine
- Disabled tools pending deletion
Communication and support
Support and administration of the WMCS resources is provided by the Wikimedia Foundation Cloud Services team and Wikimedia movement volunteers. Please reach out with questions and join the conversation:
- Chat in real time in the IRC channel #wikimedia-cloud or the bridged Telegram group
- Discuss via email after you have subscribed to the cloud@ mailing list
- Subscribe to the cloud-announce@ mailing list (all messages are also mirrored to the cloud@ list)
- Read the News wiki page
- Use a subproject of the #Cloud-Services Phabricator project to track confirmed bug reports and feature requests about the Cloud Services infrastructure itself
- Read the Cloud Services Blog (for the broader Wikimedia movement, see the Wikimedia Technical Blog)