This page describes the deprecation and removal of hosts running Ubuntu Trusty (14.04) from the Toolforge infrastructure. The login bastions and Grid execution hosts are still running Trusty and must be replaced with new instances.
The Ubuntu Trusty job grid was shut down on Monday 2019-03-25. The migration steps to follow after the shutdown have changed slightly, so be sure to read them.
Done 2019-01-11: Availability of Debian Stretch grid announced to community
Done Week of 2019-02-04: Weekly reminders via email to tool maintainers for tools still running on Trusty
Week of 2019-03-04:
Done Daily reminders via email to tool maintainers for tools still running on Trusty
Done Switch login.tools.wmflabs.org to point to Stretch bastion
Done 2019-03-25: Shutdown Trusty grid
What should I do?
SSH to the Stretch bastion
login.tools.wmflabs.org connects to the new Debian Stretch bastion.
Move a grid engine webservice
When possible, we recommend migrating web services to Kubernetes instead of the new grid:
    # Connect to the Stretch bastion
    $ ssh <your-shell-name>@login.tools.wmflabs.org
    # Become your tool account
    $ become YOUR_TOOL
    # Start the webservice as a Kubernetes container rather than a grid job
    # <type> is one of: php7.2, php5.6, python, python2, nodejs, golang, jdk8, ruby2, tcl
    $ webservice --backend=kubernetes <type> start
    # -- OR --
    # Start the webservice as a Stretch grid job
    # <type> is one of: lighttpd, uwsgi-python, tomcat, generic, lighttpd-plain, nodejs, uwsgi-plain
    $ webservice --backend=gridengine <type> start
NodeJS webservices will need to rebuild their $HOME/www/js/node_modules on the new target runtime (Stretch grid or Kubernetes).
Move a continuous job
    # Connect to the Stretch bastion
    $ ssh <your-shell-name>@login.tools.wmflabs.org
    # Become your tool account
    $ become YOUR_TOOL
    # Start your job on the Stretch job grid
    $ jstart ...
The exact commands needed to start each continuous job vary greatly from tool to tool. This would be a great time to make a page of reference material for yourself and other maintainers here on Wikitech in the Tool namespace and using the Tool template if you haven't already.
Move a cron job
The crontab data for every tool that still had a crontab registered on the Trusty grid was backed up to $HOME/crontab.trusty.save before the Trusty cron server was shut down. This backup can be used to set up your crontab on the Stretch grid.
    # Connect to the Stretch bastion
    $ ssh <your-shell-name>@login.tools.wmflabs.org
    # Become your tool account
    $ become YOUR_TOOL
    # Load the backup of your crontab on the Stretch job grid
    $ crontab $HOME/crontab.trusty.save
If your workload permits, please avoid scheduling cronjobs from midnight to 3am so you're not competing with other cronjobs for system resources. That time window is currently very crowded.
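To see which entries in a restored crontab fall inside the crowded window, a check along the following lines may help. This is a minimal sketch: `list_crowded_entries` is a hypothetical helper name, not a Toolforge command, and it only recognizes a plain numeric hour field (0, 1, or 2). Ranges, lists, step values, and `*` in the hour field are not analyzed.

```shell
#!/bin/sh
# list_crowded_entries is a hypothetical helper, not a Toolforge command.
# It prints crontab entries whose hour field is a plain 0, 1, or 2.
# Limitation: ranges, lists, steps, and "*" in the hour field are ignored.
list_crowded_entries() {
    awk '
        /^[[:space:]]*#/ { next }    # skip comment lines
        NF < 6           { next }    # skip blank lines and variable assignments
        $2 ~ /^0?[0-2]$/ { print }   # hour field is 0, 1, or 2
    ' "$1"
}

# Example: list_crowded_entries "$HOME/crontab.trusty.save"
```

Any entry it prints is a candidate for moving to a quieter hour before the crontab is loaded on the Stretch grid.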
What are the primary changes with moving to Stretch?
Language runtime and library versions
The vast majority of the language runtimes and libraries installed on the grid nodes are upgraded in Stretch.
The table below lists the primary packages in which users are likely to notice changes.

| Runtime | Trusty version | Stretch version |
|---|---|---|
| Python3 | 3.4.0 | 3.5.3 |
| PHP | 5.5.9 | 7.2 |
| Python2 | 2.7.5 | 2.7.13 |
| NodeJS | 0.10.25 | 8.11.1 |
| Perl | 5.18.2 | 5.24.1 |
| Java | 1.7.0 | 11.0.1 |
| Ruby | 1.9.3 | 2.3.3 |
| Mono | 5.12.0 | 5.12.0 |
| TCL | 8.6.1 | 8.6.0 |
| R | 3.2.3 | 3.3.3 |

Also note that the system-installed phpunit will not be present, because current packages are not available for recent versions of PHP. To use phpunit, install it via composer (instructions for setting up composer are at Help:Toolforge#Installing_MediaWiki_core).
Maximum of 16 active jobs simultaneously allowed per tool user
The scheduler will hold additional job submissions in the qw (queued/waiting) state until an active slot is available.
Maximum of 50 active and queued jobs simultaneously allowed per tool user
The scheduler will reject additional job submissions by exiting with a status code of 25 and writing "Unable to run job: job rejected: only 50 jobs are allowed per user (current job count: 50)" to stderr.
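A script that submits many jobs can treat exit status 25 as a signal to wait and try again. The following is a minimal sketch under that assumption: `submit_with_retry` is a hypothetical wrapper, and `RETRY_DELAY` and `MAX_TRIES` are illustrative settings rather than anything Toolforge provides.

```shell
#!/bin/sh
# submit_with_retry is a hypothetical wrapper, not part of Toolforge.
# It runs the given command and, if the command exits with status 25
# (the job limit rejection described above), waits and tries again.
# RETRY_DELAY (seconds) and MAX_TRIES are assumed tuning knobs.
submit_with_retry() {
    tries=0
    while :; do
        rc=0
        "$@" || rc=$?
        # Any status other than 25 (including success) is returned as-is.
        [ "$rc" -ne 25 ] && return "$rc"
        tries=$((tries + 1))
        # Give up after MAX_TRIES rejections and report the rejection.
        [ "$tries" -ge "${MAX_TRIES:-5}" ] && return 25
        sleep "${RETRY_DELAY:-60}"
    done
}

# Example (arguments elided): submit_with_retry jsub ...
```

Only status 25 triggers a retry, so a genuinely failing submission is still reported immediately.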
Since the Python executables and libraries are updated in Stretch, local virtualenvs must be deleted and re-created on the new bastion before anything that runs from them will work. Old virtualenvs are likely to cause several kinds of errors, the most obvious being an unexpected ImportError.
Using a requirements file may make this simpler in many cases, if your project doesn't already use one. You can create one by running pip freeze > requirements.txt in your tool folder with your virtualenv activated. Later, after you have deleted the old virtualenv and created a new one, you can run pip install -r requirements.txt to install the same libraries into the new environment. For more information on this option, see pip's documentation on requirements files.
Example 1: Upgrading a Trusty grid engine based tool to the Stretch grid
Follow these steps if you manually submit jobs using jsub, or if you submit jobs using a crontab.
    $ ssh <your-shell-name>@login-stretch.tools.wmflabs.org
    $ become YOUR_TOOL
    $ rm -rf venv  # This will destroy the virtualenv and all libraries, so make sure you know what you will need to install later!
    $ virtualenv venv
    $ source venv/bin/activate
    $ pip install --upgrade pip  # upgrade pip itself to avoid problems with older versions
    $ pip install ...  # Here you'd use the requirements file syntax if you have one, or you'd manually install each needed library.
Example 2: Upgrading a uWSGI webservice into a Kubernetes container
If you are currently running your uWSGI webservice under the Grid Engine backend (i.e., webservice uwsgi-python command), and you want to upgrade to a uWSGI webservice running under Kubernetes (i.e., webservice --backend=kubernetes python command), you should rebuild your virtualenv as follows:
    $ ssh <your-shell-name>@login-stretch.tools.wmflabs.org
    $ become YOUR_TOOL
    $ webservice --backend=kubernetes python stop
    $ webservice --backend=kubernetes python shell  # do not skip this step; setting up the venv directly from the bastion may result in serious performance issues, compare T214086
    $ rm -rf www/python/venv/  # this will destroy the virtualenv and all libraries, so make sure you know what you will need to install later!
    $ python3 -m venv www/python/venv/
    $ source www/python/venv/bin/activate
    $ pip install --upgrade pip  # upgrade pip itself to avoid problems with older versions
    $ pip install -r www/python/src/requirements.txt  # assuming your tool has a requirements.txt file
    $ webservice --backend=kubernetes python start
Example 3: Upgrading a Kubernetes uWSGI webservice
If you are already using the Kubernetes backend, there is nothing you need to do -- the container will use the same Debian Jessie-based image as before.
PyYAML fails to install in Debian Stretch Python3 virtualenv
The new bastions are using systemd resource control to restrict the amount of RAM and CPU resources that a user can consume. We do this to attempt to keep a single user from using all of the shared resources of the bastion accidentally and thus making the bastion slow for everyone. The initial limits we had set were overly restrictive and caused gcc to fail when compiling PyYAML. This has been corrected by increasing the limits.
BotPassword or OAuth grant does not work from new job grid
Bot passwords and OAuth registrations can both include allowed IP range restrictions. The defaults for both are to allow usage from any IPv4 and IPv6 address. If you have changed this when creating the bot password or OAuth consumer registration to restrict access to specific IP address ranges you may have issues using the password or OAuth consumer from the new job grid. The Cloud VPS environment is nearing the end of a process of moving from the 10.0.0.0/8 private address range that is shared with other internal servers operated by the Wikimedia Foundation to a new 172.16.0.0/21 private subnet. The new job grid is the first end-user facing portion of Toolforge to be migrated to the new range.
The allowed IP ranges for bot passwords can be changed by the owner of the account using Special:BotPasswords. Either add the 172.16.0.0/21 CIDR to the list of allowed ranges or reset them to the defaults of 0.0.0.0/0 and ::/0.
The allowed IP ranges for an OAuth consumer registration can be changed by the original proposer of the registration using Special:OAuthConsumerRegistration/list. Either add the 172.16.0.0/21 CIDR to the list of allowed ranges or reset them to the defaults of 0.0.0.0/0 and ::/0.
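If it is unclear whether an address falls inside the new 172.16.0.0/21 range (172.16.0.0 through 172.16.7.255), the membership test can be sketched in plain shell arithmetic. `ip_to_int` and `ip_in_cidr` are hypothetical helper names used only for illustration, and the sketch handles IPv4 addresses without leading zeros in their octets.

```shell
#!/bin/sh
# ip_to_int and ip_in_cidr are hypothetical helpers for illustration.
# ip_to_int converts a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# ip_in_cidr ADDRESS NETWORK/PREFIX succeeds when ADDRESS is inside the range.
ip_in_cidr() {
    net=${2%/*}        # network part, e.g. 172.16.0.0
    bits=${2#*/}       # prefix length, e.g. 21
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# Example: ip_in_cidr 172.16.7.255 172.16.0.0/21 && echo "inside the new range"
```

An address from the old 10.0.0.0/8 range fails this check, which is why a bot password or OAuth consumer restricted to the old range stops working from the new grid.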
Lighttpd crashes on startup with message "parser failed somehow near here: (EOL)"
Look for a $HOME/error.log line similar to Duplicate array-key '.js' just prior to the parser failure error message to help you find the entry in your $HOME/.lighttpd.conf file that needs to be removed.
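The search described above can be sketched with grep. Both function names are hypothetical, and the second assumes the duplicated key appears literally (for example as ".js") in the configuration file.

```shell
#!/bin/sh
# show_duplicate_keys and find_key_in_conf are hypothetical helper names.

# Print the most recent "Duplicate array-key" lines from a lighttpd error
# log, prefixed with their line numbers. Defaults to $HOME/error.log.
show_duplicate_keys() {
    grep -n "Duplicate array-key" "${1:-$HOME/error.log}" | tail -n 5
}

# Print every line of the lighttpd configuration that contains the given
# key as a fixed string. Defaults to $HOME/.lighttpd.conf.
find_key_in_conf() {
    grep -nF -- "$1" "${2:-$HOME/.lighttpd.conf}"
}

# Example:
#   show_duplicate_keys
#   find_key_in_conf '".js"'
```

If the second command prints more than one definition of the same key, one of them is the entry to remove.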
'webservice stop' says service is not running, but 'webservice start' says service is running
It is not well understood what causes webservice to become confused about the state of the process, but deleting the service.manifest file generally fixes the issue.
Python: redis.exceptions.ResponseError: value is not an integer or out of range
The Python Redis client made a breaking change in v3.0.0: the former StrictRedis class was renamed to Redis, replacing the older Redis class. As a result, calls such as setex() expect a different order of arguments. The expected order now matches the Redis protocol docs rather than the more "pythonic" order that the prior implementation used. Typically this means that you need to swap the time and value arguments in your calling code, so that setex(name, value, time) becomes setex(name, time, value). See the library documentation for more breaking changes.
Some tools were experiments that are done, others were made obsolete by other tools, some are just things that the original maintainer is tired of caring for. Maintainers can mark their tools for deletion using the "Disable tool" button on the tool's detail page on https://toolsadmin.wikimedia.org/. Disabling a tool will immediately stop any running jobs including webservices and prevent maintainers from logging in as the tool. Disabled tools are archived and deleted after 40 days. Disabled tools can be re-enabled at any time prior to being archived and deleted.
The latest official release of the Python 'oursql' package will not compile against MariaDB client libraries. See upstream bug report at https://github.com/python-oursql/oursql/issues/5. Oursql can be installed from a fork maintained at https://github.com/sqlobject/oursql, but the recommended long term solution is to migrate application code to the PyMySQL package instead.
SSH to login-stretch.tools.wmflabs.org fails with 'Permission denied (publickey)'
This is typically caused by the newer sshd shipped with Debian Stretch refusing to authenticate an insecure or deprecated public key type. Specifically, support for DSA (ssh-dss) keys was deprecated in OpenSSH 7.0. If your ssh public key starts with the string "ssh-dss" you will be impacted by this. RSA keys smaller than 1024 bits are also deprecated.
First make sure that you are passing a valid key by attempting to ssh to login-trusty.tools.wmflabs.org using the same public key and username. If this also fails, the problem is likely something other than the ssh key type. Join us in #wikimedia-cloud for interactive debugging help.
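A quick local check of the key type can be sketched as follows. `key_type` and `is_dsa_key` are hypothetical helper names, and the path in the usage comment is only an assumed location for your public key.

```shell
#!/bin/sh
# key_type is a hypothetical helper: it prints the first field of an
# OpenSSH public key file, for example "ssh-rsa" or "ssh-dss".
key_type() {
    awk 'NR == 1 { print $1 }' "$1"
}

# is_dsa_key succeeds when the public key file holds a DSA (ssh-dss) key.
is_dsa_key() {
    [ "$(key_type "$1")" = "ssh-dss" ]
}

# Example, assuming the key lives at ~/.ssh/id_rsa.pub:
#   key_type ~/.ssh/id_rsa.pub
#   ssh-keygen -l -f ~/.ssh/id_rsa.pub   # also shows the key length in bits
```

If `is_dsa_key` succeeds for the key you are offering, generating a new key of a supported type and uploading its public half is the likely fix.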
SSH to login-stretch.tools.wmflabs.org fails with 'Permission denied (publickey,hostbased)'
If you face this problem, make sure you are using the correct shell name. It is listed in your user preferences as **Instance shell account name** and is the name to use whenever you log in to a Toolforge server, whether Trusty or Stretch.
"Unable to run job: Error reading answer list from qmaster"
Attempting to start a job whose name includes non-ASCII characters using jsub, jstart, qcronsub, etc. may fail with an error message written to the job's err file like "Unable to run job: Error reading answer list from qmaster". This is a known bug in Son of Grid Engine.
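Until the bug is fixed, a submission script can check a job name before using it. This is a minimal sketch: `is_ascii_name` is a hypothetical helper that accepts only printable ASCII characters, which is stricter than the bug itself requires.

```shell
#!/bin/sh
# is_ascii_name is a hypothetical helper. It succeeds when the given name
# contains only printable ASCII characters (space through tilde).
is_ascii_name() {
    # In the C locale every byte of a multi-byte character falls outside
    # the range below, so any non-ASCII character makes grep match.
    if printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'; then
        return 1
    fi
    return 0
}

# Example:
#   is_ascii_name "$job_name" || echo "rename the job before submitting" >&2
```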
Ubuntu Trusty was released in April 2014, and support for it (including security updates) will cease in April 2019. We need to shut down all Trusty hosts before the end of support date to ensure that Toolforge remains a secure platform. This migration will take several months because many people still use the Trusty hosts and our users are working on tools in their spare time.
During past operating system updates we were able to create a mixed grid containing hosts that ran multiple operating systems, and to control which one ran each job using command line arguments to jsub and webservice. The version of Sun Grid Engine (v6.2u5) in Ubuntu Trusty is incompatible with "Son of" Grid Engine (v8.1.9) from Debian Stretch. Therefore the two grids must be entirely separate environments. Cron jobs and web services that exist in the old grid (submitted from one of the current bastions) will not exist in the new grid. To schedule any job or service on the new Son of Grid Engine grid, you must log into a bastion dedicated to that grid (currently tools-sgebastion-06.tools.eqiad.wmflabs) and submit it from there.