Help talk:Toolforge/Jobs framework/Archives/2025

From Wikitech
Warning! Please do not post any new comments on this page. This is a discussion archive. See the current discussion or the archives index.

Image name in commands

@BryanDavis: at least in my experience, what this help page says about the --image attribute is incorrect. Help:Toolforge/Jobs_framework#Choosing the execution runtime implies that a command like --image python3.9 would work, but for me it does not. What does work is --image tf-python39. The tf-* naming format is mentioned on Module:Toolforge_images/data.json, which is obscure to users. I also know for a fact that tf-python37 works, while interestingly running tfj images does *not* return a Python 3.7 runtime (only 3.9 and 3.11 are returned).

Can you help update this help page so that it correctly states which values the --image attribute accepts and how a user can list them, given that tfj images does not seem to return the complete list? huji (talk) 00:41, 12 January 2025 (UTC)

@Huji: I was confused about this recently too. I updated an old job.yaml file from image: tf-php74 to image: tf-php82 but that didn't work, it needed image: php8.2 (which is listed in the images output). So I guess the tf- is the old naming? Sam Wilson 00:58, 12 January 2025 (UTC)
@Samwilson: interesting. For what it's worth, I have a series of jobs that all use tf-python37 and they all continue to work. huji (talk) 04:00, 12 January 2025 (UTC)
PS: I just checked, and I can also run those tasks using image python 3.7, which means two things: (a) you are likely correct that the tf-* naming is a legacy convention that continues to work, and (b) clearly, a Python 3.7 image exists that is not listed when tfj images is run. I now worry that I was supposed to migrate my code to 3.9 or 3.11 and missed an announcement about that. huji (talk) 04:03, 12 January 2025 (UTC)
@Huji: I'm not sure, but it does sort of seem like perhaps the legacy ones should be avoided if possible. Sam Wilson 09:58, 12 January 2025 (UTC)
You're correct in assuming tf-* is an old naming scheme. Those were deprecated when we changed both the jobs framework and the webservice tooling to use a shared data source for the available images.
toolforge jobs images only lists what we call "stable" images (which basically means the image is based on a new enough Debian release that we can rebuild it if needed). We don't yet have a system for actually removing the old images, so any jobs using them should continue working. Migrating to newer runtimes once in a while is still usually a good idea, though, as old images no longer get any (security) support and newer versions tend to come with new features and other improvements. Taavi (talk!) 10:22, 12 January 2025 (UTC)

Health check control files

The current docs say that "The tool main code loop includes some code to create a control file." and "Because the control file was deleted by the health check, if the job is alive it should create the file again in the next loop iteration."

Am I understanding this correctly that a) the control file should be created at least every 30 seconds (or rather, just under that); and b) that if the main code loop is likely to take longer than that then it's not sufficient to create it in that loop, and the control file should be created in whatever place is going to guarantee that it's created often enough?

Sam Wilson 01:30, 12 January 2025 (UTC)

If the example health check is used as is, then yes. The health check can also be implemented in some other way to allow a longer interval; for example, something like find /tmp/alive -mmin -5 | grep . could be used to check that /tmp/alive has been modified in the last five minutes. Taavi (talk!) 10:07, 12 January 2025 (UTC)
@Taavi: Thanks, that makes sense! I was getting stuck on the idea that, because the health check runs every 10 seconds, the tool also had to work at that frequency. Sam Wilson 00:13, 13 January 2025 (UTC)
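
For illustration only, here is a minimal Python sketch of the two ideas in this thread: refreshing a control file from somewhere other than a slow main loop, and checking the file's age instead of its mere existence (roughly what the find ... -mmin -5 command does). The function names start_heartbeat and is_fresh, and the intervals in the usage note, are hypothetical and do not come from the Toolforge documentation.

```python
import threading
import time
from pathlib import Path


def start_heartbeat(path: Path, interval: float) -> threading.Event:
    """Touch `path` every `interval` seconds from a background daemon thread.

    Returns an Event; call .set() on it to stop the heartbeat.
    (Hypothetical helper, not part of any Toolforge tooling.)
    """
    stop = threading.Event()

    def beat() -> None:
        while True:
            path.touch()  # creates the file or refreshes its modification time
            if stop.wait(interval):  # returns True once stop has been set
                break

    threading.Thread(target=beat, daemon=True).start()
    return stop


def is_fresh(path: Path, max_age: float) -> bool:
    """Return True if `path` exists and was modified within `max_age` seconds.

    With max_age=300 this is a rough analogue of `find PATH -mmin -5 | grep .`.
    """
    try:
        return time.time() - path.stat().st_mtime <= max_age
    except FileNotFoundError:
        return False
```

A tool could call start_heartbeat(Path("/tmp/alive"), 5.0) once at startup and pair it with a health check that either deletes the file or tests its age. One caveat: a background thread keeps touching the file even when the main loop is stuck, so this only shows that the process is running. If the health check is meant to detect a stalled loop, touching the file from the loop itself and using an age-based check with a generous max_age is probably the better fit.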