Portal:Toolforge/Admin/Monthly meeting/2024-06-25

From Wikitech

Attendees

David Caro, Slavina Stefanova, Bryan Davis (bd808), Francesco Negri, Arturo Borrero, Seyram Komla Sapaty, Taavi Väänänen

Agenda

Notes

Last meeting for Taavi! 🙁
Kyverno update
ABG: Kyverno is currently in “audit” mode. The “enforce” mode will be set tomorrow by Arturo. “Enforce” will reject any pod that doesn’t comply with the policy. An email has been sent to cloud-announce. Hopefully this will be a low-impact change.
ABG: Kyverno policies are very similar to the old PSP policies. Sometimes it’s not a 1-1 match, but they are very similar.
ABG: What about folks creating a deployment manually with kubectl? Kyverno also does mutations (it automatically modifies the deployment resources to comply with the policy), so the change should be transparent to them.
ABG: After this change, we can completely remove PSP from the cluster, and that will unblock the upgrade to the next K8s version.
ABG: Some of the components in Toolforge have a “privileged” PSP role. We need to review those to make sure we have a replacement. There are only about 10 internal components, so we can do this review in the next few days.
DC: What happens to deployments that were created before the migration?
ABG: I tested it this morning. I downgraded the webservice script to a very old version that was generating a deployment template without the new security attributes. Then I set the Kyverno policy to “enforce”. I would get a Kyverno policy report about the deployment not matching the policy, but the resource would not be immediately blocked. Resources that are already defined will not be immediately stopped.
DC: What if you restart the pod/deployment?
ABG: If you run “webservice restart” the deployment will be recreated, and the old version that was not matching the policy will no longer exist. Additionally, Kyverno is generating the report before the mutation.
BD: webservice restart is run occasionally but not as often as pods are restarted by the system.
DC: Should we try in toolsbeta, restarting a pod and checking what happens?
DC: Do we have numbers on how many pods are not passing the validation?
ABG: It should be simple to get that number with kubectl: at the moment with 1700 policy reports, about 50% of the toolforge accounts have an old-type resource. But this is an inaccurate count…
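A rough sketch of how the count discussed above could be made less approximate: group failing results by namespace instead of counting policy reports, since one namespace can have several reports. This assumes the reports have been exported as JSON (for example with `kubectl get policyreports --all-namespaces -o json`) and that they follow the usual PolicyReport layout with a `results` list whose entries carry a `result` field. The function name and the sample namespaces are made up for illustration; this was not discussed in the meeting.

```python
import json
from collections import Counter


def failing_results_by_namespace(policy_reports: dict) -> Counter:
    """Return a Counter mapping namespace -> number of failing results.

    `policy_reports` is the parsed JSON list object, assumed to have an
    "items" list of PolicyReport resources.
    """
    failures = Counter()
    for report in policy_reports.get("items", []):
        namespace = report.get("metadata", {}).get("namespace", "<none>")
        failed = sum(
            1 for entry in report.get("results", [])
            if entry.get("result") == "fail"
        )
        if failed:
            failures[namespace] += failed
    return failures


# Made-up sample data: two reports in one namespace, one in another.
SAMPLE = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "tool-example-a"},
     "results": [{"result": "fail"}, {"result": "pass"}]},
    {"metadata": {"namespace": "tool-example-a"},
     "results": [{"result": "fail"}]},
    {"metadata": {"namespace": "tool-example-b"},
     "results": [{"result": "pass"}]}
  ]
}
""")

failures = failing_results_by_namespace(SAMPLE)
print(f"{len(failures)} namespace(s) with failing results")
for namespace, count in failures.most_common():
    print(f"  {namespace}: {count}")
```

With real data, `len(failures)` would give the number of namespaces with at least one non-compliant resource, which is closer to "how many tools are affected" than the raw number of reports.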
Next upgrade to k8s 1.25 & k8s upgrade working group
DC: We decided to create a group and upgrade once a month to catch up with the current version.
DC: We agreed that two people will handle each upgrade, and that people will rotate. There is a wiki page with the details: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Ongoing_Efforts/Toolforge_Upgrade_Workgroup/Upgrades_Overview
ABG: Once we get rid of PSP, there should be no other blockers for the upgrade
Authentication - quick update
DC: I’m still investigating OIDC. I’ve updated the task with what we discussed: we don’t want passwords to go through our infra, which removes the option of using Keystone directly. I’m still working on that. If you have comments, let me know or add them to the task.
DC: I’m also updating the diagrams; the OIDC flow is a bit more complicated than the others. I have not yet investigated the RadosGW-specific authentication.
Other topics
DC: I think we can start working on the components API even if authentication is not available yet.

Action items

No action items recorded