Portal:Toolforge/Admin/Kubernetes/Upgrading Kubernetes

This document describes the procedure to upgrade the K8s version in the Toolforge K8s cluster. The cluster setup itself is described in Portal:Toolforge/Admin/Kubernetes/Deploying.

Create the upgrade tasks

If there's not already one, create a new upgrade task in Phabricator, with the following template:

Upgrade procedure: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes

Refer to the link above for the detailed procedure, and update the checkboxes as you complete them.

### Before upgrading

[] Check Kubernetes [changelog](https://kubernetes.io/blog/link-to-release...)
[] Announce user-facing changes
[] Prepare the new APT packages
[] Upgrade Toolforge components
[] Test new k8s version in lima-kilo

### Upgrade toolsbeta cluster

{insert link to subtask}

### Upgrade tools cluster

{insert link to subtask}

### After upgrading

[] Upgrade lima-kilo
[] Upgrade Toolforge components (optional)

Then create two subtasks for toolsbeta and tools, using the same template:

Upgrade procedure: https://wikitech.wikimedia.org/wiki/Portal:Toolforge/Admin/Kubernetes/Upgrading_Kubernetes

Refer to the link above for the detailed procedure, and update the checkboxes as you complete them.

If multiple people are working on the upgrade, you can copy the checklist to an Etherpad for easier collaborative editing.

Use this command from a Toolforge control node to quickly generate the list of nodes:
```
for node in $(kubectl get nodes -o json | jq '.items[].metadata.name' -r); do echo "  - [] $node"; done
```

- [] Run functional tests
- [] Add a silence in alertmanager
- [] Update IRC topic (only for "tools" cluster)
- [] Run prepare_upgrade cookbook
- [] Upgrade control nodes
  - [] node-1
  - [] ...
- [] Upgrade worker nodes
  - [] worker-node-1
  - [] ...
- [] Upgrade ingress nodes
  - [] ingress-node-1
  - [] ...
- [] Upgrade kubectl on bastions
- [] Check everything looks good
- [] Remove the silence in alertmanager
- [] Revert IRC topic change (only for "tools" cluster)

Each step of the procedure is described in detail below.

Before upgrading

Check Kubernetes changelog

Read through the Kubernetes upstream release notes and changelog for the release we're upgrading to.

Also, look at the deprecated API call dashboard for the target version. It does not tell you what is making those requests, but it does tell you whether any exist. (They might be coming from inside the control plane!)
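If you want to cross-check from the cluster itself, the apiserver exposes a counter for requests to deprecated APIs; a quick sketch, assuming admin kubectl access on a control node:

(control node)$ sudo kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis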

Announce user-facing changes

If there are major user-facing changes, send an email to the cloud-announce mailing list to inform users of the upcoming upgrade. Then send a follow-up email once a date is scheduled for the actual production upgrade.

Prepare the new APT packages

We mirror the Kubernetes APT repository into the Wikimedia APT repository, in a component named thirdparty/kubeadm-k8s-X-YY. The name “kubeadm-k8s” indicates that it contains packages for a kubeadm-based installation of Kubernetes.

You can copy the existing component stanza and adjust the version number. Remember to also update the thirdparty/helm3 stanza to point to the new component.

Example patch: https://gerrit.wikimedia.org/r/c/operations/puppet/+/1126054

Note: if you don’t have +2 rights on operations/puppet, ask someone in the team to merge the patch for you.

After the patch is merged, SSH to apt1002.wikimedia.org and clean up the old versions that have been removed (if any):

reprepro --delete clearvanished

The new components should be fetched automatically after a few minutes, but if they don’t appear you can update them manually:

reprepro --noskipold --component thirdparty/kubeadm-k8s-1-XX update bookworm-wikimedia

You can check if the new packages are available at: https://apt.wikimedia.org/wikimedia/pool/thirdparty/
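You can also verify directly on the APT server that the packages landed in the new component; a sketch, run from the repository directory on apt1002 (adjust the component name to the new version):

reprepro -C thirdparty/kubeadm-k8s-1-XX list bookworm-wikimedia kubeadm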

Etcd

Etcd is installed using the stock Debian package. We currently run etcd 3.3.25, which is older than the version recommended by Kubernetes:

The minimum recommended etcd versions to run in production are 3.4.22+ and 3.5.6+.

Upgrading the etcd nodes to Debian Bookworm (tracked in phab:T361237) will get us to 3.4.23.
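To check which etcd version is currently installed on an etcd node (the stock Debian packaging splits it into server and client packages):

(etcd node)$ dpkg -l etcd-server etcd-client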

Upgrade Toolforge components

All Toolforge components are listed in the toolforge-deploy repo, at https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy/-/tree/main/components

Some components were developed by us, while others are third-party components (more details below).

If one or more components are not compatible with the new version of K8s, you should upgrade those components to a compatible version.

If the current version of a component is already compatible with the new version of K8s, but there is a more recent version available that is also compatible, you can take the opportunity to upgrade it anyway.

Components we developed

Most components were developed by us. We don’t maintain a compatibility registry, but we should check if any upgrade is needed; for example, some components include a K8s client library that should be kept in sync with the upstream version.

Go and Python client libraries are usually compatible with multiple K8s versions, but we try to keep them in sync; for example, you should upgrade references to k8s.io/api to match the new K8s version, as in the sketch below.
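For a Go component, the update usually amounts to bumping the k8s.io modules; a minimal sketch, assuming the target is K8s 1.29 (the client libraries follow the matching v0.29.x numbering):

$ go get k8s.io/api@v0.29.0 k8s.io/apimachinery@v0.29.0 k8s.io/client-go@v0.29.0
$ go mod tidy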

List of client libraries to check:

Third-party components

Components we import from upstream Helm charts, like Calico or Kyverno, usually provide a compatibility matrix that tells you which K8s versions are compatible with each component version.

To simplify this, we added a “kubeVersion” property to each component in the toolforge-deploy repo; you can find all of them with “git grep kubeVersion”.

If you find any component that is currently limited to the current version (e.g. “<=1.28” when you want to upgrade to 1.29), you should upgrade that component to a compatible version and update the “kubeVersion” property accordingly.

Testing the new versions in lima-kilo

Once you have created MRs in the component repositories and in toolforge-deploy, you can test the new versions of the components in lima-kilo:

$ ./start-devenv.sh --toolforge-deploy-branch {your-new-branch}
$ limactl shell lima-kilo
(lima-kilo)$ cd ~/toolforge-deploy
(lima-kilo)$ ./deploy.sh <component> # Repeat for all components you need to upgrade, the script will ask you to select one of the open MRs.

Check that the new versions of the components are installed with “toolforge_get_versions.sh”.

Also check that there are no pods failing with “CrashLoopBackOff” or other errors (you can use “kubectl get pods -A”, or “k9s”).

If k8s is failing to pull the new version of a component, you might need to import the container image into our registry.

Run the functional tests in lima-kilo and check they are all green.
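For example, from inside the lima-kilo shell (the script lives in the toolforge-deploy checkout, the same one used on the bastions later in this procedure):

(lima-kilo)$ cd ~/toolforge-deploy
(lima-kilo)$ ./utils/run_functional_tests.sh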

Once all the components are upgraded and working, it’s time to upgrade K8s in lima-kilo.

Test new k8s version in lima-kilo

Create a merge request in the lima-kilo repo, updating the Kubernetes version and also the versions of kind, helm and helmfile.

K8s is deployed in lima-kilo using kind, and it’s hard to upgrade the K8s version in place. The best test we can do is recreating the VM from scratch using the new K8s version. You should do two tests:

  • Update all 3 control nodes in kind.yaml, but keep the worker node on the old version. Recreate the VM with ./start-devenv.sh --ha and run the functional tests.
  • Update also the worker node in kind.yaml, recreate the VM again with ./start-devenv.sh --ha and run the functional tests again.

Don’t merge these changes to the main branch of lima-kilo yet; you will do that at the very end of the upgrade (see the “Upgrade lima-kilo” step below).

Upgrading toolsbeta & tools

Follow these steps on the staging environment first (toolsbeta), then repeat the same steps on the production environment (tools).

Run functional tests

SSH into the bastion, clone toolforge-deploy, and run the functional tests.

It’s useful to run them in a loop so we can detect when/if things start failing and investigate.

It is OK to get some failures during the upgrade as pods get evicted and rescheduled, but these should go away during subsequent loops of the test run.

$ ssh (bastion node)
(bastion node)$ git clone https://gitlab.wikimedia.org/repos/cloud/toolforge/toolforge-deploy.git
(bastion node)$ while true; do toolforge-deploy/utils/run_functional_tests.sh -r; done

Add a silence in alertmanager

Go to https://alerts.wikimedia.org/ and click the bell icon in the top right corner.

Create a new silence with:

  • cluster=wmcloud.org
  • team=wmcs
  • project={project name, toolsbeta or tools}
  • add a few hours of duration
  • add the phab task to the comment
  • click Preview and Submit

Update IRC topic (only for “tools” cluster)

Update the IRC topic on #wikimedia-cloud from "Status: Ok" to "Status: upgrading Toolforge k8s" (you can use !status {newstatus} if you have the required IRC permissions)

Run prepare_upgrade cookbook

The prepare_upgrade cookbook will disable puppet on all k8s nodes and update two project-puppet hiera keys:

  • profile::wmcs::kubeadm::component
  • profile::wmcs::kubeadm::kubernetes_version
cloudcumin1001:~$ sudo cookbook wmcs.toolforge.k8s.prepare_upgrade --cluster-name (toolsbeta|tools) --src-version OLD_VERSION --dst-version NEW_VERSION --task-id Txxxxxx
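To double-check that puppet is actually disabled on the project's nodes, a quick sketch using cumin (this queries every VM in the project, so expect some unrelated hosts in the output):

cloudcumin1001:~$ sudo cumin 'O{project:toolsbeta}' 'test -f "$(puppet config print agent_disabled_lockfile)" && echo disabled || echo enabled'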

Upgrade control nodes

Run the upgrade cookbook for each control node:

cloudcumin1001:~$ sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id Txxxxxx --src-version OLD_VERSION --dst-version NEW_VERSION --cluster-name (toolsbeta|tools) --hostname <control_node_name>

Note: On the first control node, the cookbook will ask you to approve the upgrade plan. Save the plan output in case it's needed for later troubleshooting.

Now wait a few minutes until the cookbook finishes. Check that all control plane pods (scheduler, apiserver and controller-manager) start up, do not start crash looping and don't have any errors in their logs. See #Troubleshooting if they do.

(control node)$ sudo kubectl get pods -n kube-system

Find the active haproxy node (the proxies use keepalived; SSH to both and check which one has two IP addresses). Then depool the other control nodes from haproxy, so that all traffic goes to the control node you have just upgraded, using the commands below.
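For example, to see which proxy currently holds the extra (virtual) IP address:

(proxy node)$ ip -brief address show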

# depool
(proxy node)$ sudo puppet agent --disable "<user>: k8s upgrade"
(proxy node)$ sudo vim /etc/haproxy/conf.d/k8s-api-servers.cfg
(proxy node)$ sudo systemctl reload haproxy

# check config
(proxy node)$ echo "show stat" | sudo socat stdio /run/haproxy/haproxy.sock | grep k8s-api

Check that the functional tests are still passing, then repool all nodes:

# repool
(proxy node)$ sudo puppet agent --enable
(proxy node)$ sudo run-puppet-agent
(proxy node)$ sudo systemctl reload haproxy

# check config
(proxy node)$ echo "show stat" | sudo socat stdio /run/haproxy/haproxy.sock | grep k8s-api

Upgrade worker nodes

You now need to run the wmcs.toolforge.k8s.worker.upgrade cookbook for each worker node. The currently recommended way is to split the list of normal and NFS workers into two or three chunks, then make that many shell scripts that call the upgrade cookbook for each node in the chunk. Start those scripts in separate screen/tmux tabs.

sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id Txxxxxx --src-version OLD_VERSION --dst-version NEW_VERSION --cluster-name (toolsbeta|tools) --hostname <worker_node_name>
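A sketch of what one of those chunk scripts could look like (the node names, versions, and task ID are placeholders; adjust them to your chunk):

```
#!/bin/bash
# chunk-1.sh -- hypothetical helper that upgrades one chunk of workers sequentially
set -e
for node in tools-k8s-worker-1 tools-k8s-worker-2 tools-k8s-worker-3; do
    sudo cookbook wmcs.toolforge.k8s.worker.upgrade \
        --task-id Txxxxxx \
        --src-version OLD_VERSION --dst-version NEW_VERSION \
        --cluster-name tools \
        --hostname "$node"
done
```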

Upgrade ingress nodes

The ingress nodes are similar to the worker nodes but they need some special treatment:

  • On a Toolforge bastion, run kubectl sudo -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=2 to prevent an ingress controller from being scheduled on a regular node.
  • Ingress pods take a while to evict. It should be safe to upgrade the ingress nodes in parallel with the normal worker nodes, using the same “worker.upgrade” cookbook.
  • When done, run kubectl sudo -n ingress-nginx-gen2 scale deployment ingress-nginx-gen2-controller --replicas=3 to return the cluster to normal operation.

sudo cookbook wmcs.toolforge.k8s.worker.upgrade --task-id Txxxxxx --src-version OLD_VERSION --dst-version NEW_VERSION --cluster-name (toolsbeta|tools) --hostname <worker_node_name>

Upgrade kubectl on bastions

Kubectl needs to be manually upgraded on bastion hosts. You can use “apt full-upgrade” to make sure that all packages are at the latest available version (for example, there might be a new version of Helm):

(bastion host)$ sudo apt full-upgrade

To check kubectl is on the same version on all servers, you can use:

# toolsbeta
cloudcumin1001:~$ sudo cumin 'O{project:toolsbeta}' 'dpkg -l kubectl || true'

# tools
cloudcumin1001:~$ sudo cumin 'O{project:tools}' 'dpkg -l kubectl || true'

Check everything looks good

  • Check that all nodes are upgraded with sudo kubectl get nodes
  • Check there are no failing pods with sudo kubectl get pods -A | grep -Ev '(tool-|image-build)'
  • Run the functional tests and make sure they are all green

Remove the silence in alertmanager

You can simply wait for the silence to expire, or delete it by going to https://alerts.wikimedia.org/, clicking the bell icon, and searching for the silence (for example by your username). Click the “Delete” button to remove it.

Revert IRC topic change (only for “tools” cluster)

Revert the IRC topic on #wikimedia-cloud back to "Status: Ok" (you can use !status ok if you have the required IRC permissions)

After upgrading

Upgrade lima-kilo

Now you can merge the Merge Request you created in the previous step (“Test new k8s version in lima-kilo”).

Upgrade Toolforge components (optional)

Check if there are any components that can now be upgraded to a newer version, that was not compatible with the previous k8s version but is compatible with the current one.

Troubleshooting

Permission errors after control plane upgrades

Sometimes the control plane components log error messages after a control node is upgraded, for example:

E0410 09:18:10.387734   	1 leaderelection.go:330] error retrieving resource lock kube-system/kube-controller-manager: leases.coordination.k8s.io "kube-controller-manager" is forbidden: User "system:kube-controller-manager" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "kube-system"

The exact cause of this is unknown. One theory is a race condition in which the controller-manager pod starts before the api-server.

Try:

  • a VM reboot
  • if that didn't work, a manual restart of the affected static pod: copy its manifest out of /etc/kubernetes/manifests/, wait for the pod to disappear, then put the file back in the same place (see the sketch below)
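A minimal sketch for the controller-manager, assuming the standard kubeadm manifest name:

(control node)$ sudo mv /etc/kubernetes/manifests/kube-controller-manager.yaml /root/
(control node)$ sudo kubectl get pods -n kube-system   # repeat until the pod is gone
(control node)$ sudo mv /root/kube-controller-manager.yaml /etc/kubernetes/manifests/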
