Ceph Cluster Upgrades
We do not use Cephadm for this cluster, so the ceph orch upgrade
command is not available to us.
Therefore, we must define our own best practices for upgrades.
The best reference we have is the staggered upgrade path, as described here: https://docs.ceph.com/en/quincy/cephadm/upgrade/#staggered-upgrade
This advises upgrading the components in the following order:
mgr -> mon -> crash -> osd -> mds -> rgw -> rbd-mirror -> cephfs-mirror -> iscsi -> nfs
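Before starting, it is worth capturing a baseline of the versions that are currently running across the whole cluster. The standard ceph versions command (run from any cephosd host with admin credentials) prints a JSON summary of the running version of every daemon, grouped by type, and can be re-run after each stage to confirm progress:
sudo ceph versions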
Monitor the cluster throughout the process
Open a terminal on one of the cephosd100[1-5] servers and execute the command sudo ceph health -w
This should show output that is similar to the following:
btullis@cephosd1001:~$ sudo ceph health -w
  cluster:
    id:     6d4278e1-ea45-4d29-86fe-85b44c150813
    health: HEALTH_OK

  services:
    mon: 5 daemons, quorum cephosd1001,cephosd1002,cephosd1003,cephosd1004,cephosd1005 (age 4M)
    mgr: cephosd1003(active, since 3w), standbys: cephosd1002, cephosd1005, cephosd1004, cephosd1001
    mds: 3/3 daemons up, 2 standby
    osd: 100 osds: 100 up (since 4M), 100 in (since 11M)
    rgw: 5 daemons active (5 hosts, 1 zones)

  data:
    volumes: 3/3 healthy
    pools:   17 pools, 4481 pgs
    objects: 1.02M objects, 583 GiB
    usage:   30 TiB used, 1.1 PiB / 1.1 PiB avail
    pgs:     4481 active+clean

  io:
    client: 1.1 MiB/s rd, 3.6 MiB/s wr, 34 op/s rd, 76 op/s wr
You can also keep an eye on the Ceph cluster health dashboard: https://grafana.wikimedia.org/goto/wmqNOroHg?orgId=1
Update the packages on the APT repository
The first thing to do is to sync the external repository with Reprepro. This should be done on the active APT repository server, which is currently apt1002.wikimedia.org.
sudo -i reprepro --noskipold -C thirdparty/ceph-reef update bullseye-wikimedia
sudo -i reprepro --noskipold -C thirdparty/ceph-reef update bookworm-wikimedia
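Optionally, confirm that the new version has landed in the repository. reprepro can list the versions it holds for a given package name, for example:
sudo -i reprepro ls ceph-common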
We can then either wait 24 hours, or force a package update on the affected hosts:
sudo cumin A:cephosd 'apt update'
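To confirm that the new candidate version is visible across the fleet before deploying anything, an optional check with cumin and apt-cache looks like this:
sudo cumin A:cephosd 'apt-cache policy ceph-common | grep Candidate'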
Distribute the packages to the Ceph servers
The packages do not automatically restart the daemons when they are upgraded, so we can safely use Debdeploy to manage the package deployment.
We can verify which package version is available for installation with apt-cache policy:
btullis@cephosd1001:~$ apt-cache policy ceph-common
ceph-common:
Installed: 18.2.2-1~bpo12+1
Candidate: 18.2.4-1~bpo12+1
Version table:
18.2.4-1~bpo12+1 1003
1003 http://apt.wikimedia.org/wikimedia bookworm-wikimedia/thirdparty/ceph-reef amd64 Packages
*** 18.2.2-1~bpo12+1 100
100 /var/lib/dpkg/status
16.2.15+ds-0+deb12u1 500
500 http://mirrors.wikimedia.org/debian bookworm/main amd64 Packages
500 http://security.debian.org/debian-security bookworm-security/main amd64 Packages
Now generate a debdeploy spec. We can use the library update type, as no daemons are restarted.
btullis@cumin1002:~$ generate-debdeploy-spec -U library --comment T389184 ceph
<output trimmed for brevity>
Please enter the version of ceph fixed in bookworm. Leave blank if no fix is available/required for bookworm.
>18.2.4-1~bpo12+1
Please enter the version of ceph fixed in bullseye. Leave blank if no fix is available/required for bullseye.
>18.2.4-1~bpo12+1
Please enter the version of ceph fixed in buster. Leave blank if no fix is available/required for buster.
>
<output trimmed for brevity>
Spec file created as 2025-04-01-ceph.yaml
We can distribute the packages to a single host with debdeploy
btullis@cumin1002:~$ sudo debdeploy deploy -u 2025-04-01-ceph.yaml -Q cephosd1001.eqiad.wmnet
Rolling out ceph:
Library update, several services might need to be restarted
ceph-base was updated: 18.2.2-1~bpo12+1 -> 18.2.4-1~bpo12+1
cephosd1001.eqiad.wmnet (1 hosts)
<output trimmed for brevity>
ceph-mon was updated: 18.2.2-1~bpo12+1 -> 18.2.4-1~bpo12+1
cephosd1001.eqiad.wmnet (1 hosts)
After this, we can verify that no daemons were restarted by checking with systemctl status:
btullis@cephosd1001:~$ systemctl status ceph*|grep -B1 Active
Loaded: loaded (/lib/systemd/system/ceph-mds.target; enabled; preset: enabled)
Active: active since Mon 2024-11-25 12:34:38 UTC; 4 months 5 days ago
<output trimmed for brevity>
Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; preset: enabled)
Active: active (running) since Mon 2024-11-25 12:34:38 UTC; 4 months 5 days ago
We can then distribute the packages to the remaining hosts with:
sudo debdeploy deploy -u 2025-04-01-ceph.yaml -s cephosd
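Optionally, verify that every host now has the new version of the packages installed, for example:
sudo cumin A:cephosd 'dpkg-query -W ceph-common'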
Restart the ceph-mgr services
On a single host, check the status of the ceph-mgr processes with:
systemctl status --with-dependencies --after ceph-mgr.target
There should be a ceph-mgr@$HOSTNAME.service
unit running and shown as active.
Restart the ceph-mgr.target
and check that it starts successfully.
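The restart follows the same pattern as for the other daemon types later in this procedure:
sudo systemctl restart ceph-mgr.target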
btullis@cephosd1001:~$ systemctl status ceph-mgr@cephosd1001.service
● ceph-mgr@cephosd1001.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-04-01 21:02:48 UTC; 1min 8s ago
Main PID: 1331027 (ceph-mgr)
Tasks: 165 (limit: 308998)
Memory: 487.2M
CPU: 8.659s
CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@cephosd1001.service
└─1331027 /usr/bin/ceph-mgr -f --cluster ceph --id cephosd1001 --setuser ceph --setgroup ceph
The ceph health -w output should also show that the manager daemon has restarted and reconnected.
2025-04-01T21:00:00.000205+0000 mon.cephosd1001 [INF] overall HEALTH_OK
2025-04-01T21:02:51.837589+0000 mon.cephosd1001 [INF] Active manager daemon cephosd1003 restarted
2025-04-01T21:02:51.840883+0000 mon.cephosd1001 [INF] Activating manager daemon cephosd1003
2025-04-01T21:02:51.920379+0000 mon.cephosd1001 [INF] Manager daemon cephosd1003 is now available
We can also check the version of the active mgr daemon:
btullis@cephosd1001:~$ sudo ceph tell mgr version
{
"version": "18.2.4",
"release": "reef",
"release_type": "stable"
}
If there are no error messages shown, we can now restart the ceph-mgr.target
units on the other servers with a Cumin command like this:
sudo cumin -b 1 -s 15 'A:cephosd and not D{cephosd1001.eqiad.wmnet}' 'systemctl restart ceph-mgr.target'
This will restart each of the four remaining ceph-mgr daemons with a 15 second gap in between them.
Check the ceph health -w output and make sure that the cluster is healthy before proceeding.
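Note that ceph tell mgr version only reports the active manager. An optional way to confirm that all five managers (active and standbys) are now on the new release is to inspect their daemon metadata, which includes a ceph_version field for each daemon; ceph mgr metadata is a standard Ceph command:
sudo ceph mgr metadata | grep -E '"(hostname|ceph_version)"'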
Restart the ceph-mon services
In a similar way to the mgr services, we will restart a single mon daemon. If it is stable, then we can use a cumin command to restart the others.
Unlike the mgr services, the mon services are all active, so we can check the versions of all mon daemons as follows:
btullis@cephosd1003:~$ sudo ceph tell mon.* version
mon.cephosd1001: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1002: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1003: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1004: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1005: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
Choose a single host again and restart its mon service, preferably using the ceph-mon.target
unit.
btullis@cephosd1001:~$ sudo systemctl restart ceph-mon.target
btullis@cephosd1001:~$ echo $?
0
Check the status of the units and look for any errors.
btullis@cephosd1001:~$ systemctl status --with-dependencies --after ceph-mon.target
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; preset: enabled)
Active: active since Tue 2025-04-01 21:20:41 UTC; 43s ago
Apr 01 21:20:41 cephosd1001 systemd[1]: Reached target ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once.
● ceph-mon@cephosd1001.service - Ceph cluster monitor daemon
Loaded: loaded (/lib/systemd/system/ceph-mon@.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-04-01 21:20:41 UTC; 43s ago
Main PID: 1051270 (ceph-mon)
Tasks: 25
Memory: 202.6M
CPU: 2.722s
CGroup: /system.slice/system-ceph\x2dmon.slice/ceph-mon@cephosd1001.service
└─1051270 /usr/bin/ceph-mon -f --cluster ceph --id cephosd1001 --setuser ceph --setgroup ceph
Check the version numbers again:
btullis@cephosd1001:~$ sudo ceph tell mon.* version
mon.cephosd1001: {
"version": "18.2.4",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1002: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1003: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1004: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
mon.cephosd1005: {
"version": "18.2.2",
"release": "reef",
"release_type": "stable"
}
Check the ceph health -w
output again, to make sure that the cluster health is OK. If it is, we can proceed to restart the remaining mon daemons.
This time, we will use a slightly longer gap in between them.
sudo cumin -b 1 -s 30 'A:cephosd and not D{cephosd1001.eqiad.wmnet}' 'systemctl restart ceph-mon.target'
The updated mon versions will also be shown on the Grafana dashboard: https://grafana.wikimedia.org/goto/nngxc9oNR?orgId=1
Restart the ceph-crash services
The next component to be upgraded is the ceph-crash daemons. These differ from the other services in that they are not shown in the ceph health output, so we can only rely on the systemctl status output.
Once again, restart the services on a test host first.
btullis@cephosd1001:~$ sudo systemctl restart ceph-crash.service
btullis@cephosd1001:~$ echo $?
0
btullis@cephosd1001:~$ systemctl status ceph-crash.service
● ceph-crash.service - Ceph crash dump collector
Loaded: loaded (/lib/systemd/system/ceph-crash.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-04-01 21:37:01 UTC; 11s ago
Main PID: 1057734 (ceph-crash)
Tasks: 1 (limit: 308998)
Memory: 6.6M
CPU: 873ms
CGroup: /system.slice/ceph-crash.service
└─1057734 /usr/bin/python3 /usr/bin/ceph-crash
If all is well, then repeat the process with cumin.
sudo cumin -b 1 -s 15 'A:cephosd and not D{cephosd1001.eqiad.wmnet}' 'systemctl restart ceph-crash.service'
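Since ceph-crash does not appear in the ceph health output, an optional fleet-wide check with cumin is the easiest way to confirm that every instance came back up cleanly:
sudo cumin A:cephosd 'systemctl is-active ceph-crash.service'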
Set the cluster into noout mode
This step is important because noout mode will prevent unnecessary data movement when the ceph-osd services are restarted.
The cluster will show a warning when noout mode is enabled.
From any cephosd host issue the following command.
sudo ceph osd set noout
The ceph health
output will now show the following.
btullis@cephosd1001:~$ sudo ceph health -w
  cluster:
    id:     6d4278e1-ea45-4d29-86fe-85b44c150813
    health: HEALTH_WARN
            noout flag(s) set

  services:
    mon: 5 daemons, quorum cephosd1001,cephosd1002,cephosd1003,cephosd1004,cephosd1005 (age 12m)
    mgr: cephosd1003(active, since 29m), standbys: cephosd1001, cephosd1005, cephosd1002, cephosd1004
    mds: 3/3 daemons up, 2 standby
    osd: 100 osds: 100 up (since 4M), 100 in (since 11M)
         flags noout
    rgw: 5 daemons active (5 hosts, 1 zones)

  data:
    volumes: 3/3 healthy
    pools:   17 pools, 4481 pgs
    objects: 1.02M objects, 583 GiB
    usage:   30 TiB used, 1.1 PiB / 1.1 PiB avail
    pgs:     4481 active+clean

  io:
    client: 38 KiB/s rd, 1.8 MiB/s wr, 39 op/s rd, 90 op/s wr
Restart the ceph-osd services
We can also check the version number of all OSDs with sudo ceph tell osd.* version
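With 100 OSDs the full ceph tell osd.* version listing is long, so it can be more convenient to summarise it; this is just shell post-processing of the same command output:
sudo ceph tell osd.* version | grep '"version"' | sort | uniq -c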
Now restart all of the 20 OSD daemons on a test host with the command
sudo systemctl restart ceph-osd.target
Check that there are no failed services with:
btullis@cephosd1001:~$ systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
0 loaded units listed.
You can check the status of all ceph-osd units with:
systemctl status --with-dependencies --after ceph-osd.target
If all is well, then we can proceed to upgrade the remaining osd servers. We will pause for 3 minutes between each of the four remaining servers and monitor the cluster health.
sudo cumin -b 1 -s 180 'A:cephosd and not D{cephosd1001.eqiad.wmnet}' 'systemctl restart ceph-osd.target'
You can check the version of the osd daemons on the Ceph Cluster dashboard: https://grafana.wikimedia.org/goto/3IK6prTHR?orgId=1
Check the cluster health and if all is well, proceed.
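While the OSD restarts are running, the key things to watch are that the number of OSDs returns to 100 up and that all placement groups return to active+clean. Two standard one-shot commands that show this at a glance are:
sudo ceph osd stat   # summary of up/in OSDs and any flags set
sudo ceph pg stat    # summary of PG states and client io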
Take the cluster out of noout mode
Now that the OSD daemons have restarted, we can take the cluster out of noout mode. This re-enables data movement if a degraded situation occurs and removes the health warning.
sudo ceph osd unset noout
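The noout warning should clear almost immediately; an optional quick confirmation is:
sudo ceph health
This should now report HEALTH_OK again.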
Restart the ceph-mds services
Check the versions of all mds components.
sudo ceph tell mds.* version
We have 5 mds daemons running, of which 3 are intended to be active at any time, since we have 3 cephfs file systems.
We can check which of these ceph-mds daemons is currently active by using the command sudo ceph fs dump and examining the output. For example:
btullis@cephosd1001:~$ sudo ceph fs dump | egrep '(Filesystem|up:active)'
dumped fsmap epoch 8205
Filesystem 'dpe' (1)
[mds.cephosd1004{0:17963228} state up:active seq 239868 addr [v2:10.64.134.12:6800/440240334,v1:10.64.134.12:6801/440240334] compat {c=[1],r=[1],i=[7ff]}]
Filesystem 'dumps' (2)
[mds.cephosd1002{0:17507916} state up:active seq 70 addr [v2:10.64.131.21:6800/3148591687,v1:10.64.131.21:6801/3148591687] compat {c=[1],r=[1],i=[7ff]}]
Filesystem 'home' (3)
[mds.cephosd1001{0:17555995} state up:active seq 56 addr [v2:10.64.130.13:6800/801572634,v1:10.64.130.13:6801/801572634] compat {c=[1],r=[1],i=[7ff]}]
In this case, we will start by restarting one of the standby mds daemons running on cephosd1003.
btullis@cephosd1003:~$ sudo systemctl restart ceph-mds.target
btullis@cephosd1003:~$ echo $?
0
btullis@cephosd1003:~$ systemctl status ceph-mds@cephosd1003.service
● ceph-mds@cephosd1003.service - Ceph metadata server daemon
Loaded: loaded (/lib/systemd/system/ceph-mds@.service; enabled; preset: enabled)
Active: active (running) since Wed 2025-04-02 08:59:45 UTC; 8s ago
Main PID: 1691134 (ceph-mds)
Tasks: 16
Memory: 16.5M
CPU: 85ms
CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds@cephosd1003.service
└─1691134 /usr/bin/ceph-mds -f --cluster ceph --id cephosd1003 --setuser ceph --setgroup ceph
Restart all of the remaining ceph-mds
services using cumin, with a 30 second pause between them.
sudo cumin -b 1 -s 30 'A:cephosd and not D{cephosd1003.eqiad.wmnet}' 'systemctl restart ceph-mds.target'
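After the restarts, each of the three filesystems should again have one active MDS, with two standbys available overall. The standard ceph fs status command gives a compact overview (or the ceph fs dump | egrep check from above can simply be re-run):
sudo ceph fs status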
Restart the ceph-radosgw services
On any test host, check the current state of the ceph-radosgw.target
unit and its dependencies.
btullis@cephosd1001:~$ systemctl status --with-dependencies --after ceph-radosgw.target
● ceph-radosgw.target - ceph target allowing to start/stop all ceph-radosgw@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-radosgw.target; enabled; preset: enabled)
Active: active since Mon 2024-11-25 12:34:38 UTC; 4 months 6 days ago
Notice: journal has been rotated since unit was started, output may be incomplete.
● ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once
Loaded: loaded (/lib/systemd/system/ceph-mon.target; enabled; preset: enabled)
Active: active since Tue 2025-04-01 21:20:41 UTC; 13h ago
Apr 01 21:20:41 cephosd1001 systemd[1]: Reached target ceph-mon.target - ceph target allowing to start/stop all ceph-mon@.service instances at once.
● ceph-radosgw@radosgw.service - Ceph rados gateway
Loaded: loaded (/lib/systemd/system/ceph-radosgw@.service; enabled; preset: enabled)
Active: active (running) since Tue 2025-03-11 16:10:19 UTC; 3 weeks 0 days ago
Main PID: 2984021 (radosgw)
Tasks: 615
Memory: 2.5G
CPU: 3h 59min 10.864s
CGroup: /system.slice/system-ceph\x2dradosgw.slice/ceph-radosgw@radosgw.service
└─2984021 /usr/bin/radosgw -f --cluster ceph --name client.radosgw --setuser ceph --setgroup ceph
Check the http_status
of the requests passing through the server with
journalctl -u ceph-radosgw@radosgw.service -f|grep http_status
They should mostly be 200, although 4xx errors are acceptable. There should not be any 5xx errors. Restart the ceph-radosgw.target unit on this host.
btullis@cephosd1001:~$ sudo systemctl restart ceph-radosgw.target
btullis@cephosd1001:~$ echo $?
0
Check the status of the ceph-radosgw units again.
systemctl status --with-dependencies --after ceph-radosgw.target
Check the logs again.
journalctl -u ceph-radosgw@radosgw.service -f|grep http_status
If all is well, proceed to restart the remainder of the ceph-radosgw.target units on the cluster.
sudo cumin -b 1 -s 30 'A:cephosd and not D{cephosd1001.eqiad.wmnet}' 'systemctl restart ceph-radosgw.target'
Once again, the Ceph Cluster dashboard can be used to monitor the versions of radosgw in production: https://grafana.wikimedia.org/goto/wrst_jTNg?orgId=1
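Optionally, confirm from the cluster side that all five gateways have re-registered; the rgw line of the standard status output should again show 5 daemons active:
sudo ceph -s | grep rgw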
Perform a rolling reboot of the cluster
At this point, all services have now been upgraded and roll-restarted, since we do not currently use the rbd-mirror, cephfs-mirror, iscsi, or nfs daemons.
It is a good idea to perform a rolling reboot of the cluster, to ensure that everything comes up as expected during the boot process.
We have a cookbook for this and it can be launched like this:
sudo cookbook sre.ceph.roll-restart-reboot-server --alias cephosd --reason "Reboot post upgrade to latest point release" --task-id T389184 reboot