Portal:Cloud VPS/Admin/Cinder backups

Cloud VPS users should not rely on their cinder volumes being backed up. Our backup system is not 100% reliable, does not back up every volume, and backups are only held for a few days.


Most cinder volumes and glance images are backed up every day or two, with backups preserved for a few days.

Architecture

We use Backy2 to back up cinder volumes. Volumes are backed up off-site, on cloudbackup2003 and cloudbackup2004.

The backup agent (wmcs-backup volumes) is run daily by a systemd timer. The intended lifespan of a given backup is set when that backup is created.
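
To see when the next run is scheduled, or how the last one went, something like the following should work on a backup host. The exact unit name here is an assumption; systemctl list-timers will show the real one:

  root@cloudbackup2003:~# systemctl list-timers | grep -i backup
  root@cloudbackup2003:~# journalctl -u wmcs-backup-volumes.service --since yesterday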

What is backed up

Each cloudbackup host has a config file, /etc/wmcs_backup_volumes.yaml, which determines which volumes are and aren't backed up. New projects are backed up by default (because of the ALLOTHERS keyword).
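
As a rough sketch of the format (the key names here are illustrative assumptions; check the file on a cloudbackup host, or its puppet source, for the real schema):

  # Hypothetical excerpt of /etc/wmcs_backup_volumes.yaml; key names are illustrative.
  projects:
    ALLOTHERS:            # catch-all: projects not listed explicitly get backed up
    someproject:
      excluded_volumes:   # hypothetical per-project opt-out list
        - scratch-volume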

Restoring

Backy2 can restore volumes straight into the Ceph pool, and a restore can replace an existing cinder volume. If you're restoring into an existing cinder volume, it is highly recommended to unmount and detach the volume before restoring.
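
For example, to check the attachment state and detach (using the volume from the example below; the server ID is a placeholder, and the volume should be unmounted inside the VM first):

  root@cloudcontrol1007:~# openstack volume show bff48003-a672-47f6-997a-462422a1a719 -c attachments
  root@cloudcontrol1007:~# openstack server remove volume <server-id> bff48003-a672-47f6-997a-462422a1a719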

Some of this will be done on a cloudcontrol node and some on the backup node that contains the backup. Volumes should have the same rbd id as they have in cinder.

  • First, let's find the cinder pool:
    root@cloudcontrol1007:~# ceph osd pool ls
    eqiad1-compute
    eqiad1-glance-images
    eqiad1-cinder               <- it's this one
    device_health_metrics
    .rgw.root
    default.rgw.log
    default.rgw.control
    default.rgw.meta
    default.rgw.buckets.index
    default.rgw.buckets.data
    default.rgw.buckets.non-ec
    
  • Now, make sure rbd knows about the volume we're trying to restore:
    root@cloudcontrol1007:~# rbd --pool eqiad1-cinder list | grep bff48003-a672-47f6-997a-462422a1a719
    volume-bff48003-a672-47f6-997a-462422a1a719
    
  • Delete (or move) the existing rbd image. This prevents a filename conflict when restoring.
    • If the command complains about snapshots, use the commands below to remove the snapshots of that volume.
    root@cloudcontrol1007:~# rbd rm eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
    Removing image: 100% complete...done.
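    If snapshots block the removal, standard rbd snapshot commands should clear them (first check that nothing still depends on the snapshots; a protected snapshot needs rbd snap unprotect before it can be purged):
    root@cloudcontrol1007:~# rbd snap ls eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
    root@cloudcontrol1007:~# rbd snap purge eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719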
    
  • Find the backup you want to restore.
    root@cloudbackup2003:~# backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] $ /usr/bin/backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
    |         date        | name                                        | snapshot_name                       | size | size_bytes |                 uid                  | valid | protected | tags                |        expire       |
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
    | 2024-09-06 02:17:31 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-06T02:17:24_cloudbackup2003 |  512 | 2147483648 | 2c104e68-6bf6-11ef-a5b9-84160cded950 |   1   |     0     | full_backup         | 2024-09-14 02:17:24 |
    | 2024-09-07 20:22:52 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-07T20:22:43_cloudbackup2003 |  512 | 2147483648 | f548a602-6d56-11ef-a5b9-84160cded950 |   1   |     0     | differential_backup | 2024-09-15 20:22:43 |
    | 2024-09-08 20:11:44 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-08T20:11:35_cloudbackup2003 |  512 | 2147483648 | 91df1a6a-6e1e-11ef-a5b9-84160cded950 |   1   |     0     | differential_backup | 2024-09-16 20:11:35 |
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
        INFO: [backy2.logging] Backy complete.
    
  • Note the UID of the desired backup and restore it into the volume's rbd image:
    root@cloudbackup2003:~# backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] $ /usr/bin/backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m56s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (23.0% 1121.1MB/sØ ETA 3s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (28.2% 1092.6MB/sØ ETA 5s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (34.1% 1087.6MB/sØ ETA 6s) 
    <etc>
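
  • Optionally, sanity-check the restored image before re-attaching (a suggested extra check, not part of the original procedure):
    root@cloudcontrol1007:~# rbd info eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719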
    
  • Now you should be ready to re-mount and use the cinder volume.
  • Restoring a volume that has been previously deleted is a bit messier. There's probably a better way to do this, but the current (tested) procedure is to create a new empty cinder volume of the proper size (sketched below), and then restore into that volume according to the above directions.
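
    A hedged sketch of that first creation step (the size must match the original volume, 2GiB in the example above; the name is a placeholder):

    root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack volume create --size 2 <volume-name>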

    Restoring after 'openstack server delete'

    We have rescued at least one VM from the void after an accidental deletion. The process involves creating a new 'host' VM with the same name (so that DNS, Neutron, etc. are hooked up properly) and then overlaying the disk image of the new host with the restored backup.

    This may be possible, with the following caveats:

    • Backups are only preserved for 7 days, so if the deletion is noticed weeks or months later it is probably too late.
    • The restored VM will lose much of its openstack state: it will have a new IP address, forget its security groups, and most likely need its puppet config replaced in Horizon.
    • If the VM predated the move from .eqiad.wmflabs to .eqiad1.wikimedia.cloud, the new VM will only be present under the new domain, eqiad1.wikimedia.cloud.

    Here are the steps for rescue:

    1. Locate the VM in the nova database:
       # mysql -u root nova_eqiad1
       [nova_eqiad1]> SELECT hostname, id, image_ref, instance_type_id FROM instances WHERE hostname LIKE "<hostname>";

    2. Locate the flavor in the nova api database:
       # mysql -u root nova_api_eqiad1
       [nova_api_eqiad1]> SELECT name, id FROM flavors WHERE id='<instance_type_id from above>';

    3. Create the new host VM:
       # OS_PROJECT_ID=<project> openstack server create --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --flavor <flavor_id_from_above> --image <image_ref_from_above> <hostname>

    4. Proceed with the #Restoring steps from above
    5. Confirm puppet runs on the restored VM
    6. Add security groups, floating IPs, etc. as needed in Horizon
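
    The last step can also be done from the CLI instead of Horizon, for example (group and IP names are placeholders):

       # OS_PROJECT_ID=<project> openstack server add security group <hostname> <security-group>
       # OS_PROJECT_ID=<project> openstack server add floating ip <hostname> <floating-ip>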

    Restoring a lost Glance image

    Glance images are backed up on cloudcontrol nodes: each image is backed up on every node. Restoring is similar to the process for instances, but Glance accesses a snapshot rather than the primary file so there's an extra step. In this example we are restoring an image with id '06cf27ba-bed2-48c7-af2b-2abdfa65463c'.

    1. Find the backup you want to restore.
       root@cloudcontrol1005:~# backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] $ /usr/bin/backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
      |         date        | name                                 | snapshot_name       | size |  size_bytes |                 uid                  | valid | protected | tags                       |        expire       |
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
      | 2020-10-20 16:00:03 | 06cf27ba-bed2-48c7-af2b-2abdfa65463c | 2020-10-20T16:00:02 | 4864 | 20401094656 | 508686ba-12ed-11eb-a7f5-4cd98fc4a649 |   1   |     0     | b_daily,b_monthly,b_weekly | 2020-10-27 00:00:00 |
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
          INFO: [backy2.logging] Backy complete.
      
    2. Note the UID of the desired image and restore:
       root@cloudcontrol1005:~# backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] $ /usr/bin/backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m57s) 
          INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [==========] Write Queue [==========] (9.4% 244.1MB/sØ ETA 11s) 
      <etc>
      
    3. Create a snapshot named 'snap' for Glance to access:
       root@cloudcontrol1005:~# rbd snap create eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c@snap
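
    4. Optionally, confirm Glance can see the image again (a suggested check, not part of the original procedure):
       root@cloudcontrol1005:~# openstack image show 06cf27ba-bed2-48c7-af2b-2abdfa65463c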