Portal:Cloud VPS/Admin/Cinder backups

Cloud VPS users should not rely on their cinder volumes being backed up. Our backup system is not 100% reliable, does not back up every volume, and backups are only held for a few days.


Most cinder volumes and glance images are backed up every day or two, with backups preserved for a few days.

Architecture

We use Backy2 to back up cinder volumes. Volumes are backed up off-site, on cloudbackup2003 and cloudbackup2004.

The backup agent (wmcs-backup volumes) is run daily by a systemd timer. The intended lifespan of a given backup is set when that backup is created.
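
To see when the next run is scheduled, or how the last one went, something like the following should work on a backup host. The exact unit name here is an assumption; systemctl list-timers will show the real one:

  root@cloudbackup2003:~# systemctl list-timers | grep -i backup
  root@cloudbackup2003:~# journalctl -u wmcs-backup-volumes.service --since yesterday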

What is backed up

Each cloudbackup host has a config file, /etc/wmcs_backup_volumes.yaml, which determines which volumes are and aren't backed up. New projects are backed up by default (because of the ALLOTHERS keyword).
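
As a rough sketch of the format (the key names here are illustrative assumptions; check the file on a cloudbackup host, or its puppet source, for the real schema):

  # Hypothetical excerpt of /etc/wmcs_backup_volumes.yaml; key names are illustrative.
  projects:
    ALLOTHERS:            # catch-all: projects not listed explicitly get backed up
    someproject:
      excluded_volumes:   # hypothetical per-project opt-out list
        - scratch-volume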

Restoring

Backy2 can restore volumes straight into the Ceph pool, and a restore can replace an existing cinder volume. If you're restoring into an existing cinder volume, it is highly recommended to unmount and detach the volume before restoring.
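
For example, to check the attachment state and detach (using the volume from the example below; the server ID is a placeholder, and the volume should be unmounted inside the VM first):

  root@cloudcontrol1007:~# openstack volume show bff48003-a672-47f6-997a-462422a1a719 -c attachments
  root@cloudcontrol1007:~# openstack server remove volume <server-id> bff48003-a672-47f6-997a-462422a1a719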

Some of this will be done on a cloudcontrol node and some on the backup node that contains the backup. Volumes should have the same rbd id as they have in cinder.

  • First, let's find the cinder pool:
    root@cloudcontrol1007:~# ceph osd pool ls
    eqiad1-compute
    eqiad1-glance-images
    eqiad1-cinder               <- it's this one
    device_health_metrics
    .rgw.root
    default.rgw.log
    default.rgw.control
    default.rgw.meta
    default.rgw.buckets.index
    default.rgw.buckets.data
    default.rgw.buckets.non-ec
    
  • Now, make sure rbd knows about the volume we're trying to restore:
    root@cloudcontrol1007:~# rbd --pool eqiad1-cinder list | grep bff48003-a672-47f6-997a-462422a1a719
    volume-bff48003-a672-47f6-997a-462422a1a719
    
  • Delete (or move) the existing rbd image. This prevents a filename conflict when restoring.
    • If the command complains about snapshots, use the commands below to remove the snapshots of that volume.
    root@cloudcontrol1007:~# rbd rm eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
    Removing image: 100% complete...done.
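    If snapshots block the removal, standard rbd snapshot commands should clear them (first check that nothing still depends on the snapshots; a protected snapshot needs rbd snap unprotect before it can be purged):
    root@cloudcontrol1007:~# rbd snap ls eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
    root@cloudcontrol1007:~# rbd snap purge eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719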
    
  • Find the backup you want to restore.
    root@cloudbackup2003:~# backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] $ /usr/bin/backy2 ls volume-bff48003-a672-47f6-997a-462422a1a719
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
    |         date        | name                                        | snapshot_name                       | size | size_bytes |                 uid                  | valid | protected | tags                |        expire       |
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
    | 2024-09-06 02:17:31 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-06T02:17:24_cloudbackup2003 |  512 | 2147483648 | 2c104e68-6bf6-11ef-a5b9-84160cded950 |   1   |     0     | full_backup         | 2024-09-14 02:17:24 |
    | 2024-09-07 20:22:52 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-07T20:22:43_cloudbackup2003 |  512 | 2147483648 | f548a602-6d56-11ef-a5b9-84160cded950 |   1   |     0     | differential_backup | 2024-09-15 20:22:43 |
    | 2024-09-08 20:11:44 | volume-bff48003-a672-47f6-997a-462422a1a719 | 2024-09-08T20:11:35_cloudbackup2003 |  512 | 2147483648 | 91df1a6a-6e1e-11ef-a5b9-84160cded950 |   1   |     0     | differential_backup | 2024-09-16 20:11:35 |
    +---------------------+---------------------------------------------+-------------------------------------+------+------------+--------------------------------------+-------+-----------+---------------------+---------------------+
        INFO: [backy2.logging] Backy complete.
    
  • Note the UID of the desired backup and restore it into the volume's rbd image:
    root@cloudbackup2003:~# backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] $ /usr/bin/backy2 restore f548a602-6d56-11ef-a5b9-84160cded950 rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m56s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (23.0% 1121.1MB/sØ ETA 3s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (28.2% 1092.6MB/sØ ETA 5s) 
        INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719 Read Queue [==========] Write Queue [==========] (34.1% 1087.6MB/sØ ETA 6s) 
    <etc>
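
  • Optionally, sanity-check the restored image before re-attaching (a suggested extra check, not part of the original procedure):
    root@cloudcontrol1007:~# rbd info eqiad1-cinder/volume-bff48003-a672-47f6-997a-462422a1a719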
    
  • Now you should be ready to re-mount and use the cinder volume.
  • Restoring a volume that has been previously deleted is a bit messier. There's probably a better way to do this, but the current (tested) procedure is to create a new empty cinder volume of the proper size (sketched below), and then restore into that volume according to the above directions.
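
    A hedged sketch of that first creation step (the size must match the original volume, 2GiB in the example above; the name is a placeholder):

    root@cloudcontrol1007:~# OS_PROJECT_ID=<project> openstack volume create --size 2 <volume-name>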

    Restoring after 'openstack server delete'

    We have rescued at least one VM from the void after an accidental deletion. The process involves creating a new 'host' VM with the same name (so that DNS, Neutron, etc. are hooked up properly) and then overlaying the disk image of the new host with the restored backup.

    This may be possible, with the following caveats:

    • Backups are only preserved for 7 days, so if the deletion is noticed weeks or months later it is probably too late.
    • The restored VM will lose much of its openstack state: it will have a new IP address, forget its security groups, and most likely need its puppet config replaced in Horizon.
    • If the VM predated the move from .eqiad.wmflabs to .eqiad1.wikimedia.cloud, the new VM will only be present under the new domain, eqiad1.wikimedia.cloud.

    Here are the steps for rescue:

    1. Locate the VM in the nova database:
       # mysql -u root nova_eqiad1
       [nova_eqiad1]> SELECT hostname, id, image_ref, instance_type_id FROM instances WHERE hostname LIKE "<hostname>";

    2. Locate the flavor in the nova api database:
       # mysql -u root nova_api_eqiad1
       [nova_api_eqiad1]> SELECT name, id FROM flavors WHERE id='<instance_type_id from above>';

    3. Create the new host VM:
       # OS_PROJECT_ID=<project> openstack server create --nic net-id=7425e328-560c-4f00-8e99-706f3fb90bb4 --flavor <flavor_id_from_above> --image <image_ref_from_above> <hostname>

    4. Proceed with the #Restoring steps from above
    5. Confirm puppet runs on the restored VM
    6. Add security groups, floating IPs, etc. as needed in Horizon
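
    The last step can also be done from the CLI instead of Horizon, for example (group and IP names are placeholders):

       # OS_PROJECT_ID=<project> openstack server add security group <hostname> <security-group>
       # OS_PROJECT_ID=<project> openstack server add floating ip <hostname> <floating-ip>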

    Restoring a lost Glance image

    Glance images are backed up on cloudcontrol nodes: each image is backed up on every node. Restoring is similar to the process for instances, but Glance accesses a snapshot rather than the primary file so there's an extra step. In this example we are restoring an image with id '06cf27ba-bed2-48c7-af2b-2abdfa65463c'.

    1. Find the backup you want to restore.
       root@cloudcontrol1005:~# backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] $ /usr/bin/backy2 ls 06cf27ba-bed2-48c7-af2b-2abdfa65463c
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
      |         date        | name                                 | snapshot_name       | size |  size_bytes |                 uid                  | valid | protected | tags                       |        expire       |
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
      | 2020-10-20 16:00:03 | 06cf27ba-bed2-48c7-af2b-2abdfa65463c | 2020-10-20T16:00:02 | 4864 | 20401094656 | 508686ba-12ed-11eb-a7f5-4cd98fc4a649 |   1   |     0     | b_daily,b_monthly,b_weekly | 2020-10-27 00:00:00 |
      +---------------------+--------------------------------------+---------------------+------+-------------+--------------------------------------+-------+-----------+----------------------------+---------------------+
          INFO: [backy2.logging] Backy complete.
      
    2. Note the UID of the desired image and restore:
       root@cloudcontrol1005:~# backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] $ /usr/bin/backy2 restore 508686ba-12ed-11eb-a7f5-4cd98fc4a649 rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c
          INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [          ] Write Queue [          ] (0.0% 0.0MB/sØ ETA 2m57s) 
          INFO: [backy2.logging] Restore phase 1/2 (sparse) to rbd://eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c Read Queue [==========] Write Queue [==========] (9.4% 244.1MB/sØ ETA 11s) 
      <etc>
      
    3. Create a snapshot named 'snap' for Glance to access:
       root@cloudcontrol1005:~# rbd snap create eqiad1-glance-images/06cf27ba-bed2-48c7-af2b-2abdfa65463c@snap
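
    4. Optionally, confirm Glance can see the image again (a suggested check, not part of the original procedure):
       root@cloudcontrol1005:~# openstack image show 06cf27ba-bed2-48c7-af2b-2abdfa65463c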