action #173347
open coordination #161414: [epic] Improved salt based infrastructure management
Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S
Description
Motivation
While working on #170077, okurz found storage work on qamaster to be error-prone because of the hardware RAID with many old storage devices and an unusual configuration (RAID6 for the root device, etc.). Before we do more risky stuff we should ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.
Acceptance criteria
- AC1: We have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.
- AC2: The backup is reproducible and not a complete waste of space (e.g. not copying backup-data.qcow2)
Suggestions
- Check what is already backed up
- Follow suggestions from #173674
- Review https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/backup/rsnapshot.sls?ref_type=heads as well as https://gitlab.suse.de/qa-sle/backup-server-salt
- Add to the backup what is missing, at least once manually, e.g. copy over stuff with rsync to other hardware
- Consider a "backup of backup" from backup-vm
- Probably it's better and easier to save data from within the VMs, but if it's easier consider a backup of the complete qcow files
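The rsync suggestion above could be sketched as follows. This is only an illustration of one possible approach, not the procedure that was actually used: the host name `qamaster.example` and the `images/` subdirectory on the target are hypothetical, and the chosen rsync options are an assumption. The command is built with `--dry-run` by default so nothing is copied until the paths have been reviewed.

```python
import shlex


def build_rsync_command(source, destination, dry_run=True):
    """Return an rsync argument list for copying `source` to `destination`."""
    cmd = [
        "rsync",
        "--archive",         # preserve permissions, ownership, timestamps
        "--sparse",          # handle sparse files (typical for qcow2) efficiently
        "--partial",         # keep partially transferred files for resuming
        "--human-readable",  # readable sizes in the output
    ]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    cmd += [source, destination]
    return cmd


# Hypothetical source host and target directory; adjust before real use.
cmd = build_rsync_command(
    "qamaster.example:/var/lib/libvirt/images/",
    "/storage/backup/qamaster/images/",
)
print(shlex.join(cmd))
```

Printing the command instead of running it keeps the sketch safe to execute anywhere; the resulting line can be reviewed and then run by hand or passed to `subprocess.run`.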
Updated by okurz 12 days ago
- Copied from action #170077: Put more storage into qamaster "to make our lives easier in general" size:M added
Updated by okurz 12 days ago
- Copied to action #173350: Migrate VMs from qamaster to modern hypervisor solution added
Updated by okurz 8 days ago
- Copied to action #173674: qamaster-independent backup size:S added
Updated by mkittler 8 days ago
- Subject changed from Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. to Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by gpathak 1 day ago
Jenkins Data: 2.1G
Size of Backup on backup-vm: ~47G
Size of qcow images on qamaster: ~1.1T
25G /var/lib/libvirt/images/backup_system.qcow2
33G /var/lib/libvirt/images/baremetal-support.qcow2
7.8M /var/lib/libvirt/images/calendar-server.qcow2
71G /var/lib/libvirt/images/jenkins.qcow2
12G /var/lib/libvirt/images/ntlm-proxy.qcow2
501G /var/lib/libvirt/images/openqa-monitoring-data.qcow2
80G /var/lib/libvirt/images/openqa-monitoring.qcow2
200G /var/lib/libvirt/images/opensuse13.qcow2
15G /var/lib/libvirt/images/opensuse42.3.qcow2
101G /var/lib/libvirt/images/win2k19_old.qcow2
50G /var/lib/libvirt/images/win_server2k19.qcow2
1.1T total
We need approx. 1.6T of storage for taking a backup.
We have 3.8T of free space on "storage" host.
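The numbers above can be cross-checked with a small sketch. It assumes the sizes are binary units as printed by `du -h`, sums the listed qcow images, adds the Jenkins data and the backup-vm backup, and compares the 1.6T estimate from this comment against the 3.8T of free space. The variable names are illustrative only.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du-style size such as '7.8M' or '501G' to bytes."""
    return float(text[:-1]) * UNITS[text[-1]]


# Sizes of the qcow images as listed in this comment.
image_sizes = ["25G", "33G", "7.8M", "71G", "12G", "501G",
               "80G", "200G", "15G", "101G", "50G"]

images_total = sum(parse_size(size) for size in image_sizes)
everything = images_total + parse_size("2.1G") + parse_size("47G")

required = parse_size("1.6T")  # estimate stated in this comment
free = parse_size("3.8T")      # free space reported for the "storage" host

print(f"qcow images:                  {images_total / UNITS['T']:.2f}T")
print(f"images + Jenkins + backup-vm: {everything / UNITS['T']:.2f}T")
print(f"estimate fits into free space: {required <= free}")
```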
Updated by openqa_review about 17 hours ago
- Due date set to 2024-12-25
Setting due date based on mean cycle time of SUSE QE Tools
Updated by gpathak about 11 hours ago · Edited
Backup is done on storage host at /storage/backup/ under respective directory names:
- backup-vm: Backup of /home/rsnapshot
- jenkins.qa.suse.de: Backup of Jenkins home folder
- qamaster: VM Configs, qcow2 images
Updated by gpathak about 6 hours ago
- Status changed from In Progress to Feedback
- Assignee changed from gpathak to okurz
Backup is complete.
@okurz I am putting this in Feedback, please verify the backups and let me know if I missed something.
Updated by okurz about 6 hours ago
- Status changed from Feedback to Workable
- Assignee changed from okurz to gpathak
Great, it seems all the relevant content is there right now, that's good. https://monitor.qa.suse.de/d/GDstorage/dashboard-for-storage?from=now-30d&to=now&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-65090 shows that we now use 79% of /storage. The alert threshold is 85%, so I guess we are good.
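The same comparison can be done locally with a minimal sketch like the one below. It assumes a plain used/total calculation, which may differ slightly from what the dashboard or `df` reports, and it queries `/` so that it runs on any machine; on the storage host the path would be `/storage`. The function names are illustrative only.

```python
import shutil

ALERT_THRESHOLD_PERCENT = 85.0  # alert threshold mentioned in this comment


def used_percent(total, used):
    """Return the used share of a filesystem in percent."""
    return 100.0 * used / total


def is_below_threshold(total, used, threshold=ALERT_THRESHOLD_PERCENT):
    """Return True while usage stays below the alert threshold."""
    return used_percent(total, used) < threshold


# "/" is used so the sketch runs anywhere; use "/storage" on the storage host.
usage = shutil.disk_usage("/")
percent = used_percent(usage.total, usage.used)
print(f"{percent:.0f}% used, below threshold: {is_below_threshold(usage.total, usage.used)}")
```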
To cover AC2 ("The backup is reproducible"), please share a bit more detail on how you did the backup. Ideally, add a section to https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md and link to both https://gitlab.suse.de/suse/wiki/-/blob/main/openqa.md?ref_type=heads#backup and this progress ticket.