action #173347 (open)

coordination #161414: [epic] Improved salt based infrastructure management

Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S

Added by okurz 12 days ago. Updated about 6 hours ago.

Status: Workable
Priority: Normal
Assignee:
Category: Feature requests
Start date:
Due date: 2024-12-25 (Due in 14 days)
% Done: 0%
Estimated time:
Tags:

Description

Motivation

During work on #170077, okurz found working on the storage of qamaster to be error-prone because of the hardware RAID with many old storage devices and an unusual configuration, e.g. RAID6 for the root device. Before we do anything riskier, we should ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.

Acceptance criteria

  • AC1: We have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.
  • AC2: The backup is reproducible and not a complete waste of space (e.g. do not copy backup-data.qcow2)

Suggestions


Related issues 3 (3 open, 0 closed)

Copied from openQA Infrastructure (public) - action #170077: Put more storage into qamaster "to make our lives easier in general" size:M · Blocked · okurz · 2024-11-19
Copied to openQA Infrastructure (public) - action #173350: Migrate VMs from qamaster to modern hypervisor solution · New · 2024-11-29
Copied to openQA Infrastructure (public) - action #173674: qamaster-independent backup size:S · Blocked · dheidler · 2024-12-03

#1

Updated by okurz 12 days ago

  • Copied from action #170077: Put more storage into qamaster "to make our lives easier in general" size:M added
#2

Updated by okurz 12 days ago

  • Copied to action #173350: Migrate VMs from qamaster to modern hypervisor solution added
#3

Updated by okurz 12 days ago

  • Target version changed from future to Ready
#4

Updated by okurz 8 days ago

#5

Updated by mkittler 8 days ago

  • Subject changed from Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. to Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S
  • Description updated (diff)
  • Status changed from New to Workable
#6

Updated by gpathak 1 day ago

  • Assignee set to gpathak
#7

Updated by gpathak 1 day ago

Jenkins data: 2.1G
Size of the backup on backup-vm: ~47G
Size of the qcow2 images on qamaster: ~1.1T

25G      /var/lib/libvirt/images/backup_system.qcow2
33G      /var/lib/libvirt/images/baremetal-support.qcow2
7.8M     /var/lib/libvirt/images/calendar-server.qcow2
71G      /var/lib/libvirt/images/jenkins.qcow2
12G      /var/lib/libvirt/images/ntlm-proxy.qcow2
501G     /var/lib/libvirt/images/openqa-monitoring-data.qcow2
80G      /var/lib/libvirt/images/openqa-monitoring.qcow2
200G     /var/lib/libvirt/images/opensuse13.qcow2
15G      /var/lib/libvirt/images/opensuse42.3.qcow2
101G     /var/lib/libvirt/images/win2k19_old.qcow2
50G      /var/lib/libvirt/images/win_server2k19.qcow2
1.1T     total

We need approx. 1.6T of storage to take a backup.
We have 3.8T of free space on the "storage" host.
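As a cross-check of the numbers above, here is a small sketch that sums the `du -h` column and compares the result with the free space on the storage host. It hard-codes the sizes posted in this comment (file names only) and assumes GNU `du -h` semantics, where K/M/G/T are powers of 1024; the function names are made up for illustration.

```python
import re

# Sizes copied from the du -h listing above (paths shortened to file names).
DU_OUTPUT = """\
25G   backup_system.qcow2
33G   baremetal-support.qcow2
7.8M  calendar-server.qcow2
71G   jenkins.qcow2
12G   ntlm-proxy.qcow2
501G  openqa-monitoring-data.qcow2
80G   openqa-monitoring.qcow2
200G  opensuse13.qcow2
15G   opensuse42.3.qcow2
101G  win2k19_old.qcow2
50G   win_server2k19.qcow2
"""

# GNU du -h prints binary multiples: K = 1024 bytes, M = 1024 K, and so on.
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> float:
    """Convert a du -h size such as '7.8M' or '501G' into bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def total_bytes(du_output: str) -> float:
    """Sum the size column of du -h output, ignoring blank lines."""
    return sum(parse_size(line.split()[0])
               for line in du_output.splitlines() if line.strip())


# qcow2 images plus Jenkins data (2.1G) and the backup-vm backup (~47G).
images = total_bytes(DU_OUTPUT)
needed = images + parse_size("2.1G") + parse_size("47G")
free_on_storage = parse_size("3.8T")

if __name__ == "__main__":
    tib = UNITS["T"]
    print(f"qcow2 images: {images / tib:.2f}T, all data: {needed / tib:.2f}T")
    print(f"fits into the free space on storage: {needed < free_on_storage}")
```

With these inputs the listed data adds up to roughly 1.1T, which fits into the 3.8T free on the storage host; the 1.6T estimate above is the more conservative figure.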

#8

Updated by gpathak 1 day ago

  • Status changed from Workable to In Progress
#9

Updated by openqa_review about 17 hours ago

  • Due date set to 2024-12-25

Setting due date based on mean cycle time of SUSE QE Tools

#10

Updated by gpathak about 11 hours ago · Edited

The backup is stored on the storage host under /storage/backup/, in one directory per source:

  • backup-vm: Backup of /home/rsnapshot
  • jenkins.qa.suse.de: Backup of Jenkins home folder
  • qamaster: VM Configs, qcow2 images
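To make the copy re-runnable (AC2), the steps could be captured in a script instead of ad-hoc commands. This is a minimal sketch, assuming it runs on the storage host with SSH access to backup-vm, jenkins.qa.suse.de and qamaster. Only /home/rsnapshot, /var/lib/libvirt/images and /storage/backup come from this ticket; the Jenkins home (/var/lib/jenkins), the libvirt config directory (/etc/libvirt/qemu) and the helper names are assumptions. It only prints the commands, so they can be reviewed before running.

```python
import shlex

# Where the copies live on the storage host (from this ticket).
BACKUP_ROOT = "/storage/backup"

# Hypothetical source list: host names come from this ticket, but the Jenkins
# home and the libvirt config directory below are assumed paths.
SOURCES = {
    "backup-vm": ["backup-vm:/home/rsnapshot/"],
    "jenkins.qa.suse.de": ["jenkins.qa.suse.de:/var/lib/jenkins/"],
    "qamaster": ["qamaster:/etc/libvirt/qemu/", "qamaster:/var/lib/libvirt/images/"],
}

# AC2: do not copy the backup disk image itself.
EXCLUDES = ["backup-data.qcow2"]

# --hard-links keeps rsnapshot's hard-linked snapshots from being duplicated,
# --sparse turns runs of zeros in disk images into holes on the destination.
RSYNC_OPTS = ["--archive", "--hard-links", "--acls", "--xattrs", "--sparse", "--numeric-ids"]


def build_commands(backup_root=BACKUP_ROOT, sources=SOURCES, excludes=EXCLUDES):
    """Return one rsync argument list per source directory."""
    commands = []
    for name, remotes in sources.items():
        for remote in remotes:
            # e.g. qamaster:/var/lib/libvirt/images/ -> /storage/backup/qamaster/images/
            subdir = remote.rstrip("/").rsplit("/", 1)[-1]
            dest = f"{backup_root}/{name}/{subdir}/"
            exclude_opts = [f"--exclude={pattern}" for pattern in excludes]
            commands.append(["rsync", *RSYNC_OPTS, *exclude_opts, remote, dest])
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for review instead of executing them.
    for cmd in build_commands():
        # rsync creates only the last component of the destination path,
        # so create the parent directories first.
        print("mkdir -p " + shlex.quote(cmd[-1]))
        print(shlex.join(cmd))
```

`--hard-links` matters for /home/rsnapshot: rsnapshot shares unchanged files between snapshots via hard links, and copying them without it would store every snapshot in full, which is the kind of waste AC2 rules out.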
#11

Updated by gpathak about 6 hours ago

  • Status changed from In Progress to Feedback
  • Assignee changed from gpathak to okurz

The backup is complete.
@okurz I am putting this in Feedback; please verify the backups and let me know if I missed anything.
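For the requested verification, one option is to compare the source and backup trees file by file. This is a sketch, assuming both directories are reachable from one machine (for example on a host where both are mounted). It compares relative paths and sizes by default and SHA-256 digests only when `with_hash=True`, since hashing ~1.1T of images takes a while. `compare_trees` and `file_index` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_index(root, with_hash=False):
    """Map each regular file under root (relative path) to (size, sha256 or None)."""
    root = Path(root)
    index = {}
    for path in root.rglob("*"):
        # Skip directories and symlinks; only regular files are compared.
        if path.is_symlink() or not path.is_file():
            continue
        digest = None
        if with_hash:
            sha = hashlib.sha256()
            with path.open("rb") as handle:
                # Read in 1 MiB chunks so large qcow2 images do not fill memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    sha.update(chunk)
            digest = sha.hexdigest()
        index[path.relative_to(root).as_posix()] = (path.stat().st_size, digest)
    return index


def compare_trees(source, backup, with_hash=False):
    """Return (missing, differing): files absent from backup, and files whose size or hash differs."""
    src = file_index(source, with_hash)
    dst = file_index(backup, with_hash)
    missing = sorted(src.keys() - dst.keys())
    differing = sorted(p for p in src.keys() & dst.keys() if src[p] != dst[p])
    return missing, differing
```

Two empty lists mean every source file is present in the backup with the same size (and, with hashing enabled, the same content).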

#12

Updated by okurz about 6 hours ago

  • Status changed from Feedback to Workable
  • Assignee changed from okurz to gpathak

Great, it seems all the relevant content is there now. https://monitor.qa.suse.de/d/GDstorage/dashboard-for-storage?from=now-30d&to=now&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-65090 shows that we use 79% of /storage now. The alert threshold is 85%, so I guess we are good.
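The "we are good" reasoning can be made explicit by computing how much can still be written before /storage crosses the 85% alert threshold. This is a sketch using Python's `shutil.disk_usage`; the percentage it reports may differ slightly from `df` or the dashboard because of filesystem-reserved blocks, and the function names are hypothetical.

```python
import shutil

# Alert threshold for /storage as stated above (85 %).
ALERT_THRESHOLD = 0.85


def headroom_bytes(used: int, total: int, threshold: float = ALERT_THRESHOLD) -> int:
    """Bytes that can still be written before usage reaches the alert threshold."""
    return max(0, int(total * threshold) - used)


def report(path: str = "/storage") -> str:
    """Summarise current usage of path relative to the alert threshold."""
    usage = shutil.disk_usage(path)
    percent = 100 * usage.used / usage.total
    left_tib = headroom_bytes(usage.used, usage.total) / 1024**4
    return f"{path}: {percent:.0f}% used, {left_tib:.2f} TiB left before the alert"
```

At 79% used, the headroom is 6% of the filesystem size.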

To cover AC2 "The backup is reproducible", please share a bit more detail on how you did the backup. Ideally add a section to https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md, linking to both https://gitlab.suse.de/suse/wiki/-/blob/main/openqa.md?ref_type=heads#backup and this progress ticket.
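For the requested write-up, the "VM Configs" part could be made reproducible by exporting each libvirt domain definition with `virsh` instead of copying files by hand. This is a minimal sketch, not necessarily how the existing backup was made: it assumes it runs on qamaster against `qemu:///system`, and the function names and target directory are hypothetical. The command runner is a parameter so the logic can be exercised without libvirt.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_virsh(args: List[str]) -> str:
    """Run virsh against the local system libvirt daemon and return its stdout."""
    result = subprocess.run(["virsh", "-c", "qemu:///system", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def dump_vm_configs(target_dir: str, run: Runner = run_virsh) -> List[Path]:
    """Write one <domain>.xml per libvirt domain (running or shut off) into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # "list --all --name" prints one domain name per line plus a trailing blank line.
    names = [n.strip() for n in run(["list", "--all", "--name"]).splitlines() if n.strip()]
    written = []
    for name in names:
        # --inactive dumps the persistent definition rather than the live state.
        xml = run(["dumpxml", "--inactive", name])
        path = target / f"{name}.xml"
        path.write_text(xml)
        written.append(path)
    return written
```

Restoring a VM would then be `virsh define <name>.xml` plus putting its qcow2 image back at the path referenced in the XML.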
