action #173347

closed

coordination #161414: [epic] Improved salt based infrastructure management

Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S

Added by okurz 3 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

During work on #170077, okurz found storage work on qamaster to be error-prone because of the hardware RAID with many old storage devices and an unusual configuration, e.g. RAID6 for the root device. Before we do anything riskier, we should ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.

Acceptance criteria

  • AC1: We have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc.
  • AC2: The backup is reproducible and not a complete waste of space (e.g. it does not copy backup-data.qcow2)

Suggestions


Related issues 4 (3 open, 1 closed)

  • Related to openQA Infrastructure (public) - action #175791: [alert] storage: partitions usage (%) alert size:S (Blocked, gpathak)
  • Copied from openQA Infrastructure (public) - action #170077: Put more storage into qamaster "to make our lives easier in general" size:M (Resolved, okurz, 2024-11-19)
  • Copied to openQA Infrastructure (public) - action #173350: Migrate VMs from qamaster to modern hypervisor solution (New, 2024-11-29)
  • Copied to openQA Infrastructure (public) - action #173674: qamaster-independent backup size:S (Blocked, dheidler, 2024-12-03)

Actions #1

Updated by okurz 3 months ago

  • Copied from action #170077: Put more storage into qamaster "to make our lives easier in general" size:M added
Actions #2

Updated by okurz 3 months ago

  • Copied to action #173350: Migrate VMs from qamaster to modern hypervisor solution added
Actions #3

Updated by okurz 3 months ago

  • Target version changed from future to Ready
Actions #4

Updated by okurz 3 months ago

Actions #5

Updated by mkittler 3 months ago

  • Subject changed from Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. to Ensure we have a current backup of qamaster VMs, VM config, jenkins data, data from backup-vm itself, etc. size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #6

Updated by gpathak 3 months ago

  • Assignee set to gpathak
Actions #7

Updated by gpathak 3 months ago

Jenkins Data: 2.1G
Size of Backup on backup-vm: ~47G
Size of qcow images on qamaster: ~1.1T

25G      /var/lib/libvirt/images/backup_system.qcow2
33G      /var/lib/libvirt/images/baremetal-support.qcow2
7.8M     /var/lib/libvirt/images/calendar-server.qcow2
71G      /var/lib/libvirt/images/jenkins.qcow2
12G      /var/lib/libvirt/images/ntlm-proxy.qcow2
501G     /var/lib/libvirt/images/openqa-monitoring-data.qcow2
80G      /var/lib/libvirt/images/openqa-monitoring.qcow2
200G     /var/lib/libvirt/images/opensuse13.qcow2
15G      /var/lib/libvirt/images/opensuse42.3.qcow2
101G     /var/lib/libvirt/images/win2k19_old.qcow2
50G      /var/lib/libvirt/images/win_server2k19.qcow2
1.1T     total

We need approx. 1.6T of storage for taking a backup.
We have 3.8T of free space on "storage" host.
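
As a quick sanity check of the figures above, the listed image sizes can be summed and compared against the free space on the storage host. This is only a minimal sketch: the sizes are copied from the listing, the helper name `to_gib` is hypothetical, and the 1.6T requirement is taken from the comment as given rather than derived.

```python
# Sizes copied from the listing above (du -h style suffixes).
IMAGE_SIZES = {
    "backup_system.qcow2": "25G",
    "baremetal-support.qcow2": "33G",
    "calendar-server.qcow2": "7.8M",
    "jenkins.qcow2": "71G",
    "ntlm-proxy.qcow2": "12G",
    "openqa-monitoring-data.qcow2": "501G",
    "openqa-monitoring.qcow2": "80G",
    "opensuse13.qcow2": "200G",
    "opensuse42.3.qcow2": "15G",
    "win2k19_old.qcow2": "101G",
    "win_server2k19.qcow2": "50G",
}

# Conversion factors to GiB for the suffixes that appear in the listing.
UNIT_FACTORS = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}


def to_gib(size: str) -> float:
    """Convert a size string such as '7.8M' or '1.1T' to GiB (hypothetical helper)."""
    return float(size[:-1]) * UNIT_FACTORS[size[-1]]


images_total = sum(to_gib(size) for size in IMAGE_SIZES.values())
required = to_gib("1.6T")  # estimate stated in the comment, not derived here
free = to_gib("3.8T")      # free space reported for the "storage" host

print(f"qcow2 images: {images_total:.0f}G (~{images_total / 1024:.1f}T)")
print(f"estimate fits on storage host: {required <= free}")
```

The image sizes add up to roughly 1.1T, which matches the total in the listing, and the stated 1.6T estimate fits within the 3.8T reported as free.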

Actions #8

Updated by gpathak 3 months ago

  • Status changed from Workable to In Progress
Actions #9

Updated by openqa_review 3 months ago

  • Due date set to 2024-12-25

Setting due date based on mean cycle time of SUSE QE Tools

Actions #10

Updated by gpathak 3 months ago · Edited

The backup is stored on the storage host at /storage/backup/, in one directory per source:

  • backup-vm: Backup of /home/rsnapshot
  • jenkins.qa.suse.de: Backup of Jenkins home folder
  • qamaster: VM Configs, qcow2 images
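
The ticket does not record the commands used for the backup. Purely as an illustration of how such a copy could be made repeatable while honouring AC2, the sketch below assembles an rsync invocation with an exclude list. The helper `build_rsync_command`, the source host path, and the exact destination subdirectory are assumptions, not the method actually used.

```python
import shlex
from typing import Iterable, List


def build_rsync_command(source: str, dest: str, excludes: Iterable[str] = ()) -> List[str]:
    """Build an rsync argument list that mirrors `source` into `dest` (hypothetical helper)."""
    # --archive preserves permissions and timestamps, --sparse keeps sparse
    # image files sparse, --partial allows resuming interrupted transfers.
    cmd = ["rsync", "--archive", "--sparse", "--partial"]
    for pattern in excludes:
        cmd.append(f"--exclude={pattern}")
    cmd += [source, dest]
    return cmd


cmd = build_rsync_command(
    "qamaster:/var/lib/libvirt/images/",  # assumed source, based on the size listing
    "/storage/backup/qamaster/",          # assumed layout under /storage/backup/
    excludes=["backup-data.qcow2"],       # AC2: do not copy the backup data image
)

# Print the command for review instead of running it.
print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere. Passing the list to `subprocess.run(cmd, check=True)` would perform the copy.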
Actions #11

Updated by gpathak 3 months ago

  • Status changed from In Progress to Feedback
  • Assignee changed from gpathak to okurz

Backup is complete.
@okurz I am putting this in Feedback, please verify the backups and let me know if I missed something.

Actions #12

Updated by okurz 3 months ago

  • Status changed from Feedback to Workable
  • Assignee changed from okurz to gpathak

Great, it seems all the relevant content is there right now, that's good. https://monitor.qa.suse.de/d/GDstorage/dashboard-for-storage?from=now-30d&to=now&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-65090 shows that we use 79% of /storage now. The alert threshold is 85%, so I guess we are good.

To cover AC2 "The backup is reproducible", please share a bit more detail on how you did the backup. Ideally, add a section to https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md that links to both https://gitlab.suse.de/suse/wiki/-/blob/main/openqa.md?ref_type=heads#backup and this progress ticket.
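
The headroom check in the comment above (79% used against an 85% alert threshold) can be sketched as follows. The function names are hypothetical, and `shutil.disk_usage` may report a slightly different percentage than the monitoring dashboard, since tools differ in how they treat reserved blocks.

```python
import shutil

ALERT_THRESHOLD_PERCENT = 85.0  # threshold quoted in the comment above


def percent_used(total: int, used: int) -> float:
    """Return the used share of a filesystem as a percentage."""
    return 100.0 * used / total


def headroom(used_percent: float, threshold: float = ALERT_THRESHOLD_PERCENT) -> float:
    """Percentage points left before the alert fires (negative if exceeded)."""
    return threshold - used_percent


def check_path(path: str) -> float:
    """Compute the headroom for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return headroom(percent_used(usage.total, usage.used))


# With the figure from the comment: 79% used leaves 6 percentage points.
print(headroom(79.0))
```

On the storage host itself, `check_path("/storage")` would give the live value instead of the figure quoted from the dashboard.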

Actions #14

Updated by gpathak 3 months ago

  • Status changed from Workable to In Progress
Actions #15

Updated by gpathak 3 months ago

  • Status changed from In Progress to Resolved

MR merged, closing the ticket.

Actions #16

Updated by okurz about 2 months ago

  • Due date deleted (2024-12-25)
Actions #17

Updated by okurz about 1 month ago

  • Related to action #175791: [alert] storage: partitions usage (%) alert size:S added