action #88546

closed

openQA Project - coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results

openQA Project - coordination #80546: [epic] Scale up: Enable to store more results

Make use of the new "Storage Server", e.g. complete OSD backup

Added by okurz about 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Acceptance criteria

  • AC1: The SUSE QA storage server is used within our production and we (the team) know what it is used for

Suggestions

  • Ask nsinger where to connect and which steps to start with. The hostname is storage.qa.suse.de
  • Try to connect to storage.qa with the same credentials as for the OSD machines maintained with salt, e.g. plain ssh with your user should work. SSH access should work at this point
  • Add changes and make sure the changes are in salt
    • When populating the btrfs filesystem storage.qa.suse.de:/storage it would make sense to create dedicated subvolumes for different things
      • e.g. do a full or partial backup of OSD
      • e.g. mount storage.qa.suse.de:/storage on OSD and configure the archiving feature to use it
  • What is included in a complete OSD backup? To be answered by #96269
  • Also include postgres? okurz: No, to be covered by #94015
  • Which backup solution to use, e.g. rsnapshot? okurz: Yes, use rsnapshot, the same as we already do on backup.qa.suse.de
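
The idea of dedicated subvolumes per purpose could be sketched as follows. This is only a dry-run illustration: the subvolume names "osd-backup" and "osd-archive" are hypothetical and not taken from this ticket, nothing is executed, and in practice the change would need to live in salt as noted above.

```python
from pathlib import PurePosixPath

# Assumption: /storage is the btrfs mount point on storage.qa.suse.de (from the ticket).
STORAGE_ROOT = PurePosixPath("/storage")
# Assumption: these subvolume names are illustrative only.
PURPOSES = ["osd-backup", "osd-archive"]


def subvolume_commands(root, purposes):
    """Return one `btrfs subvolume create` command per purpose (nothing is executed)."""
    return [["btrfs", "subvolume", "create", str(root / name)] for name in purposes]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for command in subvolume_commands(STORAGE_ROOT, PURPOSES):
        print(" ".join(command))
```

Keeping backups and archives in separate subvolumes would allow each to be snapshotted or removed independently.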

Further details

If we conduct a "complete OSD backup", we can also learn the performance impact, e.g. how long the initial synchronization takes and how long individual syncs, e.g. daily ones, take.
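
One way to collect such timings is to wrap the sync command in a small timer. The sketch below is an assumption about how this could be measured, not something the team used; the helper name `timed_run` is hypothetical and the command shown is a placeholder for the real sync invocation.

```python
import subprocess
import sys
import time


def timed_run(command):
    """Run a command and return (exit code, elapsed wall-clock seconds)."""
    start = time.monotonic()
    completed = subprocess.run(command, check=False)
    return completed.returncode, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder command; replace it with the actual initial or daily sync call.
    code, seconds = timed_run([sys.executable, "-c", "pass"])
    print(f"exit code {code}, took {seconds:.2f}s")
```

Running this once for the initial synchronization and again for a later daily sync would give the two durations asked about above.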


Related issues 6 (0 open, 6 closed)

Related to openQA Infrastructure - action #92701: backup of etc/ from both o3 was not working since some days due to OOM on backup.qa.suse.de (was: … and osd not updated anymore since 2019) | Resolved | mkittler | 2021-05-14 | 2021-06-30

Related to openQA Infrastructure - action #44078: Implement proper backups for o3 size:M | Resolved | mkittler | 2018-11-20

Related to openQA Infrastructure - action #121282: Recover storage.qa.suse.de size:S | Resolved | nicksinger | 2022-12-01

Blocks openQA Project - action #92788: Use openQA archiving feature on osd size:S | Resolved | okurz

Copied from openQA Infrastructure - action #69577: Handle installation of the new "Storage Server" | Resolved | nicksinger | 2020-08-04

Copied to openQA Infrastructure - action #96269: Define what a "complete OSD backup" should or can include | Resolved | okurz | 2021-07-29

Actions #1

Updated by okurz about 3 years ago

  • Copied from action #69577: Handle installation of the new "Storage Server" added
Actions #2

Updated by livdywan about 3 years ago

Could we add some suggestions here and make it Workable? Like where to connect to, what steps to start with?

Actions #3

Updated by okurz about 3 years ago

  • Description updated (diff)
  • Status changed from New to Workable

Yes, we should. I can't do much on that on my own though. nsinger knows more

Actions #4

Updated by okurz about 3 years ago

  • Description updated (diff)
Actions #5

Updated by okurz about 3 years ago

  • Description updated (diff)
Actions #6

Updated by okurz about 3 years ago

  • Description updated (diff)
Actions #7

Updated by mkittler almost 3 years ago

Now with the archiving feature enabled one could try to mount storage.qa.suse.de:/storage on OSD and configure the archiving feature to use it.

Actions #8

Updated by mkittler almost 3 years ago

  • Related to action #92701: backup of etc/ from both o3 was not working since some days due to OOM on backup.qa.suse.de (was: … and osd not updated anymore since 2019) added
Actions #9

Updated by mkittler almost 3 years ago

  • Description updated (diff)

I've been updating the ticket description:

  • There's an overlap between this ticket and #92701. I suppose if we opt for the full backup of OSD here we wouldn't need #92701 anymore. It also leads to the idea of only backing up /etc (and maybe some other important directories) first.
  • It looks like /storage on storage.qa.suse.de is using btrfs. That makes sense, and I suppose if we populate it with various things, e.g. an archive or backups, we should create a dedicated subvolume for each.
Actions #10

Updated by mkittler almost 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler

Now that we have the archiving feature, enabling it is likely the easiest use of the storage server, so I'll start with that.

Actions #11

Updated by openqa_review almost 3 years ago

  • Due date set to 2021-06-16

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by okurz almost 3 years ago

Before we accept the MR we should do #91779 first. Also see this morning's problem with storage.qa: #93683

Actions #14

Updated by okurz almost 3 years ago

We need to rethink. With https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/501 storage.qa.suse.de would become a critical component. So far what we have is a central VM that is backed by multiple levels of high-grade redundancy, which we can see confirmed in years of flawless availability. All the workers combined also provide built-in redundancy with our scheduling algorithm. The physical machine storage.qa.suse.de is a single point of failure, as visible in #93683. I still see using storage.qa.suse.de as a backup target as a good first approach, and I recommend doing that first. For the archiving feature we can still make use of that immediately, as on osd we have expensive+fast and cheap+slow storage to be used.

EDIT: Because we talked about whether rsnapshot supports btrfs snapshots: the Gentoo wiki mentions http://web.archive.org/web/20190910001551/http://it.werther-web.de/2011/10/23/migrate-rsnapshot-based-backup-to-btrfs-snapshots/ (the original page yields 404).

Actions #15

Updated by mkittler almost 3 years ago

  • Status changed from In Progress to Workable
Actions #16

Updated by mkittler almost 3 years ago

  • Assignee deleted (mkittler)

I haven't progressed here since we decided to focus on #92701 first. I'm unassigning because I won't be able to work on this until next Tuesday.

Actions #17

Updated by okurz almost 3 years ago

  • Due date deleted (2021-06-16)
Actions #18

Updated by okurz almost 3 years ago

  • Status changed from Workable to New

Moving all tickets without size confirmation by the team back to "New". The team should move the tickets back after estimating and agreeing on a consistent size.

Actions #19

Updated by okurz almost 3 years ago

  • Blocks action #92788: Use openQA archiving feature on osd size:S added
Actions #20

Updated by ilausuch over 2 years ago

  • Description updated (diff)

We need to answer the last two questions in the suggestions section before making it workable.

Actions #21

Updated by okurz over 2 years ago

  • Copied to action #96269: Define what a "complete OSD backup" should or can include added
Actions #22

Updated by okurz over 2 years ago

  • Description updated (diff)
Actions #23

Updated by okurz over 2 years ago

  • Related to action #44078: Implement proper backups for o3 size:M added
Actions #24

Updated by okurz over 2 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz

#44078 first

Actions #25

Updated by okurz over 2 years ago

  • Status changed from Blocked to Resolved

With #44078 completed we make active use of the storage space on storage.qa.suse.de, and that host is also fully controlled with salt and actively monitored. The team agreed that we have AC1 covered :)

Actions #26

Updated by okurz over 1 year ago
