action #69577

closed

openQA Project - coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results

openQA Project - coordination #80546: [epic] Scale up: Enable to store more results

Handle installation of the new "Storage Server"

Added by nicksinger over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2020-08-04
Due date:
% Done: 0%
Estimated time:

Description

We received our new Storage Server, which we want to connect to openQA. It was delivered to SUSE, addressed to the attention of Ralf Unger, and is currently located in the Nuremberg post office. It has 2x 10 Gbit/s RJ45 ports, which need a matching uplink. As this machine will communicate with openQA (a VM on the OBS cluster), it might make sense to place it close to this cluster (wherever that is located). If 10G does not work out at all, we could start with 1G for now.

Please make sure to obtain a copy of the invoice, which is normally taped to the outside of the parcel, and send it to @mgriessmeier so he can release the payment for our server. (Already done.)

So the task would be to open an infra ticket and ask them nicely to put the machine into the server room. You might also need to discuss how to connect the 10 Gbit/s ports, since infra has no 10G hardware (AFAIK).

At the very least the machine needs to be moved away from the post office :)


Related issues 6 (0 open, 6 closed)

Related to openQA Infrastructure - action #44078: Implement proper backups for o3 size:M (Resolved, mkittler, 2018-11-20)
Related to openQA Infrastructure - action #66709: Storage server for OSD and monitoring (Resolved, nicksinger, 2020-05-12)
Related to openQA Infrastructure - action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt (Resolved, nicksinger, 2021-06-09)
Related to openQA Infrastructure - action #121282: Recover storage.qa.suse.de size:S (Resolved, nicksinger, 2022-12-01)
Copied to openQA Infrastructure - action #88546: Make use of the new "Storage Server", e.g. complete OSD backup (Resolved, okurz)
Copied to openQA Infrastructure - action #90629: administration of the new "Storage Server" (Resolved, nicksinger, 2020-08-04)

Actions #1

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #2

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #3

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #4

Updated by nicksinger over 3 years ago

  • Description updated (diff)
Actions #5

Updated by nicksinger over 3 years ago

  • Assignee set to okurz
  • Priority changed from Normal to Urgent

@okurz: I'm assigning this to you for now because I fully trust that you will find the right person and maybe some volunteers. The urgency only reflects that the machine needs to move out of the post office - my office, the labs or a proper rack inside the server room would all be enough to reduce the prio to "low" ;)

Actions #6

Updated by okurz over 3 years ago

  • Due date set to 2020-08-11
  • Status changed from Workable to Blocked
Actions #7

Updated by okurz over 3 years ago

  • Due date changed from 2020-08-11 to 2020-08-26
  • Assignee changed from okurz to nicksinger
  • Priority changed from Urgent to Normal
  • Target version set to Ready

We have the infra ticket. It was assigned but has seen no update, so it is unclear whether any action has actually been taken. nsinger to check later.

Actions #8

Updated by nicksinger over 3 years ago

  • Due date changed from 2020-08-26 to 2020-09-04

It got escalated to Ralf when I came back from vacation. Gerhard has now mentioned that the team will be back to full capacity next week and has supposedly scheduled the work for then. Therefore I am pushing back the due date.

Actions #9

Updated by okurz over 3 years ago

  • Status changed from Blocked to In Progress

As you mentioned, the machine is now mounted in the rack and accessible. Please also follow up on the task in https://infra.nue.suse.com/SelfService/Display.html?id=175645, since you are on it already.

Actions #10

Updated by nicksinger over 3 years ago

  • Status changed from In Progress to Workable

I'm keeping myself assigned but setting the ticket to "Workable", as I'm currently not working on it. Whoever wants to give it a try can simply unassign me.

Actions #12

Updated by okurz over 3 years ago

  • Related to action #44078: Implement proper backups for o3 size:M added
Actions #14

Updated by livdywan over 3 years ago

  • Due date changed from 2020-09-04 to 2020-11-30

See #76972 for the request for additional resources.

Actions #15

Updated by okurz over 3 years ago

  • Target version changed from Ready to future
Actions #16

Updated by nicksinger over 3 years ago

  • Status changed from Workable to In Progress

The OS is installed now and the machine is reachable over ssh at its IP 10.160.66.189.

We still need to decide how to set up the storage, mainly the RAID level and technology (mdadm RAID, btrfs RAID, FS).

Actions #17

Updated by livdywan over 3 years ago

nicksinger wrote:

> The OS is installed now and the machine is reachable over ssh at its IP 10.160.66.189.
>
> We still need to decide how to set up the storage, mainly the RAID level and technology (mdadm RAID, btrfs RAID, FS).

Maybe it makes sense to discuss this in the weekly, perhaps together with the related point raised by @mkittler in os-autoinst/openQA#3635 (poo#88121)?

Actions #18

Updated by okurz over 3 years ago

  • Related to action #66709: Storage server for OSD and monitoring added
Actions #19

Updated by okurz over 3 years ago

  • Parent task set to #80546
Actions #20

Updated by okurz over 3 years ago

  • Target version changed from future to Ready
Actions #21

Updated by nicksinger over 3 years ago

  • Target version changed from Ready to future

Discussed in the weekly:

  • btrfs fs/raid
  • nfs4 export
  • hostname: storage
  • basic salt integration (e.g. ssh)
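
The decisions above could translate into commands roughly like the following. This is only a dry-run sketch that prints commands instead of executing them. The ticket does not state the RAID profile, the device names, the filesystem label or the allowed NFS clients, so `raid1`, the device arguments, the label `storage` and the `*` client spec are all placeholders and not the actual setup. The `/storage` path matches the share mentioned later in this ticket.

```shell
#!/bin/sh
# Dry run: print (do not execute) a possible command sequence for the setup
# agreed on in the weekly. Profile, devices, label and export options are
# assumptions for illustration only.
plan_storage_setup() {
    if [ "$#" -lt 3 ]; then
        echo "usage: plan_storage_setup PROFILE DEVICE DEVICE..." >&2
        return 1
    fi
    profile=$1
    shift
    # btrfs with the same RAID profile for data (-d) and metadata (-m)
    echo "mkfs.btrfs -L storage -d $profile -m $profile $*"
    echo "mkdir -p /storage"
    echo "mount LABEL=storage /storage"
    # NFS export; '*' allows any client and should be narrowed in practice
    echo "echo '/storage *(rw,sync,no_subtree_check)' >> /etc/exports"
    echo "exportfs -ra"
    # Hostname as decided in the weekly
    echo "hostnamectl set-hostname storage"
}
```

For example, `plan_storage_setup raid1 /dev/sdX /dev/sdY` prints the sequence for two hypothetical disks so it can be reviewed before anything is run as root.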
Actions #22

Updated by nicksinger over 3 years ago

  • Target version changed from future to Ready
Actions #23

Updated by livdywan about 3 years ago

  • Due date changed from 2020-11-30 to 2021-02-12
Actions #24

Updated by livdywan about 3 years ago

Since we were talking about the server in the daily, I did a bit of smoke testing. I can ssh to storage.qa.suse.de as my user, and sudo works.

Just noticed one oddity - plain dmesg fails as my user (I can use sudo for it), and I'm wondering if that might point to some other configuration issue:

$ dmesg
dmesg: read kernel buffer failed: Operation not permitted

@nicksinger is enabling the NFS share next

Actions #25

Updated by nicksinger about 3 years ago

cdywan wrote:

> Since we were talking about the server in the daily, I did a bit of smoke testing. I can ssh to storage.qa.suse.de as my user, and sudo works.
>
> Just noticed one oddity - plain dmesg fails as my user (I can use sudo for it), and I'm wondering if that might point to some other configuration issue:
>
> $ dmesg
> dmesg: read kernel buffer failed: Operation not permitted
>
> @nicksinger is enabling the NFS share next

It seems that newer kernels introduced a setting named "kernel.dmesg_restrict". It is enabled on the storage server but disabled on all our workers. I assume this is mainly because the installation on the storage server is newer. I wouldn't bother changing it, as we have root access anyway :)
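
To compare the setting between hosts, something like the following could be used. It is a minimal sketch: the helper name `describe_dmesg_restrict` is made up for illustration, and it only reads the value from `/proc/sys/kernel/dmesg_restrict` without changing anything.

```shell
#!/bin/sh
# Hypothetical helper: translate a kernel.dmesg_restrict value into words.
describe_dmesg_restrict() {
    case "$1" in
        0) echo "unrestricted: any user can run dmesg" ;;
        1) echo "restricted: dmesg needs root or CAP_SYSLOG" ;;
        *) echo "unknown value: $1" ;;
    esac
}

# Report the value of the current host, if the kernel exposes it.
restrict_file=/proc/sys/kernel/dmesg_restrict
if [ -r "$restrict_file" ]; then
    describe_dmesg_restrict "$(cat "$restrict_file")"
fi
```

Running this on the storage server and on a worker should show the difference described above. `sysctl kernel.dmesg_restrict` prints the same raw value.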

Actions #26

Updated by nicksinger about 3 years ago

  • Status changed from In Progress to Resolved

An NFS share named /storage is now enabled. In a quick test from OSD I was able to mount it and write to it. You can try it yourself by running mount -t nfs4 storage.qa.suse.de:/storage /storage.
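
The quick mount-and-write test could be scripted along these lines. This is a sketch, not the exact test that was run: the function name `nfs_smoke_test` and the probe file name are made up, it needs root on the client, and unlike the one-liner above it unmounts the share again afterwards.

```shell
#!/bin/sh
# Hypothetical smoke test: mount an NFSv4 export, write and read back a
# small probe file, then clean up. Returns 0 only if the round trip worked.
nfs_smoke_test() {
    export_path=$1    # e.g. storage.qa.suse.de:/storage
    mount_point=$2    # e.g. /storage
    probe="$mount_point/.nfs-smoke-test-$$"
    rc=1

    mkdir -p "$mount_point" || return 1
    mount -t nfs4 "$export_path" "$mount_point" || return 1

    # Write a marker and verify that it can be read back from the share
    if echo ok > "$probe" && [ "$(cat "$probe")" = "ok" ]; then
        rc=0
    fi

    rm -f "$probe"
    umount "$mount_point"
    return "$rc"
}
```

Usage, as root: `nfs_smoke_test storage.qa.suse.de:/storage /storage && echo "share is writable"`.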

Actions #27

Updated by okurz about 3 years ago

  • Copied to action #88546: Make use of the new "Storage Server", e.g. complete OSD backup added
Actions #28

Updated by okurz about 3 years ago

  • Due date deleted (2021-02-12)
Actions #29

Updated by okurz about 3 years ago

  • Copied to action #90629: administration of the new "Storage Server" added
Actions #30

Updated by okurz almost 3 years ago

  • Related to action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt added
Actions #31

Updated by nicksinger over 1 year ago
