action #69577

openQA Project - coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results

openQA Project - coordination #80546: [epic] Scale up: Enable to store more results

Handle installation of the new "Storage Server"

Added by nicksinger about 1 year ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2020-08-04
Due date:
% Done: 0%
Estimated time:
Description

We received our new Storage Server, which we want to connect to openQA. It was delivered to SUSE, addressed to the attention of Ralf Unger, and is located in the Nuremberg post office. It has 2x 10 Gbit/s RJ45 ports, which need a matching uplink. As this machine will communicate with openQA (a VM on the OBS cluster), it might make sense to place it close to this cluster (wherever that is located). If 10G does not work out at all, we could start with 1G for now.

Please make sure to obtain a copy of the invoice, which is normally taped to the outside of the parcel, and send it to mgriessmeier so he can release the payment for our server. (Already done.)

So the task would be to open an infra ticket and ask them nicely to put the machine into the server room. You might also need to discuss how to connect the 10Gbit/s port since infra has no 10G hardware (AFAIK).

At the very least the machine needs to be moved away from the post office :)


Related issues

Related to openQA Infrastructure - action #44078: Implement proper backups for o3 size:M (Workable, 2018-11-20)

Related to openQA Infrastructure - action #66709: Storage server for OSD and monitoring (Resolved, 2020-05-12)

Related to openQA Infrastructure - action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt (Resolved, 2021-06-09)

Copied to openQA Infrastructure - action #88546: Make use of the new "Storage Server", e.g. complete OSD backup (Blocked)

Copied to openQA Infrastructure - action #90629: administration of the new "Storage Server" (Resolved, 2020-08-04)

History

#1 Updated by nicksinger about 1 year ago

  • Description updated (diff)

#2 Updated by nicksinger about 1 year ago

  • Description updated (diff)

#3 Updated by nicksinger about 1 year ago

  • Description updated (diff)

#4 Updated by nicksinger about 1 year ago

  • Description updated (diff)

#5 Updated by nicksinger about 1 year ago

  • Assignee set to okurz
  • Priority changed from Normal to Urgent

okurz: I'm assigning this to you for now because I fully trust you to find the right person and maybe some volunteer. The urgency only reflects that the machine needs to move out of the post office. My office, the labs or a proper rack inside the server room would all be enough to reduce the prio to "low" ;)

#6 Updated by okurz about 1 year ago

  • Due date set to 2020-08-11
  • Status changed from Workable to Blocked

#7 Updated by okurz about 1 year ago

  • Due date changed from 2020-08-11 to 2020-08-26
  • Assignee changed from okurz to nicksinger
  • Priority changed from Urgent to Normal
  • Target version set to Ready

we have the ticket, it was assigned but has seen no update, not sure if there has been actual action. nsinger to check later.

#8 Updated by nicksinger about 1 year ago

  • Due date changed from 2020-08-26 to 2020-09-04

It got escalated to Ralf when I came back from vacation. Gerhard has now mentioned that the team will be back at full capacity next week and has supposedly scheduled the work for then. Therefore I am pushing back the due date.

#9 Updated by okurz about 1 year ago

  • Status changed from Blocked to In Progress

As you mentioned, the machine is now mounted in the rack and accessible. Please also look after the task in https://infra.nue.suse.com/SelfService/Display.html?id=175645, as you are on it already.

#10 Updated by nicksinger about 1 year ago

  • Status changed from In Progress to Workable

Keeping me assigned but setting the ticket to "Workable" as I'm currently not working on it. Whoever wants to give it a try can simply unassign me

#12 Updated by okurz 12 months ago

  • Related to action #44078: Implement proper backups for o3 size:M added

#14 Updated by cdywan 11 months ago

  • Due date changed from 2020-09-04 to 2020-11-30

See #76972 for the request for additional resources.

#15 Updated by okurz 11 months ago

  • Target version changed from Ready to future

#16 Updated by nicksinger 9 months ago

  • Status changed from Workable to In Progress

The OS is installed now and reachable over ssh with its IP 10.160.66.189

We still need to decide how to set up the storage, mainly the RAID level and technology (mdadm RAID, btrfs RAID, FS).

#17 Updated by cdywan 9 months ago

nicksinger wrote:

The OS is installed now and reachable over ssh with its IP 10.160.66.189

We still need to decide how to set up the storage, mainly the RAID level and technology (mdadm RAID, btrfs RAID, FS).

Maybe this makes sense to discuss in the Weekly? With perhaps the related point raised by mkittler in os-autoinst/openQA#3635 (poo#88121)

#18 Updated by okurz 9 months ago

  • Related to action #66709: Storage server for OSD and monitoring added

#19 Updated by okurz 9 months ago

  • Parent task set to #80546

#20 Updated by okurz 9 months ago

  • Target version changed from future to Ready

#21 Updated by nicksinger 9 months ago

  • Target version changed from Ready to future

Discussed in the weekly:

  • btrfs fs/raid
  • nfs4 export
  • hostname: storage
  • basic salt integration (e.g. ssh)

#22 Updated by nicksinger 9 months ago

  • Target version changed from future to Ready

#23 Updated by cdywan 8 months ago

  • Due date changed from 2020-11-30 to 2021-02-12

#24 Updated by cdywan 8 months ago

Since we were talking about the server in the daily I did a bit of smoke testing. I can ssh storage.qa.suse.de as my user, sudo works.

Just noticed one oddity, which I can work around with sudo. I am just wondering if it might point to some other configuration issue:

$ dmesg
dmesg: read kernel buffer failed: Operation not permitted

nicksinger is enabling the NFS share next

#25 Updated by nicksinger 8 months ago

cdywan wrote:

Since we were talking about the server in the daily I did a bit of smoke testing. I can ssh storage.qa.suse.de as my user, sudo works.

Just noticed one oddity, which I can work around with sudo. I am just wondering if it might point to some other configuration issue:

$ dmesg
dmesg: read kernel buffer failed: Operation not permitted

nicksinger is enabling the NFS share next

It seems that newer kernels have a setting named "kernel.dmesg_restrict". It is enabled on the storage server but disabled on all our workers. I assume this is mainly because the storage server has a newer installation. I wouldn't bother changing it as we have root access anyway :)
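
For reference, the setting can be inspected without root. The following is a minimal sketch, assuming a Linux host that exposes the value under /proc/sys/kernel/dmesg_restrict. The function name dmesg_restrict_status and its optional path argument are made up for illustration and are not part of any existing tool:

```shell
#!/bin/sh
# Report whether unprivileged users may read the kernel log.
# The optional argument (a file to read instead of the real sysctl file)
# is a hypothetical convenience so the sketch can be tried on sample input.
dmesg_restrict_status() {
    file="${1:-/proc/sys/kernel/dmesg_restrict}"
    if [ ! -r "$file" ]; then
        echo "unknown: cannot read $file"
        return 1
    fi
    value=$(cat "$file")
    case "$value" in
        0) echo "dmesg_restrict=0: unprivileged users can run dmesg" ;;
        1) echo "dmesg_restrict=1: dmesg needs root (or CAP_SYSLOG)" ;;
        *) echo "dmesg_restrict=$value: unexpected value" ;;
    esac
}
```

Calling dmesg_restrict_status with no argument checks the live system. On the storage server it should presumably report the restricted case, and on the workers the unrestricted one.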

#26 Updated by nicksinger 8 months ago

  • Status changed from In Progress to Resolved

An NFS share named /storage is now enabled. I did a quick test from OSD, where I was able to mount the share and write to it. You can try it yourself by running mount -t nfs4 storage.qa.suse.de:/storage /storage.
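
The quick mount-and-write test could be scripted roughly as follows. This is only a sketch: the helper name share_write_test and the marker file name are assumptions for illustration, and the mount itself has to be run as root on a client that can reach the server.

```shell
#!/bin/sh
# Write a small marker file into the given directory, read it back and
# remove it again. Returns 0 if the round trip worked, non-zero otherwise.
# share_write_test is a hypothetical helper, not an existing tool.
share_write_test() {
    dir="$1"
    marker="$dir/.write-test.$$"
    payload="write test from $(uname -n) pid $$"
    printf '%s\n' "$payload" > "$marker" || return 1
    readback=$(cat "$marker") || return 1
    rm -f "$marker"
    [ "$readback" = "$payload" ]
}

# Usage on a client, as root (mount command as given in the comment above):
#   mkdir -p /storage
#   mount -t nfs4 storage.qa.suse.de:/storage /storage
#   share_write_test /storage && echo "share is writable"
#   umount /storage
```

Keeping the write check separate from the mount means the same helper can be pointed at any already-mounted directory.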

#27 Updated by okurz 8 months ago

  • Copied to action #88546: Make use of the new "Storage Server", e.g. complete OSD backup added

#28 Updated by okurz 7 months ago

  • Due date deleted (2021-02-12)

#29 Updated by okurz 7 months ago

  • Copied to action #90629: administration of the new "Storage Server" added

#30 Updated by okurz 4 months ago

  • Related to action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt added
