action #69577
closed
openQA Project - coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results
openQA Project - coordination #80546: [epic] Scale up: Enable to store more results
Handle installation of the new "Storage Server"
Added by nicksinger about 4 years ago.
Updated over 3 years ago.
Description
We received our new Storage Server, which we want to connect to openQA. It was delivered to SUSE, attention Ralf Unger, and is currently at the Nuremberg post office. It has 2x 10 Gbit/s RJ45 ports, which need a matching uplink. As this machine will communicate with openQA (a VM on the OBS cluster), it might make sense to place it close to that cluster (wherever it is located). If 10G does not work out at all, we could start with 1G for now.
Please make sure to obtain a copy of the invoice, which is normally taped to the outside of the parcel, and send it to @mgriessmeier so he can release the payment for our server. (Already done.)
So the task would be to open an infra ticket and ask them nicely to put the machine into the server room. You might also need to discuss how to connect the 10Gbit/s port since infra has no 10G hardware (AFAIK).
At the very least the machine needs to be moved away from the post office :)
- Description updated (four times)
- Assignee set to okurz
- Priority changed from Normal to Urgent
@okurz: I'm assigning this to you for now because I fully trust that you will find the right person, maybe even a volunteer. The urgency only reflects that the machine needs to get out of the post office: my office, the labs, or a proper rack inside the server room would all be fine, and any of them would let us reduce the priority to "low" ;)
- Due date set to 2020-08-11
- Status changed from Workable to Blocked
- Due date changed from 2020-08-11 to 2020-08-26
- Assignee changed from okurz to nicksinger
- Priority changed from Urgent to Normal
- Target version set to Ready
We have the infra ticket. It was assigned but has seen no update, so it is unclear whether any actual action has been taken. nsinger to check later.
- Due date changed from 2020-08-26 to 2020-09-04
It was escalated to Ralf when I came back from vacation. Gerhard has now mentioned that the team will be back to full capacity next week and has supposedly scheduled the move for then. Therefore I am pushing back the due date.
- Status changed from Blocked to In Progress
- Status changed from In Progress to Workable
I'm keeping myself assigned but setting the ticket to "Workable", as I'm not currently working on it. Whoever wants to give it a try can simply unassign me.
- Related to action #44078: Implement proper backups for o3 size:M added
- Due date changed from 2020-09-04 to 2020-11-30
See #76972 for the request for additional resources.
- Target version changed from Ready to future
- Status changed from Workable to In Progress
The OS is now installed and reachable over ssh via its IP 10.160.66.189.
We still need to decide how to set up the storage, mainly the RAID level and the technology (mdadm RAID, btrfs RAID, filesystem).
Maybe this makes sense to discuss in the weekly, perhaps together with the related point raised by @mkittler in os-autoinst/openQA#3635 (poo#88121)?
- Related to action #66709: Storage server for OSD and monitoring added
- Parent task set to #80546
- Target version changed from future to Ready
- Target version changed from Ready to future
Discussed in the weekly:
- btrfs fs/raid
- nfs4 export
- hostname: storage
- basic salt integration (e.g. ssh)
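For illustration only, an NFSv4 export of /storage as decided above could look roughly like the following /etc/exports entry on the storage host. The client pattern (*.qa.suse.de) and the export options are assumptions, not the configuration that was actually deployed.

```text
# /etc/exports on storage.qa.suse.de (hypothetical sketch)
# Client pattern and options are assumptions; adjust to the real network.
/storage  *.qa.suse.de(rw,sync,no_subtree_check)
```

After editing the file, exportfs -ra re-reads the export table without restarting the NFS server.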
- Target version changed from future to Ready
- Due date changed from 2020-11-30 to 2021-02-12
Since we were talking about the server in the daily, I did a bit of smoke testing. I can ssh to storage.qa.suse.de as my user, and sudo works.
Just noticed one oddity: I can use sudo for dmesg, but running it as my user fails. Just wondering if that might point to some other configuration issue:
$ dmesg
dmesg: read kernel buffer failed: Operation not permitted
@nicksinger is enabling the NFS share next
Seems like the kernel setting "kernel.dmesg_restrict" is the cause: it is enabled on the storage server, while it is disabled on all our workers. The sysctl itself has existed since Linux 2.6.37, so I assume the different default comes from the newer installation there. I wouldn't bother changing it, as we have root access anyway :)
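A small sketch for checking this setting on any host follows. It reads the sysctl file under /proc and reports the state; the function name dmesg_restrict_status and its optional path argument are hypothetical, added only so the logic can be tried against a fake file.

```shell
#!/bin/sh
# Report whether unprivileged users may read the kernel ring buffer.
#   kernel.dmesg_restrict = 0: any user may run dmesg
#   kernel.dmesg_restrict = 1: reading requires CAP_SYSLOG (e.g. via sudo)
# Hypothetical helper; the optional first argument overrides the file path.
dmesg_restrict_status() {
    file="${1:-/proc/sys/kernel/dmesg_restrict}"
    if [ ! -r "$file" ]; then
        # Setting not readable or not present on this system
        echo "unknown"
        return 1
    fi
    case "$(cat "$file")" in
        0) echo "unrestricted" ;;
        1) echo "restricted" ;;
        *) echo "unknown"; return 1 ;;
    esac
}

# Check the running system; "|| true" keeps an unreadable file from aborting the script.
dmesg_restrict_status || true
```

On a real host, sysctl kernel.dmesg_restrict shows the same value, and sysctl -w kernel.dmesg_restrict=0 would disable the restriction until the next reboot.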
- Status changed from In Progress to Resolved
An NFS share named /storage is now enabled. I did a quick test from OSD, where I was able to mount it and write to the share. You can try it yourself by running mount -t nfs4 storage.qa.suse.de:/storage /storage.
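To make that mount persistent on a client such as OSD, an /etc/fstab entry along the following lines would be one possible approach. The mount options are an assumption and not something configured in this ticket; _netdev only tells the system to wait for the network before mounting.

```text
# /etc/fstab on an NFS client such as OSD (hypothetical sketch)
# <source>                   <mountpoint>  <type>  <options>         <dump>  <pass>
storage.qa.suse.de:/storage  /storage      nfs4    defaults,_netdev  0       0
```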
- Copied to action #88546: Make use of the new "Storage Server", e.g. complete OSD backup added
- Due date deleted (2021-02-12)
- Copied to action #90629: administration of the new "Storage Server" added
- Related to action #93683: osd-deployment failed due to storage.qa.suse.de not reachable by salt added