action #134132
QA - coordination #121720 (closed): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability
QA - coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters
Bare-metal control openQA worker in NUE2 size:M
Description
Motivation
Currently we have multiple openQA OSD bare-metal test machines in NUE2 FC Basement. They are still controlled by openQA workers running in NUE1. To depend less on NUE1 ahead of the further move, and to avoid unnecessary cross-site network traffic, the controlling openQA worker should also be located in NUE2 FC Basement. An additional benefit: if a network outage affects NUE2, the controlling openQA worker is affected by the same outage and therefore does not try to run openQA jobs on machines that are unreachable.
Acceptance criteria
- AC1: All OSD openQA bare-metal machines in NUE2 are controlled by an openQA worker in NUE2
- AC2: All OSD openQA bare-metal machines in NUE2 are still able to run openQA jobs as before
Suggestions
- "Controlled by" could mean that an openQA worker uses IPMI to execute tests on the bare-metal machine
- Identify relevant machines in https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls , e.g. every bare-metal test machine in the .qe.nue2.suse.org domain, like sonic https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L1207
- Move those entries to a suitable openQA worker within NUE2, e.g. imagetester, see https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls#L78
- Verify that the corresponding openQA jobs still work
- Ensure that the location settings in the worker classes match the controlling openQA worker (not the bare-metal target host)
- Inform affected users
- Finish before the currently controlling machines go offline, e.g. grenache-1 as part of #132140
- Consider using multiple controlling hosts so that the failure of one host does not take down several bare-metal machines at once
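The identification and move steps above can be sketched in Python. This is a minimal sketch, not the real pillar format: it assumes a simplified, hypothetical model of workerconf.sls in which each controlling worker has a location and numbered slots, a slot names its bare-metal target via `IPMI_HOSTNAME`, and the location is one comma-separated token in `WORKER_CLASS` taken from a hypothetical set (`nue1`, `nue2`, `prg2`). The target host name and the `64bit-ipmi` class in the sample data are made up.

```python
# Hypothetical sketch: select NUE2 bare-metal slots from a simplified model of
# workerconf.sls and move them to a NUE2 controller, fixing the location token.
LOCATIONS = {"nue1", "nue2", "prg2"}  # assumed location worker classes
NUE2_DOMAIN = ".qe.nue2.suse.org"     # domain named in the ticket

# Simplified stand-in for the parsed pillar; real keys and values differ.
WORKERCONF = {
    "grenache-1": {
        "location": "nue1",
        "slots": {
            10: {"BACKEND": "ipmi",
                 "IPMI_HOSTNAME": "example-sut-sp.qe.nue2.suse.org",  # made up
                 "WORKER_CLASS": "64bit-ipmi,nue1"},
            11: {"BACKEND": "qemu", "WORKER_CLASS": "qemu_ppc64le,nue1"},
        },
    },
    "imagetester": {"location": "nue2", "slots": {}},
}


def nue2_bare_metal_slots(conf, host_key="IPMI_HOSTNAME"):
    """Return (controller, slot) pairs whose target host is in NUE2."""
    return [
        (controller, slot)
        for controller, worker in conf.items()
        for slot, settings in worker["slots"].items()
        if settings.get(host_key, "").endswith(NUE2_DOMAIN)
    ]


def move_slot(conf, src, slot, dst, new_slot):
    """Move one slot to another controller and set its location worker class."""
    settings = conf[src]["slots"].pop(slot)
    classes = [c for c in settings["WORKER_CLASS"].split(",") if c not in LOCATIONS]
    # The location must describe the controlling worker, not the target host.
    settings["WORKER_CLASS"] = ",".join(classes + [conf[dst]["location"]])
    conf[dst]["slots"][new_slot] = settings


def location_mismatches(conf):
    """Return (controller, slot) pairs whose WORKER_CLASS location is wrong."""
    bad = []
    for controller, worker in conf.items():
        for slot, settings in worker["slots"].items():
            locs = set(settings["WORKER_CLASS"].split(",")) & LOCATIONS
            if locs != {worker["location"]}:
                bad.append((controller, slot))
    return bad


# The candidate list is built completely before any slot is moved.
for controller, slot in nue2_bare_metal_slots(WORKERCONF):
    move_slot(WORKERCONF, controller, slot, "imagetester", new_slot=slot)

print(WORKERCONF["imagetester"]["slots"][10]["WORKER_CLASS"])  # 64bit-ipmi,nue2
print(location_mismatches(WORKERCONF))                          # []
```

Rewriting the location token as part of the move keeps the worker class tied to the controlling host, which is what the location suggestion above asks for.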
Updated by livdywan about 1 year ago
- Subject changed from Bare-metal control openQA worker in NUE2 to Bare-metal control openQA worker in NUE2 size:M
- Description updated
- Status changed from New to Workable
Updated by livdywan about 1 year ago
- Related to action #134243: fozzie not responsive via ipmi added
Updated by okurz about 1 year ago
- Related to action #132140: Support move of PowerPC machines to PRG2 size:M added
Updated by okurz about 1 year ago
As stated in #132140, grenache will go offline as early as tomorrow, so this ticket should be expedited.
Updated by dheidler about 1 year ago
- Status changed from Workable to In Progress
- Assignee set to dheidler
Updated by dheidler about 1 year ago
- Status changed from In Progress to Feedback
Updated by dheidler about 1 year ago
Moved all worker entries except the ppc64le ones, which are hosted on other ppc partitions on grenache itself.
https://openqa.suse.de/tests/11948434
https://openqa.suse.de/tests/11935627
Updated by okurz about 1 year ago
Lots of incomplete jobs, e.g. see https://openqa.suse.de/tests/overview?result=incomplete&version=15-SP6&distri=sle&arch=ppc64le&build=16.1#
One example: https://openqa.suse.de/tests/11954895
Need variable HMC_HOSTNAME at /usr/lib/os-autoinst/backend/pvm_hmc.pm line 31.
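A check like the following could catch such a missing variable before jobs are scheduled. It is a hypothetical sketch: it assumes slots select the backend via `BACKEND=pvm_hmc` (matching backend/pvm_hmc.pm) and encodes only the one requirement visible in the error above; the real backend may need further variables. The slot data is made up.

```python
# Hypothetical pre-deployment check: the pvm_hmc backend aborts with
# "Need variable HMC_HOSTNAME" when a slot using it lacks that setting.
REQUIRED_BY_BACKEND = {
    "pvm_hmc": ["HMC_HOSTNAME"],  # taken from the error message in this ticket
}


def missing_settings(slots):
    """Return {slot: [missing keys]} for slots lacking backend-required settings."""
    problems = {}
    for slot, settings in slots.items():
        required = REQUIRED_BY_BACKEND.get(settings.get("BACKEND", ""), [])
        missing = [key for key in required if not settings.get(key)]
        if missing:
            problems[slot] = missing
    return problems


slots = {  # hypothetical slot definitions
    1: {"BACKEND": "pvm_hmc", "WORKER_CLASS": "hmc_ppc64le"},
    2: {"BACKEND": "pvm_hmc", "HMC_HOSTNAME": "hmc.example.invalid"},
    3: {"BACKEND": "qemu"},
}
print(missing_settings(slots))  # {1: ['HMC_HOSTNAME']}
```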
Updated by okurz about 1 year ago
- Related to action #134906: osd-deployment failed due to openqaworker1 showing "No response" in salt size:M added
Updated by dheidler about 1 year ago
Replying to okurz (#note-13) about the incomplete ppc64le jobs failing with "Need variable HMC_HOSTNAME":
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/594
Updated by dheidler about 1 year ago
The ppc64le worker slots are currently disabled because the machines are being moved.
The s390 tests don't seem to work at the moment: https://openqa.suse.de/tests/11953313
Updated by okurz about 1 year ago
- Status changed from Feedback to Workable
- Assignee deleted (dheidler)
Updated by okurz about 1 year ago
- Related to action #134912: Gradually phase out NUE1 based openQA workers size:M added
Updated by okurz about 1 year ago
- Related to action #135137: Bring back imagetester size:M added
Updated by okurz about 1 year ago
- Status changed from Workable to Resolved
Fixed with https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/597 . With that, all NUE2-based bare-metal machines and similar hosts are now controlled by openQA workers that are also located in NUE2 FC Basement, e.g. sapworker1, sapworker2 and sapworker3. They are spread over multiple controlling workers so that a problem with an individual worker does not stop all of them. I checked multiple instances from openqa.suse.de/admin/workers and all tests look fine. One example using Hyper-V: https://openqa.suse.de/tests/12023795
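The idea of spreading machines over several controllers can be illustrated with a round-robin assignment. This is an illustration only: the ticket does not say which assignment scheme was used, and the target names are hypothetical; only the sapworker host names come from the comment above.

```python
# Hypothetical sketch: spread bare-metal targets round-robin over several
# controlling workers so one failing controller only takes down a subset.
from collections import Counter
from itertools import cycle


def spread(targets, controllers):
    """Map each target to a controller in round-robin order."""
    return dict(zip(targets, cycle(controllers)))


targets = ["sut-a", "sut-b", "sut-c", "sut-d", "sut-e"]   # hypothetical names
controllers = ["sapworker1", "sapworker2", "sapworker3"]  # named in the ticket

assignment = spread(targets, controllers)
# Worst case: the number of targets lost if the busiest controller goes down.
print(max(Counter(assignment.values()).values()))  # 2
```

With five targets and three controllers, a single controller outage affects at most two machines instead of all five.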