action #175716
closedaction #168097: [qe-core] Make openqa.suse.de tests work with mirrors instead of dist.suse.de or download.suse.de
Re-enable IPMI workers size:S
Description
Motivation
IPMI workers were disabled in the parent ticket, see comment https://progress.opensuse.org/issues/168097#note-29.
@mkittler created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941 to avoid running those jobs for now. This commit can be reverted when tests have been somehow adapted. Maybe it makes sense to create a separate ticket for that.
It is probably time to revert the change, because we realized that some machines are working by accident: https://openqa.suse.de/tests/16493033#dependencies.
The example shows the 64bit-unarmed worker class, which was not masked like the other classes. We noticed that bare-metal tests work again thanks to this typo/mistake.
Many things have changed in the network since the workers were disabled, so it is hard to guess what made them work again.
Rollback Actions
Revert https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941
Acceptance Criteria
- AC1: openQA jobs are scheduled on IPMI bare-metal workers. Issues observed after the revert should get their own follow-up tickets.
Updated by okurz about 2 months ago
- Category set to Infrastructure
- Target version set to Tools - Next
Updated by gpathak about 1 month ago
- Subject changed from Re-enable IPMI workers to Re-enable IPMI workers size:S
- Description updated (diff)
- Status changed from New to Workable
- Target version changed from Tools - Next to Ready
Updated by nicksinger about 1 month ago
- Status changed from Workable to In Progress
- Assignee set to nicksinger
Updated by openqa_review about 1 month ago
- Due date set to 2025-02-19
Setting due date based on mean cycle time of SUSE QE Tools
Updated by nicksinger about 1 month ago
I used https://openqa.suse.de/tests/16584465 as a generic 64bit-ipmi base with up-to-date assets, as the most recent jobs run by nbg workers are already quite old and their assets are missing. I started with unreal-3 and a job is currently running at: https://openqa.suse.de/tests/16664392
From what I gathered in Marius' MR, this is a list of hosts I need to verify and enable again:
unreal3-sp.qe.nue2.suse.org
unreal2-sp.qe.nue2.suse.org
unarmed-1.qe.nue2.suse.org
worf-1.qe.nue2.suse.org
merckx-1.qe.nue2.suse.org
gonzo-sp.qe.nue2.suse.org
tails-sp.qe.nue2.suse.org
monkey3-sp.qe.nue2.suse.org
squiddlydiddly-sp.qe.nue2.suse.org
kernel-rt-sp.qe.nue2.suse.org
tyrion-sp.qe.nue2.suse.org
amd-zen3-gpu-sut1-sp.qe.nue2.suse.org
coppi-sp.qe.nue2.suse.org
scooter-sp.qe.nue2.suse.org
kermit-sp.qe.nue2.suse.org
holmes-sp.qe.nue2.suse.org
openqaipmi5-sp.qe.nue2.suse.org
sonic-sp.qe.nue2.suse.org
all of them controlled by sapworker1.
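A quick way to see which of these BMCs respond at all could look like the sketch below. It is only an illustration, not what was run in this ticket: the helper names bmc_power_status and check_bmcs are hypothetical, the lanplus interface is an assumption about these machines, and credentials are expected in the environment (IPMI_USER, plus IPMI_PASSWORD, which ipmitool reads when -E is given).

```shell
# Hypothetical helper: ask one BMC for its chassis power state.
# Set IPMITOOL=echo to only print the arguments instead of contacting the BMC.
bmc_power_status() {
    # -I lanplus: IPMI v2.0 over LAN (assumed to be what these hosts speak)
    # -E: take the password from the IPMI_PASSWORD environment variable
    "${IPMITOOL:-ipmitool}" -I lanplus -H "$1" \
        -U "${IPMI_USER:?set IPMI_USER first}" -E chassis power status
}

# Hypothetical helper: run the check for every host name given as argument.
check_bmcs() {
    for bmc_host in "$@"; do
        printf '%s: ' "$bmc_host"
        bmc_power_status "$bmc_host" || echo "unreachable or error"
    done
}
```

Example call: check_bmcs unreal3-sp.qe.nue2.suse.org unreal2-sp.qe.nue2.suse.org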
Updated by nicksinger about 1 month ago
I found a lot of masked instances on sapworker1. Masking should not be necessary, as all of them have their production classes disabled, so for verification I enabled them with:
sapworker1:~ # systemctl unmask openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
sapworker1:~ # systemctl start openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
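To confirm afterwards that none of these instances is still masked, something like the following sketch could be used. The function names are hypothetical; the instance numbers are copied from the commands above. It relies on systemctl is-enabled printing "masked" for masked units and systemctl is-active printing the current state.

```shell
# Instance numbers taken from the unmask/start commands above.
WORKER_INSTANCES="26 27 30 31 32 38 40 41"

# Hypothetical helper: build the systemd unit name for one worker instance.
worker_unit_name() {
    printf 'openqa-worker-auto-restart@%s.service\n' "$1"
}

# Hypothetical helper: print "<unit>: <enablement> / <activity>" per instance.
# Set SYSTEMCTL=echo for a dry run that does not query systemd.
check_worker_instances() {
    for worker_instance in $WORKER_INSTANCES; do
        worker_unit=$(worker_unit_name "$worker_instance")
        printf '%s: %s / %s\n' "$worker_unit" \
            "$("${SYSTEMCTL:-systemctl}" is-enabled "$worker_unit" 2>/dev/null)" \
            "$("${SYSTEMCTL:-systemctl}" is-active "$worker_unit" 2>/dev/null)"
    done
}
```

Running check_worker_instances on sapworker1 would then list one line per instance.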
Updated by nicksinger about 1 month ago
nicksinger wrote in #note-8:
I found a lot of masked instances on sapworker1. Masking should not be necessary, as all of them have their production classes disabled, so for verification I enabled them with:
sapworker1:~ # systemctl unmask openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
sapworker1:~ # systemctl start openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
Kind of true: as Petr wrote in the initial description, unarmed instantly picked up a (production) job again: https://openqa.suse.de/tests/16606264. I will keep an eye on its status (and disable it again if necessary).
Updated by okurz about 1 month ago
careful now. Some machines have been moved to PRG2 and need updated workerconf, see #175947
Updated by nicksinger about 1 month ago
okurz wrote in #note-10:
careful now. Some machines have been moved to PRG2 and need updated workerconf, see #175947
I only caused havoc with unarmed (: All others are properly disabled. Jobs to get rid of:
- https://openqa.suse.de/tests/16606281
- https://openqa.suse.de/tests/16606264
I merged https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/968 now to have the proper config for them.
Updated by nicksinger about 1 month ago · Edited
Moved IPMI hosts:
unarmed-1.qe.nue2.suse.org
worf-1.qe.nue2.suse.org
tyrion-sp.qe.nue2.suse.org
kernel-rt-sp.qe.nue2.suse.org
amd-zen3-gpu-sut1-sp.qe.nue2.suse.org
squiddlydiddly-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16614406
kermit-sp.qe.nue2.suse.org
gonzo-sp.qe.nue2.suse.org
scooter-sp.qe.nue2.suse.org
-> see #176544 and related tickets
unreal3-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16664392 -> #168097
unreal2-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16664870 -> #168097
openqaipmi5-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726967 -> #168097
tails-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16683322 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
coppi-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16725050 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
sonic-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16683321 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
holmes-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726021 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
merckx-1.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726520 -> #176931
monkey3-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726656 -> #176931
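The verification jobs linked above can also be checked from the command line. The following is a minimal sketch, assuming openqa-cli is installed; the helper names are hypothetical. The jobs/<id> API route returns the job as JSON, which should include its state and result.

```shell
# Hypothetical helper: extract the numeric job ID from an openQA test URL.
job_id_from_url() {
    job_id=${1##*/tests/}        # drop everything up to and including "/tests/"
    job_id=${job_id%%[!0-9]*}    # drop a trailing fragment such as "#dependencies"
    printf '%s\n' "$job_id"
}

# Hypothetical helper: fetch the job details as JSON from openqa.suse.de.
# Set OPENQA_CLI=echo to only print the arguments instead of calling the API.
job_details() {
    "${OPENQA_CLI:-openqa-cli}" api --host https://openqa.suse.de \
        "jobs/$(job_id_from_url "$1")"
}
```

Example call: job_details https://openqa.suse.de/tests/16664392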
Updated by nicksinger about 1 month ago · Edited
I think the tests I used cannot even work on these hosts. I asked for suitable validation tests in https://suse.slack.com/archives/C02CANHLANP/p1739203024354919
I also noticed that the worker class 64bit-mlx_con5 was only ever provided by sonic and tails, and two corresponding jobs have been stuck for 4 days already in the "Test Development: Kernel" group. So I used the opportunity and temporarily changed the worker class on sapworker1 for these two hosts/slots and got:
It also helps to look up job references from before the slots got moved around. Unfortunately, no traces were found this way for merckx either.
This leaves monkey3, coppi, holmes and openqaipmi5 to validate.
Updated by nicksinger 30 days ago
- Copied to action #176931: Machine "monkey3" and "merckx" fail to complete openQA job, ipxe_install already fails - no PXE boot possible? size:S added
Updated by nicksinger 29 days ago · Edited
- Status changed from In Progress to Resolved
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1370 merged. #177078 and #176931 cover all machines which are still disabled.