action #175716

closed

action #168097: [qe-core] Make openqa.suse.de tests work with mirrors instead of dist.suse.de or download.suse.de

Re-enable IPMI workers size:S

Added by pcervinka about 2 months ago. Updated 28 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Infrastructure
Start date: 2025-01-17
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

IPMI workers were disabled as part of the parent ticket, see comment https://progress.opensuse.org/issues/168097#note-29.

@mkittler created https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941 to avoid running those jobs for now. This commit can be reverted when tests have been somehow adapted. Maybe it makes sense to create a separate ticket for that.

It is probably time to revert the change, because we realized that some machines are working by accident, see https://openqa.suse.de/tests/16493033#dependencies.
The example shows the 64bit-unarmed worker class, which was not masked like the other classes. Thanks to this typo/mistake we noticed that baremetal tests work again.
There have been many changes in the network since the workers were disabled, so it is hard to guess what made it work.

Rollback Actions

Revert https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/941

Acceptance Criteria

  • AC1: openQA jobs are scheduled on IPMI baremetal workers. Issues observed after the revert should get their own follow-up tickets.

Related issues 1 (0 open, 1 closed)

Copied to openQA Tests (public) - action #176931: Machine "monkey3" and "merckx" fail to complete openQA job, ipxe_install already fails - no PXE boot possible? size:S (Status: Resolved, Assignee: okurz)

Actions #1

Updated by pcervinka about 2 months ago

  • Description updated (diff)
Actions #2

Updated by okurz about 2 months ago

  • Category set to Infrastructure
  • Target version set to Tools - Next
Actions #4

Updated by gpathak about 1 month ago

  • Subject changed from Re-enable IPMI workers to Re-enable IPMI workers size:S
  • Description updated (diff)
  • Status changed from New to Workable
  • Target version changed from Tools - Next to Ready
Actions #5

Updated by nicksinger about 1 month ago

  • Status changed from Workable to In Progress
  • Assignee set to nicksinger
Actions #6

Updated by openqa_review about 1 month ago

  • Due date set to 2025-02-19

Setting due date based on mean cycle time of SUSE QE Tools

Actions #7

Updated by nicksinger about 1 month ago

I used https://openqa.suse.de/tests/16584465 as a generic 64bit-ipmi base with up-to-date assets, because the most recent jobs run by nbg workers are already quite old and their assets are missing. I started with unreal-3 and a job is currently running at: https://openqa.suse.de/tests/16664392
From what I gathered in Marius' MR, this is the list of hosts I need to verify and enable again:

unreal3-sp.qe.nue2.suse.org
unreal2-sp.qe.nue2.suse.org
unarmed-1.qe.nue2.suse.org
worf-1.qe.nue2.suse.org
merckx-1.qe.nue2.suse.org
gonzo-sp.qe.nue2.suse.org
tails-sp.qe.nue2.suse.org
monkey3-sp.qe.nue2.suse.org
squiddlydiddly-sp.qe.nue2.suse.org
kernel-rt-sp.qe.nue2.suse.org
tyrion-sp.qe.nue2.suse.org
amd-zen3-gpu-sut1-sp.qe.nue2.suse.org
coppi-sp.qe.nue2.suse.org
scooter-sp.qe.nue2.suse.org
kermit-sp.qe.nue2.suse.org
holmes-sp.qe.nue2.suse.org
openqaipmi5-sp.qe.nue2.suse.org
sonic-sp.qe.nue2.suse.org

All of them are controlled by sapworker1.

Actions #8

Updated by nicksinger about 1 month ago

I found a lot of masked instances on sapworker1. Masking should not be necessary because all of them have their production classes disabled, so for verification I enabled them with:

sapworker1:~ # systemctl unmask openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
sapworker1:~ # systemctl start openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
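
Before or after running commands like the ones above, it can help to list exactly which units are affected and what state they are in. The following is only a minimal sketch: the helper names worker_units and report_units are hypothetical, the slot numbers are copied from the commands above, and report_units assumes it runs on a host where systemctl is available.

```shell
# Slot numbers taken from the unmask/start commands above.
SLOTS="26 27 30 31 32 38 40 41"

# Hypothetical helper: print one systemd unit name per worker slot.
worker_units() {
    for slot in $SLOTS; do
        printf 'openqa-worker-auto-restart@%s.service\n' "$slot"
    done
}

# Hypothetical helper: print "<unit>: <enablement state> / <active state>".
# "systemctl is-enabled" reports "masked" for masked units and
# "systemctl is-active" reports states such as "active" or "inactive".
report_units() {
    for unit in $(worker_units); do
        printf '%s: %s / %s\n' "$unit" \
            "$(systemctl is-enabled "$unit" 2>/dev/null)" \
            "$(systemctl is-active "$unit" 2>/dev/null)"
    done
}

# Dry run: only list the unit names, without touching systemd.
worker_units
```

Calling report_units on the worker host would then show at a glance whether any of the listed slots is still masked or not running.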
Actions #9

Updated by nicksinger about 1 month ago

nicksinger wrote in #note-8:

I found a lot of masked instances on sapworker1. Masking should not be necessary because all of them have their production classes disabled, so for verification I enabled them with:

sapworker1:~ # systemctl unmask openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service
sapworker1:~ # systemctl start openqa-worker-auto-restart@{26,27,30,31,32,38,40,41}.service

Kind of true: as Petr wrote in the initial description, unarmed instantly picked up a (production) job again: https://openqa.suse.de/tests/16606264. I will keep an eye on its status (and disable it again if necessary).

Actions #10

Updated by okurz about 1 month ago

Careful now. Some machines have been moved to PRG2 and need an updated workerconf, see #175947

Actions #11

Updated by nicksinger about 1 month ago

okurz wrote in #note-10:

Careful now. Some machines have been moved to PRG2 and need an updated workerconf, see #175947

I only caused havoc with unarmed (: All others are properly disabled. Jobs to get rid of:

I merged https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/968 now to have the proper config for them.

Actions #12

Updated by nicksinger about 1 month ago · Edited

Moved IPMI hosts:

unarmed-1.qe.nue2.suse.org
worf-1.qe.nue2.suse.org
tyrion-sp.qe.nue2.suse.org
kernel-rt-sp.qe.nue2.suse.org
amd-zen3-gpu-sut1-sp.qe.nue2.suse.org
squiddlydiddly-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16614406
kermit-sp.qe.nue2.suse.org
gonzo-sp.qe.nue2.suse.org
scooter-sp.qe.nue2.suse.org

-> see #176544 and related tickets

unreal3-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16664392 -> #168097
unreal2-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16664870 -> #168097
openqaipmi5-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726967 -> #168097
tails-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16683322 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
coppi-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16725050 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
sonic-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16683321 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
holmes-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726021 -> https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/972
merckx-1.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726520 -> #176931
monkey3-sp.qe.nue2.suse.org -> https://openqa.suse.de/tests/16726656 -> #176931

Actions #13

Updated by nicksinger about 1 month ago · Edited

I think the tests I used cannot even work on these hosts. I asked for suitable validation tests in https://suse.slack.com/archives/C02CANHLANP/p1739203024354919
Also, 64bit-mlx_con5 was only ever provided by sonic and tails, and two corresponding jobs have already been stuck for 4 days in the "Test Development: Kernel" group. So I used the opportunity and temporarily changed the worker class on sapworker1 for these two hosts/slots and got:

It also helps to look up job references from before the slots got moved around. Unfortunately, no traces were found this way for merckx either.
This leaves monkey3, coppi, holmes and openqaipmi5 to validate.

Actions #14

Updated by nicksinger 30 days ago

  • Copied to action #176931: Machine "monkey3" and "merckx" fail to complete openQA job, ipxe_install already fails - no PXE boot possible? size:S added
Actions #15

Updated by nicksinger 29 days ago · Edited

  • Status changed from In Progress to Resolved

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1370 merged. #177078 and #176931 cover all machines which are still disabled.

Actions #16

Updated by okurz 28 days ago

  • Due date deleted (2025-02-19)