action #134132

closed

QA - coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

QA - coordination #129280: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters

Bare-metal control openQA worker in NUE2 size:M

Added by okurz about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Currently we have multiple openQA OSD bare-metal test machines in NUE2 FC Basement that are still controlled by openQA workers running in NUE1. To rely less on NUE1 in preparation for the further move, and to reduce unnecessary cross-site network transfers, the controlling openQA worker should also be in NUE2 FC Basement. This has the additional benefit that if a network outage affects NUE2, the corresponding openQA worker is affected by the same outage and would not try to execute openQA jobs on unavailable machines.

Acceptance criteria

  • AC1: All OSD openQA bare-metal machines in NUE2 are controlled by an openQA worker in NUE2
  • AC2: All OSD openQA bare-metal machines in NUE2 are still able to run openQA jobs as before
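AC1 could be checked mechanically, given a mapping of each bare-metal machine to its site and to the site of its controlling worker. The following is only a minimal sketch: the `BareMetalSlot` type, the `ac1_violations` helper, and all host names are hypothetical and not taken from the actual salt pillars.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BareMetalSlot:
    machine: str       # bare-metal system under test (hypothetical name)
    machine_site: str  # site hosting the machine, e.g. "NUE2"
    worker_host: str   # openQA worker host controlling it (hypothetical name)
    worker_site: str   # site hosting that worker host


def ac1_violations(slots, site="NUE2"):
    """Return machines in `site` whose controlling worker is in another site."""
    return [
        s.machine
        for s in slots
        if s.machine_site == site and s.worker_site != site
    ]


# Illustrative inventory only; real data would come from the worker configuration.
inventory = [
    BareMetalSlot("example-bm-1", "NUE2", "example-worker-a", "NUE2"),
    BareMetalSlot("example-bm-2", "NUE2", "example-worker-b", "NUE1"),
    BareMetalSlot("example-bm-3", "NUE1", "example-worker-b", "NUE1"),
]

print(ac1_violations(inventory))
```

An empty result would mean AC1 holds for the given inventory. AC2 still needs to be verified by actually running openQA jobs.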

Suggestions


Related issues 5 (0 open, 5 closed)

Related to openQA Infrastructure - action #134243: fozzie not responsive via ipmi (Rejected, okurz, 2023-08-15)
Related to QA - action #132140: Support move of PowerPC machines to PRG2 size:M (Resolved, okurz, 2023-06-29)
Related to openQA Infrastructure - action #134906: osd-deployment failed due to openqaworker1 showing "No response" in salt size:M (Resolved, nicksinger, 2023-08-31, 2023-09-23)
Related to openQA Infrastructure - action #134912: Gradually phase out NUE1 based openQA workers size:M (Resolved, okurz)
Related to openQA Infrastructure - action #135137: Bring back imagetester size:M (Resolved, okurz, 2023-09-04)

Actions #2

Updated by okurz about 1 year ago

  • Description updated
Actions #3

Updated by livdywan about 1 year ago

  • Subject changed from Bare-metal control openQA worker in NUE2 to Bare-metal control openQA worker in NUE2 size:M
  • Description updated
  • Status changed from New to Workable
Actions #4

Updated by livdywan about 1 year ago

Actions #5

Updated by okurz about 1 year ago

  • Description updated
Actions #7

Updated by okurz about 1 year ago

  • Priority changed from High to Urgent
Actions #8

Updated by okurz about 1 year ago

  • Related to action #132140: Support move of PowerPC machines to PRG2 size:M added
Actions #9

Updated by okurz about 1 year ago

As stated in #132140 grenache will go offline as soon as tomorrow so this ticket should be expedited.

Actions #10

Updated by dheidler about 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to dheidler
Actions #11

Updated by dheidler about 1 year ago

  • Status changed from In Progress to Feedback
Actions #12

Updated by dheidler about 1 year ago

Moved all worker entries except the ppc64le ones, which are hosted on other ppc partitions on grenache itself.

https://openqa.suse.de/tests/11948434
https://openqa.suse.de/tests/11935627

Actions #13

Updated by okurz about 1 year ago

Lots of incomplete jobs, e.g. see https://openqa.suse.de/tests/overview?result=incomplete&version=15-SP6&distri=sle&arch=ppc64le&build=16.1#

One example: https://openqa.suse.de/tests/11954895

  Need variable HMC_HOSTNAME at /usr/lib/os-autoinst/backend/pvm_hmc.pm line 31.
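
The error above means that the job ran on a worker slot without the `HMC_HOSTNAME` setting that the `pvm_hmc` backend needs. A pre-flight check for moved worker entries could look like the following sketch. The `REQUIRED_SETTINGS` table and the `missing_settings` helper are hypothetical; only the `pvm_hmc`/`HMC_HOSTNAME` pairing is taken from the error message, and any further required variables would have to be added from the backend documentation.

```python
# Settings each backend needs on the worker slot.
# Only the pvm_hmc -> HMC_HOSTNAME entry is backed by the error message above.
REQUIRED_SETTINGS = {
    "pvm_hmc": ["HMC_HOSTNAME"],
}


def missing_settings(backend, slot_settings):
    """Return required setting names that are absent or empty for `backend`."""
    return [
        name
        for name in REQUIRED_SETTINGS.get(backend, [])
        if not slot_settings.get(name)
    ]


# Hypothetical worker slot that lost its HMC settings during the move.
moved_slot = {"WORKER_CLASS": "example-class"}
print(missing_settings("pvm_hmc", moved_slot))
```

Running such a check over all moved slots before enabling them would surface this kind of misconfiguration without producing incomplete jobs.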
Actions #14

Updated by okurz about 1 year ago

  • Related to action #134906: osd-deployment failed due to openqaworker1 showing "No response" in salt size:M added
Actions #15

Updated by dheidler about 1 year ago

Actions #16

Updated by dheidler about 1 year ago

The ppc64le worker slots are disabled at the moment, as the machines are being moved.

The s390 tests don't seem to work at the moment: https://openqa.suse.de/tests/11953313

Actions #17

Updated by okurz about 1 year ago

  • Status changed from Feedback to Workable
  • Assignee deleted (dheidler)
Actions #18

Updated by okurz about 1 year ago

  • Assignee set to okurz
Actions #19

Updated by okurz about 1 year ago

  • Related to action #134912: Gradually phase out NUE1 based openQA workers size:M added
Actions #20

Updated by okurz about 1 year ago

Actions #21

Updated by okurz about 1 year ago

  • Status changed from Workable to Resolved

Fixed with https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/597 . With that I have moved control of all NUE2-based bare-metal machines and similar systems to openQA workers that are also local to NUE2 FC Basement, e.g. sapworker1+2+3. The worker slots are spread out over multiple hosts so that problems with an individual machine do not stop all of them. I have checked multiple instances from openqa.suse.de/admin/workers and all tests look fine. One example using HyperV: https://openqa.suse.de/tests/12023795
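
The idea of spreading the control slots over several worker hosts can be illustrated with a simple round-robin assignment. This is only a sketch of the principle, not the content of the merge request: the `spread` helper and the machine names are hypothetical, and only the sapworker host names come from the comment above.

```python
from itertools import cycle


def spread(machines, workers):
    """Assign machines to worker hosts round-robin so no single host controls all."""
    assignment = {worker: [] for worker in workers}
    for machine, worker in zip(machines, cycle(workers)):
        assignment[worker].append(machine)
    return assignment


# Machine names are placeholders for the NUE2 bare-metal systems.
machines = ["example-bm-1", "example-bm-2", "example-bm-3", "example-bm-4"]
workers = ["sapworker1", "sapworker2", "sapworker3"]

for worker, controlled in spread(machines, workers).items():
    print(worker, controlled)
```

With such a distribution, an outage of one worker host only affects the machines assigned to it.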

Actions #22

Updated by okurz about 1 year ago

  • Parent task changed from #130955 to #129280