action #123933

closed

[worker][ipmi][bmc] Some worker can not be reached via BMC

Added by waynechen55 almost 2 years ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-02-06
Due date:
% Done: 0%
Estimated time:

Description

Observation

Some workers can not be reached via BMC:
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.kermit.qa.suse.de -U xxxxx -P xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.gonzo.qa.suse.de -U xxxxx -P xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H sp.scooter.qa.suse.de -U xxxxx -P xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session
waynechen:~ # ipmitool -I lanplus -C 3 -H amd-zen3-gpu-sut1-sp.qa.suse.de -U xxxxx -P xxxxx chassis power status
Error: Unable to establish IPMI v2 / RMCP+ session

So only 3 workers are usable for virtualization jobs.

Steps to reproduce

  • ipmitool -I lanplus xxxxx chassis power status
  • An unreachable BMC returns Error: Unable to establish IPMI v2 / RMCP+ session
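
The check above can be repeated over several BMC hosts in one go. The following is a minimal sketch, not the tooling used by the reporter: the function names `check_bmc` and `check_all` and the environment variables `IPMI_USER`, `IPMI_PASSWORD` and `IPMITOOL` are assumptions made for illustration. Only the `ipmitool` options already shown in the observation are used.

```shell
#!/bin/sh
# Sketch: report which BMCs accept an IPMI v2 / RMCP+ session.
# IPMI_USER, IPMI_PASSWORD and IPMITOOL are hypothetical environment
# variables; IPMITOOL only exists so the binary can be swapped out.

# check_bmc HOST: prints "HOST reachable" or "HOST UNREACHABLE" and
# returns non-zero when no session could be established.
check_bmc() {
    host=$1
    if "${IPMITOOL:-ipmitool}" -I lanplus -C 3 -H "$host" \
        -U "$IPMI_USER" -P "$IPMI_PASSWORD" \
        chassis power status >/dev/null 2>&1; then
        echo "$host reachable"
    else
        echo "$host UNREACHABLE"
        return 1
    fi
}

# check_all HOST...: checks every host and returns the number of
# unreachable BMCs as its exit status.
check_all() {
    failed=0
    for host in "$@"; do
        check_bmc "$host" || failed=$((failed + 1))
    done
    return "$failed"
}
```

With the credentials exported, it could be called as `check_all sp.kermit.qa.suse.de sp.gonzo.qa.suse.de sp.scooter.qa.suse.de amd-zen3-gpu-sut1-sp.qa.suse.de`. A non-zero exit status then means at least one BMC is still unreachable.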

Impact

  • Test runs with upcoming builds will not finish in a timely manner
  • Failure rate goes up significantly

Problem

BMC down or network glitch?

Suggestion

  • Check BMC or network connection
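
To tell a network problem from a BMC problem, the check can be split into stages. This is a rough sketch under assumptions: it presumes a Linux host with `getent` and iputils `ping` (where `-W` is a timeout in seconds), and it presumes the BMC answers ICMP, which not every BMC does. The function names and the `IPMI_USER` / `IPMI_PASSWORD` variables are hypothetical.

```shell
#!/bin/sh
# Sketch: narrow down why a BMC is unreachable.
# Assumes Linux (getent, iputils ping) and a BMC that answers ICMP.

# Each helper returns 0 on success for the given host name.
resolves() { getent hosts "$1" >/dev/null 2>&1; }
pings()    { ping -c 1 -W 2 "$1" >/dev/null 2>&1; }
ipmi_ok()  {
    ipmitool -I lanplus -C 3 -H "$1" -U "$IPMI_USER" -P "$IPMI_PASSWORD" \
        chassis power status >/dev/null 2>&1
}

# triage_bmc HOST: prints the first stage that fails, or "ok".
triage_bmc() {
    if ! resolves "$1"; then
        echo "$1: no DNS record"
    elif ! pings "$1"; then
        echo "$1: no ICMP reply (network path or BMC down)"
    elif ! ipmi_ok "$1"; then
        echo "$1: pingable, but no IPMI session (BMC service or credentials)"
    else
        echo "$1: ok"
    fi
}
```

For example, `triage_bmc sp.kermit.qa.suse.de` prints a single line naming the first failing stage. A missing ICMP reply is only a hint, because some BMCs are configured not to answer ping.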

Workaround

There is no workaround for this issue. The BMC has to be up and reachable.


Related issues 1 (0 open, 1 closed)

Related to QA (public) - action #119551: Move QA labs NUE-2.2.14-B to Frankencampus labs - bare-metal openQA workers size:M (Resolved, nicksinger, 2023-03-10)

Actions #1

Updated by okurz almost 2 years ago

  • Related to action #119551: Move QA labs NUE-2.2.14-B to Frankencampus labs - bare-metal openQA workers size:M added
Actions #2

Updated by okurz almost 2 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Target version set to Ready

I expect the workers were moved to the new lab location and still need to be connected to the network, see #119551

Actions #3

Updated by waynechen55 almost 2 years ago

okurz wrote:

I expect the workers were moved to the new lab location and still need to be connected to the network, see #119551

Can this work be done soon? In light of the approaching PublicBeta, I am considering disabling the affected workers. Some of them are active in workerconf.sls.

So do you think I need to do this right now, or should I wait for your further feedback?

Additionally, I expect all machines will keep their current domain name.

Actions #5

Updated by okurz almost 2 years ago

waynechen55 wrote:

okurz wrote:

I expect the workers were moved to the new lab location and still need to be connected to the network, see #119551

Can this work be done soon? In light of the approaching PublicBeta, I am considering disabling the affected workers. Some of them are active in workerconf.sls.

So do you think I need to do this right now, or should I wait for your further feedback?

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/487 has been merged in the meantime. After that we re-enabled fozzie, the one machine that is available again, with https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/488.

We expect the network within QE Basement to be available within the next few days or, in the worst case, within some weeks.

Additionally, I expect all machines will keep their current domain name.

Machines within FC Basement will likely receive a new domain name to make it clear where the machines are located, following a consolidated plan from SUSE-IT Eng-Infra that applies to the complete network at the FC (Frankencampus) location.

Actions #6

Updated by okurz almost 2 years ago

  • Subject changed from [worker][ipmi][bmc] Some woker can not be reached via BMC to [worker][ipmi][bmc] Some worker can not be reached via BMC
Actions #7

Updated by okurz almost 2 years ago

@waynechen55 all four machines sp.kermit.qa.suse.de, sp.gonzo.qa.suse.de, sp.scooter.qa.suse.de, amd-zen3-gpu-sut1-sp.qa.suse.de are controllable over IPMI again.
The machines can also boot over PXE but get the PXE boot menu from an Eng-Infra maintained server, not qanet.
I have https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/493 prepared, but unadapted tests would fail right now because of the differing PXE environment. In #119551 we are working with Eng-Infra on the PXE setup to get access to a customizable environment, but this will likely take several more weeks. In the meantime, an alternative that can be solved entirely within os-autoinst-distri-opensuse, without any changes to infrastructure or backend, is to use the Eng-Infra supplied PXE boot menu, boot an older version of the SLES installer (either an older build or an older service pack) and conduct a remote installation of the current build from there. If that is not possible because of a kernel mismatch between the "linux" file and the remote repo content, I suggest booting an older version of SLES and updating to the current build. You can consider doing that.

Actions #8

Updated by okurz almost 2 years ago

  • Tags set to infra, ipmi, bmc, FC Basement, lab, PXE
Actions #9

Updated by waynechen55 over 1 year ago

  • Status changed from Blocked to Resolved

BMC connection recovered.
