action #128654

closed

[sporadic] Fail to create an ipmi session to worker grenache-1:16 (ix64ph1075) in its vlan

Added by Julie_CAO over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-05-04
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

All tests running on grenache-1:16 failed today. The worker still worked 5 days ago.

https://openqa.suse.de/admin/workers/1247
The failure looks like this:

[2023-05-04T07:49:10.563049+02:00] [info] [pid:113776] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  ipmitool -I lanplus -H ix64hm1200.qa.suse.de -U admin -P [masked] mc guid: IPMI response is NULL. at /usr/lib/os-autoinst/backend/ipmi.pm line 45.
[2023-05-04T07:49:23.682702+02:00] [warn] [pid:113776] !!! backend::baseclass::run_capture_loop: capture loop failed ipmitool -I lanplus -H ix64hm1200.qa.suse.de -U admin -P [masked] chassis power off: Error: Unable to establish IPMI v2 / RMCP+ session at /usr/lib/os-autoinst/backend/ipmi.pm line 45.

I tried "ipmitool -I lanplus -H 10.162.28.200 -U admin -P [masked] ..." from other VLANs, for example from 10.168.192.87 and from my laptop over VPN, and it did not report any failures. However, when I run it from 10.162.2.99, it fails with these errors:

fozzie-1:~ # ipmitool -I lanplus -H 10.162.28.200 -U admin -P [masked] -vvv
ipmitool version 1.8.18
...
Get Auth Capabilities error
Error issuing Get Channel Authentication Capabilities request
Error: Unable to establish IPMI v2 / RMCP+ session

Is anything wrong with the IPMI services or configuration in the 10.162.xx VLAN or on this machine?
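
To make the comparison across source hosts explicit, the ipmitool output collected on each host could be checked for the error strings quoted above. The following is only a minimal sketch, assuming the output has already been captured per host. `KNOWN_ERRORS`, `session_failed` and `summarize` are hypothetical names for illustration and are not part of os-autoinst or ipmitool.

```python
# Error substrings copied from the log excerpts in this ticket.
KNOWN_ERRORS = (
    "IPMI response is NULL",
    "Get Auth Capabilities error",
    "Unable to establish IPMI v2 / RMCP+ session",
)


def session_failed(output):
    """Return True if the captured ipmitool output contains a known session error."""
    return any(error in output for error in KNOWN_ERRORS)


def summarize(results):
    """Split source hosts into (working, failing) lists.

    `results` maps the host the command was run from to the text it printed.
    """
    working = sorted(h for h, out in results.items() if not session_failed(out))
    failing = sorted(h for h, out in results.items() if session_failed(out))
    return working, failing
```

A host that ends up in the failing list while others reach the same BMC points towards the network path from that VLAN rather than towards the BMC itself, which is what the observation above suggests.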

Problem

  • H1 The network in NUE1-SRV2-B rack 1+2 is badly impacted due to switch behaviour -> E1-1 Reset switches in rack 1 and 2 and rerun experiment from #128654#note-7
  • H2 IPMI is just unstable in general and needs retries and waits -> E2-1 Increase the number of retries (default 4) and the timeout (default 1 second for lanplus), e.g. -R 10 -N 10
  • H3 ipmitool on ppc64le behaves differently -> E3-1 Crosscheck the experiment on different machines and architectures
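
Experiment E2-1 could be tried with a small wrapper like the one below. It is a sketch under stated assumptions, not the code used by backend::ipmi: the function names, the outer retry loop and its default values are hypothetical, and only the ipmitool options `-I`, `-H`, `-U`, `-P`, `-R` and `-N` are taken from ipmitool itself.

```python
import subprocess
import time


def build_ipmitool_cmd(host, user, password, args, retries=10, timeout=10):
    """Build an ipmitool lanplus command with explicit -R and -N values."""
    return [
        "ipmitool", "-I", "lanplus",
        "-H", host, "-U", user, "-P", password,
        "-R", str(retries),  # retries per request (ipmitool default: 4)
        "-N", str(timeout),  # seconds between retransmissions (default 1 for lanplus)
        *args,
    ]


def run_with_retries(cmd, attempts=3, wait=5.0,
                     runner=subprocess.run, sleep=time.sleep):
    """Run cmd up to `attempts` times, waiting `wait` seconds after each failure.

    `runner` and `sleep` are injectable so the loop can be exercised
    without a real BMC (hypothetical design choice for this sketch).
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            sleep(wait)
    return result
```

For example, `run_with_retries(build_ipmitool_cmd("ix64hm1200.qa.suse.de", "admin", password, ["mc", "guid"]))` would repeat the `mc guid` call from the log above, where `password` stands for the masked credential. Running the same call with and without the raised `-R`/`-N` values would show whether H2 holds.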

Related issues 3 (2 open, 1 closed)

Related to openQA Project (public) - action #106056: [virtualization][tools] Improve retry behaviour and connection error handling in backend::ipmi (was: "Fail to connect openqaipmi5-sp.qa.suse.de on our osd environment") size:M | Workable | 2022-02-07
Related to openQA Tests (public) - action #128501: [security] [QR] [IPMI] test fails in consoletest_setup on @64bit-ipmi | Resolved | emiler | 2023-05-02
Copied to openQA Infrastructure (public) - action #129032: ipmitool monitoring | New