action #122158

closed

openQA Project (public) - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

action #114565: recover qa-power8-4+qa-power8-5 size:M

[alert] qa-power8-4-kvm host up alert - machine not up, nothing obvious on SoL but IPMI works size:M

Added by okurz about 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2022-12-19
Due date:
% Done: 0%
Estimated time:

Description

Observation

The alert https://monitor.qa.suse.de/d/WDQA-Power8-4-kvm/worker-dashboard-qa-power8-4-kvm?editPanel=65105&tab=alert has been failing since 2022-12-18 03:00, so it was presumably the weekly reboot that triggered this; see https://monitor.qa.suse.de/d/WDQA-Power8-4-kvm/worker-dashboard-qa-power8-4-kvm?editPanel=65105&tab=alert&orgId=1&from=1671320482076&to=1671348249406 for details. The machine is reachable over IPMI, but the SoL console does not show anything obvious.

Suggestions

  • Pause related alert(s)
  • Remove from salt
  • Trigger OSD deployment which failed due to this
  • Trigger reboot
  • Look into the other recent related tickets, e.g. the ones about worker cache SQLite lookups
  • See #114565 and other related tickets
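
The "Trigger reboot" step could be sketched roughly as below. This is only an illustration, not the procedure actually used for this ticket: the BMC hostname and user are hypothetical placeholders, and the password is assumed to be provided through the `IPMI_PASSWORD` environment variable (read by `ipmitool -E`). By default the script only prints the commands it would run.

```python
import shlex
import subprocess

# Hypothetical placeholders, not taken from the ticket.
BMC_HOST = "qa-power8-4-bmc.example"
BMC_USER = "ADMIN"


def ipmi_command(*args, host=BMC_HOST, user=BMC_USER):
    """Build an ipmitool command line; -E reads the password from IPMI_PASSWORD."""
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Check the power state first, then power-cycle the host.
    run(ipmi_command("chassis", "power", "status"))
    run(ipmi_command("chassis", "power", "cycle"))
    # Attach to the serial-over-LAN console to watch the boot (interactive).
    run(ipmi_command("sol", "activate"))
```

Keeping dry-run as the default makes it possible to review the exact command lines before anything touches the machine.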

Rollback steps

  • Bring back to salt
  • Unpause alert after resolution
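
Removing the machine from salt and bringing it back later could look like the following sketch, to be run on the salt master. The minion ID is a hypothetical placeholder, and the real workflow may differ; note that after its key has been deleted, a minion typically has to resubmit the key (for example when `salt-minion` restarts) before it can be accepted again.

```python
import shlex
import subprocess

# Hypothetical minion ID, not confirmed by the ticket.
MINION_ID = "qa-power8-4-kvm.example"

# salt-key flags: -d deletes a key, -a accepts a pending key.
ACTION_FLAGS = {"remove": "-d", "restore": "-a"}


def salt_key_command(action, minion_id=MINION_ID):
    """Build a salt-key command line; -y skips the confirmation prompt."""
    return ["salt-key", "-y", ACTION_FLAGS[action], minion_id]


def run(cmd, dry_run=True):
    """Print the command; only execute it when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(salt_key_command("remove"))   # while the machine is broken
    run(salt_key_command("restore"))  # rollback once it is healthy again
```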