action #116740

[alert] openqaworker14: host up alert

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-09-19
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de/d/WDopenqaworker14/worker-dashboard-openqaworker14?orgId=1&from=1663513460231&to=1663548197462&viewPanel=65105 shows that openqaworker14 has been reported as down since 2022-09-18 21:35.

ipmi-openqaworker14-ipmi sol activate reveals:

Give root password for maintenance
(or press Control-D to continue): 

So the machine is stuck during bootup at the maintenance prompt.

Rollback steps

  • Unpause alert "host up"

Related issues

Related to openQA Infrastructure - action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M (Blocked, 2022-09-18, 2023-01-20)

Copied to openQA Infrastructure - action #116743: [alert] QA-Power8-5-kvm: host up alert (Resolved, 2022-09-19, 2022-10-04)

History

#1 Updated by okurz 2 months ago

  • Description updated (diff)

#2 Updated by okurz 2 months ago

  • Copied to action #116743: [alert] QA-Power8-5-kvm: host up alert added

#3 Updated by okurz 2 months ago

  • Related to action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M added

#4 Updated by nicksinger 2 months ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

#5 Updated by nicksinger 2 months ago

I wasn't able to log in to the rescue shell because none of the passwords I know worked. Pressing Ctrl+D resulted in some OOM messages from systemd-udev (which is strange). Because I couldn't do anything in the recovery console, I just rebooted the machine and it came up again perfectly fine. I have now changed the root password to the old default password (which seems to be used on other workers too).
Rebooting 3 more times to see whether a stable boot can be confirmed.

#6 Updated by nicksinger 2 months ago

  • Status changed from In Progress to Resolved

Machine successfully rebooted 3 times in a row. "host up" alert is enabled again.
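
For reference, the "reboot N times and check that the host comes back" verification could be scripted roughly as below. This is only a sketch and not what was actually run for this ticket: the function names (`wait_for_state`, `verify_stable_boot`), the timeouts and the idea of passing in `reboot` and `is_up` callables are all assumptions made for illustration. How the reboot is triggered and how "up" is detected (ping, ssh, the monitoring data) is left to the caller.

```python
import time
from typing import Callable


def wait_for_state(is_up: Callable[[], bool], want_up: bool, timeout: float,
                   interval: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep,
                   clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_up() until it equals want_up; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if is_up() == want_up:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def verify_stable_boot(reboot: Callable[[], None], is_up: Callable[[], bool],
                       cycles: int = 3,
                       down_timeout: float = 120.0,   # assumed value
                       up_timeout: float = 900.0,     # assumed value
                       interval: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> int:
    """Reboot the host `cycles` times in a row.

    Returns the number of completed reboot cycles, so a return value
    equal to `cycles` means every reboot came back up in time.
    Stops at the first cycle that fails.
    """
    for done in range(cycles):
        reboot()
        # First make sure the host really went down, otherwise an
        # ignored reboot request would be counted as a successful boot.
        if not wait_for_state(is_up, False, down_timeout, interval, sleep, clock):
            return done
        if not wait_for_state(is_up, True, up_timeout, interval, sleep, clock):
            return done
    return cycles
```

A caller would supply its own `reboot` and `is_up`, for example thin wrappers around whatever remote reboot and reachability check are available for the worker, and treat a result lower than `cycles` as "boot not stable yet, keep the alert paused".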
