action #116752

[alert] powerqaworker-qam-1: host up alert

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-09-19
Due date:
% Done: 0%
Estimated time:
Tags:


Related issues

Related to openQA Infrastructure - action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M (Blocked; start 2022-09-18, due 2023-01-20)

Copied from openQA Infrastructure - action #116746: [alert] openqaworker9: host up alert (Resolved; start 2022-09-19)

History

#1 Updated by okurz 2 months ago

  • Copied from action #116746: [alert] openqaworker9: host up alert added

#2 Updated by okurz 2 months ago

  • Related to action #116722: openqa.suse.de is not reachable 2022-09-18, no ping response, postgreSQL OOM and kernel panics size:M added

#3 Updated by nicksinger 2 months ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger

#4 Updated by nicksinger 2 months ago

No SOL output after connecting to it. The Grafana dashboard doesn't show anything unusual at the time of failure (2022-09-18, 05:45). After a reboot the machine came up normally. Rebooting it 3 times to see whether the issue can be reproduced or whether we can consider the machine stable.
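Checking whether the machine comes back after each reboot can be scripted. The following is a minimal sketch, assuming the worker counts as "up" once its SSH port (22) accepts TCP connections. The actual host-up alert may use a different probe such as ping, and `wait_until_up` is a hypothetical helper, not part of any existing tooling. A generous timeout matters because the boot loader may spend a long time probing before the OS starts.

```python
import socket
import time


def wait_until_up(host: str, port: int = 22, timeout: float = 600.0,
                  interval: float = 5.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Assumption: the machine is considered up once `port` (SSH by default)
    accepts connections; the real monitoring check may differ.
    """
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        try:
            # Cap each attempt so a hanging connect cannot overrun the deadline.
            with socket.create_connection((host, port),
                                          timeout=min(interval, remaining)):
                return True
        except OSError:
            pass  # refused, unreachable or timed out: retry after a pause
        time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

For example, after triggering each reboot one could call `wait_until_up("powerqaworker-qam-1", timeout=900)`; the hostname comes from the ticket title and may need a domain suffix.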

#5 Updated by nicksinger 2 months ago

After rebooting several times it seems that petitboot sometimes fails to find the OS. Running "rescan devices" makes the boot entries appear and everything works fine afterwards. I will check whether I can regenerate the GRUB entries (most likely not; my past research shows that petitboot just "probes" for present kernels).

#6 Updated by nicksinger 2 months ago

  • Status changed from In Progress to Resolved

Force-reinstalled "kernel-default" because it triggers the right $magic (regenerating the initrd, writing GRUB files, etc.). Afterwards the machine rebooted successfully 3 times. I might have been too impatient with my previous attempts: petitboot probes for quite some time but eventually finds the disk/OS and boots it on its own.
The alert is resumed now.
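The ticket does not say which command performed the forced reinstall. On an openSUSE/SLE worker it presumably corresponds to zypper's `install --force`, which reinstalls a package even if the same version is already installed. A minimal sketch under that assumption; `reinstall_cmd` and `reinstall` are hypothetical helpers, and the default is a dry run that only returns the command line.

```python
import shlex
import subprocess


def reinstall_cmd(package: str = "kernel-default") -> list[str]:
    """Build a zypper command that reinstalls an already-installed package.

    `install --force` reinstalls even if the same version is present;
    `--non-interactive` answers prompts with their default answers.
    """
    return ["zypper", "--non-interactive", "install", "--force", package]


def reinstall(package: str = "kernel-default", dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    cmd = reinstall_cmd(package)
    if not dry_run:
        # Requires root. Per the ticket, reinstalling kernel-default is what
        # regenerates the initrd and writes the GRUB files.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)
```

Calling `reinstall()` without arguments only returns `zypper --non-interactive install --force kernel-default`, so the command can be reviewed before running it with `dry_run=False`.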
