action #157453

closed

[FIRING:1] host_up (qesapworker-prg5: host up alert openQA qesapworker-prg5 host_up_alert_qesapworker-prg5 worker)

Added by dheidler 7 months ago. Updated 7 months ago.

Status: Rejected
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-03-18
Due date:
% Done: 0%
Estimated time:

Description

https://stats.openqa-monitor.qa.suse.de/alerting/grafana/host_up_alert_qesapworker-prg5/view?orgId=1

The worker appears to have hung: there was no login prompt on the serial TTY.
I rebooted it via IPMI.
The worker came back up, but a systemd service failed:

# systemctl status openqa_nvme_format.service
× openqa_nvme_format.service - Setup NVMe before mounting it
     Loaded: loaded (/etc/systemd/system/openqa_nvme_format.service; disabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Mon 2024-03-18 11:01:18 CET; 3min 30s ago
    Process: 31734 ExecStart=/usr/local/bin/openqa-establish-nvme-setup (code=exited, status=1/FAILURE)
   Main PID: 31734 (code=exited, status=1/FAILURE)

Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31739]: │                                /boot/grub2/i386-pc
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31739]: │                                /.snapshots
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31739]: │                                /
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31739]: └─sda3   8:3    0     1G  0 part [SWAP]
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31734]: Creating RAID0 "/dev/md/openqa" on: /dev/disk/by-id/scsi-SDELL_PERC_H755_Adp_00e7176dba09d4532c00f9c13280e04e
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31748]: mdadm: cannot open /dev/disk/by-id/scsi-SDELL_PERC_H755_Adp_00e7176dba09d4532c00f9c13280e04e: No such file or directory
Mar 18 11:01:17 qesapworker-prg5 openqa-establish-nvme-setup[31734]: Unable to create RAID, mdadm returned with non-zero code
Mar 18 11:01:18 qesapworker-prg5 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
Mar 18 11:01:18 qesapworker-prg5 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
Mar 18 11:01:18 qesapworker-prg5 systemd[1]: Failed to start Setup NVMe before mounting it.

It seems the NVMe disk is no longer found. Maybe it died and the system subsequently froze.
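
To confirm whether the device path from the log above is really gone, a check along the following lines could be run on the worker. This is a minimal sketch, not part of openqa-establish-nvme-setup: the function name device_present and its messages are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of openqa-establish-nvme-setup):
# report whether a given device path exists on this host.
device_present() {
    dev="$1"
    if [ -e "$dev" ]; then
        echo "present: $dev"
        return 0
    fi
    echo "missing: $dev"
    return 1
}
```

For example, `device_present /dev/disk/by-id/scsi-SDELL_PERC_H755_Adp_00e7176dba09d4532c00f9c13280e04e || ls -l /dev/disk/by-id/` would list the device links that do exist if the expected one is missing, which can then be compared against the `lsblk` output.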


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #157441: osd-deployment | Failed pipeline for master (qesapworker-prg5.qa.suse.cz) (Resolved, okurz, 2024-03-18)

Actions #1

Updated by dheidler 7 months ago

Related failure in the OSD deployment, caused by the worker being unreachable:

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2398227

Actions #2

Updated by okurz 7 months ago

  • Related to action #157441: osd-deployment | Failed pipeline for master (qesapworker-prg5.qa.suse.cz) added

Actions #3

Updated by okurz 7 months ago

  • Category set to Regressions/Crashes
  • Status changed from New to Rejected
  • Assignee set to okurz
  • Target version set to Ready

We discussed in the daily that tinita would create the ticket. Continuing in #157441.
