action #157441

closed

osd-deployment | Failed pipeline for master (qesapworker-prg5.qa.suse.cz)

Added by tinita 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Start date: 2024-03-18
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2398227

Date: Sun, 17 Mar 2024 05:49:14 +0000
Date: Mon, 18 Mar 2024 05:49:42 +0000

qesapworker-prg5.qa.suse.cz:
    Minion did not return. [Not connected]
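When a deployment log covers many workers, the hosts that failed this way can be picked out programmatically. The following is a minimal Python sketch that assumes salt's default output layout (an unindented `<minion-id>:` header followed by that minion's indented result). `unreturned_minions` is a hypothetical helper and is not part of the osd-deployment tooling.

```python
import re
from typing import List, Optional

# Assumed layout: a host header is an unindented line such as "qesapworker-prg5.qa.suse.cz:".
HOST_RE = re.compile(r"^(\S+):\s*$")
NO_RETURN_MARKER = "Minion did not return"


def unreturned_minions(log_text: str) -> List[str]:
    """Return minion IDs whose salt output contains 'Minion did not return'.

    Hypothetical helper; assumes each minion's indented output follows
    its unindented "<minion-id>:" header line.
    """
    failed: List[str] = []
    current_host: Optional[str] = None
    for line in log_text.splitlines():
        header = HOST_RE.match(line)
        if header:
            # Remember which minion the following indented lines belong to.
            current_host = header.group(1)
            continue
        if NO_RETURN_MARKER in line and current_host and current_host not in failed:
            failed.append(current_host)
    return failed


if __name__ == "__main__":
    excerpt = "qesapworker-prg5.qa.suse.cz:\n    Minion did not return. [Not connected]\n"
    print(unreturned_minions(excerpt))
```

For the excerpt in this ticket the sketch prints `['qesapworker-prg5.qa.suse.cz']`. Matching on the message text keeps it independent of how the pipeline reports its exit status.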

https://stats.openqa-monitor.qa.suse.de/alerting/grafana/host_up_alert_qesapworker-prg5/view?orgId=1

The worker appeared to have hung: there was no login prompt on the serial TTY.
Rebooted it via IPMI.
The worker came up, but a systemd service failed: …

It seems the NVMe disk is no longer detected. Possibly the disk died and the system froze as a result.
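You can confirm the missing-disk symptom from a script by listing the NVMe controllers that the kernel exposes in sysfs. The following is a minimal Python sketch that assumes a Linux host with sysfs mounted at `/sys`. `nvme_controllers` is a hypothetical helper. An empty result only shows that no controller is visible. It does not show why.

```python
import os
import re
from typing import List

# On Linux, each registered NVMe controller appears here as nvme0, nvme1, ...
DEFAULT_NVME_SYSFS = "/sys/class/nvme"
CONTROLLER_RE = re.compile(r"nvme\d+")


def nvme_controllers(sysfs_dir: str = DEFAULT_NVME_SYSFS) -> List[str]:
    """Return the NVMe controller names currently visible in sysfs.

    Hypothetical helper. An empty list means no controller is visible,
    which matches the "NVMe disk is not found" symptom after a reboot.
    """
    if not os.path.isdir(sysfs_dir):
        # The class directory may be missing entirely, e.g. when the NVMe driver is not loaded.
        return []
    return sorted(name for name in os.listdir(sysfs_dir) if CONTROLLER_RE.fullmatch(name))


if __name__ == "__main__":
    found = nvme_controllers()
    print("NVMe controllers:", ", ".join(found) if found else "none detected")
```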

Acceptance criteria

  • AC1: osd-deployment passes again
  • AC2: qesapworker-prg5.qa.suse.cz is back in production

Suggestions

Rollback steps


Related issues: 4 (0 open, 4 closed)

Related to openQA Infrastructure (public) - action #157453: [FIRING:1] host_up (qesapworker-prg5: host up alert openQA qesapworker-prg5 host_up_alert_qesapworker-prg5 worker) (Rejected, okurz, 2024-03-18)

Related to openQA Infrastructure (public) - action #166520: [alert][FIRING:1] qesapworker-prg5 (qesapworker-prg5: host up alert host_up openQA host_up_alert_qesapworker-prg5 worker) size:S (Resolved, nicksinger, 2024-09-09 to 2024-09-24)

Related to openQA Infrastructure (public) - action #164907: [alert][FIRING:1] host_up (qesapworker-prg5: host up alert openQA, qesapworker-prg5-mgmt.qa.suse.cz not reachable, failing osd-deployment (Resolved, okurz, 2024-08-04)

Copied to openQA Infrastructure (public) - action #167164: osd-deployment | Minions returned with non-zero exit code (qesapworker-prg5.qa.suse.cz) size:M (Resolved, ybonatakis)
