action #177324

open

[alert] worker-arm2 is down

Added by ybonatakis 1 day ago. Updated about 13 hours ago.

Status: Blocked
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2025-02-17
Due date: 2025-03-04 (Due in 14 days)
% Done: 0%
Estimated time:

Description

I couldn't ssh into worker-arm2 to get any logs, so I have no idea which service failed or what happened.

http://monitor.qa.suse.de/goto/9w9845cNR?orgId=1

Rollback steps

salt-key --accept=worker-arm2.oqa.prg2.suse.org
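
A minimal sketch of how the rollback step above might be wrapped, assuming it is run on the salt master. The minion name is taken from the ticket; the `DRY_RUN` switch, the `--yes` flag, and the follow-up `test.ping` check are additions for illustration and are not part of the documented rollback.

```shell
#!/bin/sh
# Sketch only: re-accept the worker's salt key and check that the minion responds.
# MINION comes from the ticket; DRY_RUN and the test.ping check are assumptions.
MINION="${MINION:-worker-arm2.oqa.prg2.suse.org}"
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

rollback() {
    # Accept the minion key again (the rollback step from the ticket);
    # --yes skips the interactive confirmation prompt.
    run salt-key --yes --accept="$MINION" &&
    # Confirm the minion answers before relying on it again.
    run salt "$MINION" test.ping
}
```

Calling `rollback` with the default `DRY_RUN=1` only prints the two commands; setting `DRY_RUN=0` before calling it would execute them.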

Actions #1

Updated by okurz 1 day ago

  • Tags changed from infra to infra, reactive work
  • Target version set to Ready
Actions #2

Updated by nicksinger 1 day ago

  • Status changed from New to In Progress
  • Assignee set to nicksinger
Actions #3

Updated by nicksinger 1 day ago

  • Status changed from In Progress to Blocked
  • Priority changed from High to Normal

Neither the machine nor its BMC can be pinged. Access to the documented PDUs in https://racktables.suse.de/index.php?page=object&tab=default&object_id=22943 is not possible, so I am creating an SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-180419
I am also removing the machine from production according to https://progress.opensuse.org/projects/openqav3/wiki#Take-machines-out-of-salt-controlled-production
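
The reachability check described in this note could be scripted roughly as follows. This is a sketch under assumptions: the `-c` and `-W` options are those of the Linux iputils `ping`, `PING_CMD` is a hypothetical override added so the check can be stubbed, and the BMC hostname is not given in the ticket, so it has to be supplied by the caller.

```shell
#!/bin/sh
# Sketch only: report whether each given host answers a single ping.
# PING_CMD is an assumption, added so the check can be swapped out or stubbed.
PING_CMD="${PING_CMD:-ping}"

check_hosts() {
    rc=0
    for host in "$@"; do
        # -c 1: send one echo request; -W 2: wait at most 2 seconds for a reply.
        if $PING_CMD -c 1 -W 2 "$host" >/dev/null 2>&1; then
            echo "$host: reachable"
        else
            echo "$host: unreachable"
            rc=1
        fi
    done
    # Non-zero exit status if at least one host did not answer.
    return "$rc"
}
```

For example, `check_hosts worker-arm2.oqa.prg2.suse.org` followed by the BMC address would print one line per host and return non-zero if either is unreachable.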

Actions #4

Updated by nicksinger 1 day ago

  • Description updated (diff)
Actions #5

Updated by okurz 1 day ago

  • Status changed from Blocked to New
  • Priority changed from Normal to High

nicksinger wrote in #note-3:

Machine + BMC cannot be pinged. Access to documented PDUs in https://racktables.suse.de/index.php?page=object&tab=default&object_id=22943 not possible so creating a SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-180419

How did you try to access the PDUs? Did you follow
https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md#prg2 ?

Actions #6

Updated by nicksinger 1 day ago

  • Status changed from New to In Progress

okurz wrote in #note-5:

nicksinger wrote in #note-3:

Machine + BMC cannot be pinged. Access to documented PDUs in https://racktables.suse.de/index.php?page=object&tab=default&object_id=22943 not possible so creating a SD ticket: https://sd.suse.com/servicedesk/customer/portal/1/SD-180419

How did you try to access the PDUs? Did you follow
https://gitlab.suse.de/suse/wiki/-/blob/main/qe_infrastructure.md#prg2 ?

Just by accessing the URL in my browser. I totally forgot about this one, thanks for the reminder. I will power-cycle the BMC now and update the SD ticket if needed.

Actions #7

Updated by openqa_review about 19 hours ago

  • Due date set to 2025-03-04

Setting due date based on mean cycle time of SUSE QE Tools

Actions #8

Updated by nicksinger about 14 hours ago

  • Priority changed from High to Normal
Actions #9

Updated by nicksinger about 13 hours ago

  • Status changed from In Progress to Blocked

Updated the SD ticket with:

I powered down both ports and waited several minutes; however, port 22 always shows >0W power. In any case, even after a full power-cycle, the BMC does not come back.
So unfortunately I have to ask you to check the machine physically for any sign of broken hardware or a loose ethernet plug. Thanks!