action #94438

OSD deployment failed on 2021-06-21 because 'openqaworker (arm-3 and arm-2) Minion did not return'

Added by Xiaojing_liu 3 months ago. Updated 3 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2021-06-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

The OSD deployment failed with the following error:

    openqaworker-arm-2.suse.de:
    Minion did not return. [Not connected]
    openqaworker-arm-3.suse.de:
    Minion did not return. [Not connected]

See details in: https://gitlab.suse.de/openqa/osd-deployment/-/jobs/466302
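
To spot the affected hosts quickly in a long deployment log, the salt output can be filtered for this message. The following is only a sketch: the function name unresponsive_minions is hypothetical, and it assumes the log keeps the two-line layout shown above (a hostname line ending in a colon, followed by the "Minion did not return" line).

```shell
# Hypothetical helper: read salt output on stdin and print each minion
# that was reported with "Minion did not return", one hostname per line.
# Assumes a hostname line ("<host>:") directly precedes the message line.
unresponsive_minions() {
    awk '
        # A line holding a single token that ends in ":" is a minion name.
        /^[ \t]*[^ \t]+:[ \t]*$/ {
            host = $1
            sub(/:$/, "", host)
            next
        }
        # The message line directly after a minion name marks it as down.
        /Minion did not return/ && host != "" { print host }
        # Any other line ends the current minion context.
        { host = "" }
    '
}

# Example (file name is a placeholder for a saved job log):
#   unresponsive_minions < deployment.log
```

Fed the excerpt above, this should print openqaworker-arm-2.suse.de and openqaworker-arm-3.suse.de.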

I rebooted arm-2 with an ipmitool power cycle, but arm-3 cannot be reached via ipmitool:

    # ipmitool -I lanplus -C 3 -H openqaworker-arm-3-ipmi.suse.de chassis power status
    Error: Unable to establish IPMI v2 / RMCP+ session
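
The IPMI check above can be repeated for both workers with a small helper. This is a minimal sketch, not the team's actual tooling: the function names are hypothetical, and it assumes every worker's BMC hostname follows the "-ipmi" pattern seen for arm-3. It only prints the commands so they can be reviewed before running; credentials are left out, as in the command above.

```shell
# Hypothetical helper: derive the BMC hostname from a worker hostname by
# inserting "-ipmi" before the first dot. ASSUMPTION: all workers follow
# the naming seen for openqaworker-arm-3-ipmi.suse.de.
ipmi_host() {
    printf '%s\n' "$1" | sed 's/\./-ipmi./'
}

# Hypothetical helper: print (do not run) the ipmitool command for a
# worker and a chassis power action such as "status" or "cycle".
# Credentials are omitted here and must be supplied separately.
ipmi_power_cmd() {
    printf 'ipmitool -I lanplus -C 3 -H %s chassis power %s\n' \
        "$(ipmi_host "$1")" "$2"
}

# Example: show the status command for both affected workers.
#   for w in openqaworker-arm-2.suse.de openqaworker-arm-3.suse.de; do
#       ipmi_power_cmd "$w" status
#   done
```

Printing instead of executing keeps a power cycle a deliberate, manual step.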

Acceptance criteria

  • AC1: OSD deployment continued
  • AC2: both openqaworker-arm-2 and openqaworker-arm-3 are online again

Related issues

Related to openQA Infrastructure - action #94399: No alert when arm workers are offline, alert if telegraf throws errors size:M (Status: Workable, Start date: 2021-06-22)

History

#1 Updated by Xiaojing_liu 3 months ago

  • Related to action #94399: No alert when arm workers are offline, alert if telegraf throws errors size:M added

#2 Updated by okurz 3 months ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Target version set to Ready

#3 Updated by okurz 3 months ago

  • Status changed from Workable to Resolved
  • Assignee set to okurz

I have now brought the workers back manually, as the automatic recovery was broken at the time. The deployment continued and finished successfully.
