action #97382

closed

ARM automatic reboot pipeline does not fail if ipmitool fails size:S

Added by nicksinger over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2021-08-23
Due date:
% Done: 0%
Estimated time:

Description

The most recent recovery attempt for openqaworker-arm-3 triggered a pipeline that failed but is shown as "succeeded": https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/534098#L34

A quick look at https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker shows that "set -e" is already in place, so it is unclear why the exit code of the failing ipmitool call did not reach the pipeline runner.
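For reference, a minimal sketch of two well-known ways a failing command can slip past "set -e" in bash. This is not taken from ipmi-recover-worker and is not confirmed as the cause here; `false` stands in for the failing ipmitool call, and the function names are made up for illustration.

```shell
#!/bin/bash
# Each case runs in a fresh bash process so the caller's shell options
# do not influence the result. "false" is a stand-in for a failing ipmitool.

# Case 1: the failure is on the left side of a pipe. Without pipefail the
# pipeline's status is that of the last command (cat), so set -e never fires.
masked_by_pipe() {
    bash -c 'set -e; false | cat; echo "reached end"'
}

# Case 2: the failure is part of an || list. set -e ignores failures that
# are "handled" this way, so the script continues and exits 0.
masked_by_or() {
    bash -c 'set -e; false || echo "handled"; echo "reached end"'
}

# With pipefail the pipeline in case 1 reports the failure, and set -e
# aborts the script with a non-zero status before the final echo.
caught_with_pipefail() {
    bash -c 'set -eo pipefail; false | cat; echo "reached end"'
}
```

If the script wraps ipmitool in a retry loop or pipes its output somewhere, either case could explain a green pipeline despite the error message in the log.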

AC1: Let the user know that "Error: Unable to establish IPMI v2 / RMCP+ session" is not the final reason why the job ended, e.g.: "IPMI tool failed after x retries. creating Infra service ticket now"
AC2: Check whether the ticket creation was successful. Make the pipeline status depend on that final step so one can clearly see whether the pipeline did something or not. This also helps with monitoring, as subscribed people would receive a mail if everything fails (in which case manual investigation from our side is needed).
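A rough sketch of how AC1 and AC2 could fit together, under the assumption that recovery and ticket creation are each a single command. The function names `retry` and `recover_or_escalate` and the two command arguments are hypothetical placeholders, not the actual contents of ipmi-recover-worker.

```shell
#!/bin/bash
# retry MAX CMD [ARGS...]: run CMD up to MAX times, return 0 on first success.
retry() {
    local max=$1 attempt=1
    shift
    while [ "$attempt" -le "$max" ]; do
        "$@" && return 0
        attempt=$((attempt + 1))
    done
    return 1
}

# recover_or_escalate RETRIES RECOVER_CMD TICKET_CMD
# RECOVER_CMD and TICKET_CMD are hypothetical stand-ins for the ipmitool
# power cycle and the infra ticket creation step.
recover_or_escalate() {
    local retries=$1 recover_cmd=$2 ticket_cmd=$3
    if retry "$retries" "$recover_cmd"; then
        echo "Recovery succeeded"
        return 0
    fi
    # AC1: make clear that the IPMI error is not where the job ends.
    echo "IPMI tool failed after $retries retries. Creating infra service ticket now" >&2
    # AC2: the final step decides the pipeline status.
    if "$ticket_cmd"; then
        echo "Ticket created, manual follow-up is tracked there" >&2
        return 0
    fi
    echo "Ticket creation failed, manual investigation needed" >&2
    return 1
}
```

With this shape the job only turns red when both recovery and ticket creation fail, which is the case where subscribers should get a mail.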


Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M | Resolved | dheidler | 2021-08-19 | 2021-09-17
