action #97382
closed
ARM automatic reboot pipeline does not fail if ipmitool fails size:S
Added by nicksinger over 3 years ago.
Updated over 3 years ago.
Description
The most recent recovery attempt for openqaworker-arm-3 triggered a pipeline which failed but is shown as "succeeded": https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/534098#L34
A quick look at https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker shows we already have "set -e" in place, so it is not clear why the exit code of the failing ipmitool call did not reach the pipeline runner.
AC1: Let the user know that "Error: Unable to establish IPMI v2 / RMCP+ session" is not the final reason why the job ended, e.g.: "IPMI tool failed after x retries. Creating Infra service ticket now"
AC2: Check whether the ticket creation was successful. Make the pipeline status depend on that final step so one can clearly see whether the pipeline did something or not. This also helps with monitoring, as subscribed people would receive a mail if everything fails (meaning manual investigation from our side is needed).
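The two ACs could be sketched roughly as follows. This is a hypothetical illustration, not the actual ipmi-recover-worker code: ipmi_check, create_ticket, recover_worker and the retry count are placeholder names and values, and the stand-ins replace the real "$ipmitool chassis status" and printf | msmtp calls. The last step's exit status is what the job reports.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of AC1/AC2; all names below are placeholders.

ipmi_check() { false; }    # stand-in for: $ipmitool chassis status (always fails here)
create_ticket() { true; }  # stand-in for the printf | msmtp ticket step

recover_worker() {
    local retries=$1 attempt
    for ((attempt = 1; attempt <= retries; attempt++)); do
        if ipmi_check; then
            echo "IPMI reachable on attempt $attempt"
            return 0
        fi
    done
    # AC1: state explicitly that the IPMI error is not the end of the story.
    echo "IPMI tool failed after $retries retries. Creating Infra service ticket now"
    # AC2: the ticket step decides the final status of the function (and the job).
    if ! create_ticket; then
        echo "Creating the Infra service ticket failed, manual investigation needed" >&2
        return 1
    fi
    echo "Infra service ticket request sent"
}

output=$(recover_worker 3)
status=$?
echo "$output"
```

The message says "request sent" rather than "ticket created" on purpose: a successful mail hand-off does not prove that a ticket exists.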
- Target version set to Ready
- Related to action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M added
- Subject changed from ARM automatic reboot pipeline does not fail if ipmitool fails to ARM automatic reboot pipeline does not fail if ipmitool fails size:S
- Status changed from New to In Progress
- Assignee set to dheidler
This happens because the ipmitool call is used in this context:
if ! $ipmitool chassis status; then
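When that check fails, the script does not abort; it just creates an infra ticket. Commands tested by "if" are exempt from "set -e", so the non-zero exit code of ipmitool never reaches the pipeline runner.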
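A minimal reproduction of this behaviour, assuming bash: chassis_status is a hypothetical stand-in that fails the way the ipmitool call did. Even with "set -e" active, the failure only selects the "if" branch, and the script runs to the end with exit status 0.

```shell
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for "$ipmitool chassis status" failing.
chassis_status() {
    echo "Error: Unable to establish IPMI v2 / RMCP+ session" >&2
    return 1
}

# errexit is ignored for the command tested by `if`, so this failure
# does not abort the script; it only picks the branch below.
if ! chassis_status; then
    echo "chassis status failed, creating infra ticket instead"
fi

reached_end=yes
echo "script still running, exit status now depends on later commands"
```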
- Status changed from In Progress to Rejected
Okurz wrote:
if ipmi failed for X times and we resorted to reporting a ticket this should be a successful pipeline
This seems to be expected behavior so I will reject this ticket.
- Status changed from Rejected to New
I think we should still improve the printed messages here. IMHO it is highly confusing when a job succeeds while the last message is "Error: Unable to establish IPMI v2 / RMCP+ session". I will adjust the title and include some ACs describing what could be improved. @dheidler feel free to unassign yourself if you don't want to continue working on these improvements.
- Description updated (diff)
I will improve the script with some more log output.
- Status changed from New to In Progress
The ticket is created by the line
printf %b "Subject: $subject\n\n$EMAIL\n\n" | msmtp --from "$from" -t "$contact"
which should change in the near future (see https://progress.opensuse.org/issues/97244?).
When this command fails, the pipeline should already fail due to set -e -o pipefail
so I think AC2 is already present.
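The following sketch, assuming bash, checks that claim for the ticket step. fake_send is a hypothetical stand-in for a failing msmtp that exits with an arbitrary non-zero code (75). Since the sender is the last command of the pipeline, its status is the pipeline's status even without pipefail; pipefail additionally catches a failing printf. With "set -e" the script then aborts with that status.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a failing msmtp: read the mail, then fail.
fake_send() {
    cat > /dev/null
    return 75
}

# Run the ticket step in a subshell with the same options as the script.
(
    set -e -o pipefail
    printf %b "Subject: test\n\nbody\n\n" | fake_send
    echo "not reached: set -e aborted the subshell above"
)
send_status=$?
echo "ticket step exited with $send_status"
```

Note that this only covers a failure of the local mail hand-off; an exit code of 0 does not confirm that a ticket was actually created on the receiving side.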
- Status changed from In Progress to Feedback
Merged, thanks.
dheidler wrote:
The ticket is created by the line
printf %b "Subject: $subject\n\n$EMAIL\n\n" | msmtp --from "$from" -t "$contact"
which should change in the near future (see https://progress.opensuse.org/issues/97244?).
When this command fails, the pipeline should already fail due to set -e -o pipefail
so I think AC2 is already present.
I see, this is fire-and-forget and we don't really have a way to tell whether a ticket was created. I will extend the other ticket to include this AC. I don't consider it fulfilled, but I understand that it is currently infeasible to implement with this approach.
- Status changed from Feedback to Resolved