action #97382
closed
ARM automatic reboot pipeline does not fail if ipmitool fails size:S
Description
The most recent recovery attempt for openqaworker-arm-3 triggered a pipeline which failed but is shown as "succeeded": https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/534098#L34
A quick look at https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker shows that "set -e" is already in place, so it is unclear why the exit code of the failing ipmitool call did not reach the pipeline runner.
AC1: Let the user know that "Error: Unable to establish IPMI v2 / RMCP+ session" is not the final reason why the job ended, e.g.: "IPMI tool failed after x retries. Creating Infra service ticket now"
AC2: Check if the ticket creation was successful. Make the pipeline status depend on that final step so one can clearly see whether the pipeline did something or not. This also helps with monitoring the situation, as subscribed people would receive a mail if everything fails (meaning manual investigation from our side is needed).
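A minimal bash sketch of what AC1 and AC2 could look like, assuming a hypothetical retry count and hypothetical placeholder commands (ipmi_cmd standing in for the "$ipmitool chassis status" call, ticket_cmd standing in for the mail that creates the ticket). It is not the actual ipmi-recover-worker script; it only illustrates letting the final ticket step alone decide the exit status.

```bash
#!/usr/bin/env bash
# Minimal sketch of AC1/AC2, not the real ipmi-recover-worker script.
# ipmi_cmd and ticket_cmd are hypothetical placeholders for the
# "$ipmitool chassis status" call and the msmtp-based ticket mail.

# Run a command up to $1 times; succeed on the first successful attempt.
retry() {
    local tries=$1 i=1
    shift
    while [ "$i" -le "$tries" ]; do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i/$tries of '$*' failed" >&2
        i=$((i + 1))
    done
    return 1
}

# The exit status of this function is meant to become the pipeline status.
recover_worker() {
    local retries=$1 ipmi_cmd=$2 ticket_cmd=$3
    if retry "$retries" "$ipmi_cmd"; then
        echo "IPMI recovery succeeded"
        return 0
    fi
    # AC1: make clear that the IPMI error is not why the job ends
    echo "IPMI tool failed after $retries retries, creating Infra service ticket now"
    # AC2: the ticket step alone decides success or failure
    if ! "$ticket_cmd"; then
        echo "Ticket creation failed, manual investigation needed" >&2
        return 1
    fi
    echo "Infra ticket request sent"
}
```

For example, recover_worker 3 false true prints the AC1 message and returns 0, while recover_worker 3 false false returns 1, which would show up as a failed pipeline. Returning the ticket step's status explicitly avoids relying on set -e, which does not apply to commands used as if conditions.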
Updated by dheidler about 3 years ago
- Related to action #97244: openqaworker-arm-3 is offline and EngInfra wants us to create JiraSD tickets instead of infra size:M added
Updated by dheidler about 3 years ago
- Subject changed from ARM automatic reboot pipeline does not fail if ipmitool fails to ARM automatic reboot pipeline does not fail if ipmitool fails size:S
- Status changed from New to In Progress
- Assignee set to dheidler
Updated by dheidler about 3 years ago
This happens because the ipmitool call is used in this context:
if ! $ipmitool chassis status; then
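If that command fails, the script just creates an infra ticket. Because the failing command is the condition of an if statement, set -e does not abort the script, so the job still ends successfully.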
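The following bash sketch reproduces this behavior with a hypothetical fake_ipmitool function standing in for the real ipmitool call: the failure is consumed by the if condition, so set -e never fires and the script runs to its end.

```bash
#!/usr/bin/env bash
# Sketch: a failing command used as an if condition does not trigger set -e.
set -e

# Hypothetical stand-in for "$ipmitool chassis status" on an unreachable BMC
fake_ipmitool() {
    echo "Error: Unable to establish IPMI v2 / RMCP+ session" >&2
    return 1
}

fallback_taken=no
if ! fake_ipmitool; then
    # set -e is suspended while the condition is evaluated, so execution
    # continues here instead of aborting the script
    fallback_taken=yes
    echo "IPMI unreachable, creating an infra ticket instead"
fi

# Still running: the script (and thus the CI job) can end with status 0
echo "Reached end of script, fallback_taken=$fallback_taken"
```

By contrast, calling fake_ipmitool on its own line, outside any condition, would make bash exit with status 1 at that point.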
Updated by dheidler about 3 years ago
- Status changed from In Progress to Rejected
Okurz wrote:
if ipmi failed for X times and we resorted to reporting a ticket this should be a successful pipeline
This seems to be expected behavior, so I will reject this ticket.
Updated by nicksinger about 3 years ago
- Status changed from Rejected to New
I think we should still improve the printed messages here. IMHO it is highly confusing when a job succeeds while the last message is "Error: Unable to establish IPMI v2 / RMCP+ session". I will adjust the title and include some ACs describing what could be improved. @dheidler feel free to unassign yourself if you don't want to continue working on these improvements.
Updated by dheidler about 3 years ago
I will improve the script with some more log output.
Updated by dheidler about 3 years ago
- Status changed from New to In Progress
The ticket is created by the line
printf %b "Subject: $subject\n\n$EMAIL\n\n" | msmtp --from "$from" -t "$contact"
which should change in the near future (see https://progress.opensuse.org/issues/97244?).
When this command fails, the pipeline should already fail due to set -e -o pipefail, so I think AC2 is already fulfilled.
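How the exit status of such a pipeline comes about can be checked with a small bash sketch. fake_msmtp is a hypothetical stand-in for msmtp that reads the message and fails with an arbitrary code; the "|| rc=$?" guards only exist so the sketch can record the statuses instead of aborting. In a script running with set -e, any of the non-zero statuses below would end the job as failed.

```bash
#!/usr/bin/env bash
# Sketch: exit status of a "printf ... | msmtp ..." style pipeline.
# fake_msmtp is a hypothetical stand-in for msmtp.
fake_msmtp() {
    cat >/dev/null  # consume the message from stdin, as msmtp -t would
    return 75       # arbitrary non-zero code simulating a failed submission
}

# Case 1: the last stage fails. A pipeline's status defaults to that of
# its last command, so this fails even without pipefail.
rc_last=0
printf %b "Subject: test\n\nbody\n\n" | fake_msmtp || rc_last=$?

# Case 2: an earlier stage fails. Only pipefail propagates that failure.
set +o pipefail
rc_no_pipefail=0
false | cat || rc_no_pipefail=$?
set -o pipefail
rc_pipefail=0
false | cat || rc_pipefail=$?

echo "last stage fails: $rc_last"
echo "first stage fails, no pipefail: $rc_no_pipefail"
echo "first stage fails, pipefail: $rc_pipefail"
```

This prints 75, 0 and 1. Since msmtp is the last stage of the ticket pipeline, set -e alone already catches a failing msmtp; pipefail additionally catches a failing printf.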
Updated by dheidler about 3 years ago
- Status changed from In Progress to Feedback
Updated by nicksinger about 3 years ago
Merged, thanks.
dheidler wrote:
The ticket is created by the line
printf %b "Subject: $subject\n\n$EMAIL\n\n" | msmtp --from "$from" -t "$contact"
which should change in the near future (see https://progress.opensuse.org/issues/97244?).
When this command fails, the pipeline should already fail due to
set -e -o pipefail
so I think AC2 is already present.
I see, this is fire-and-forget and we don't really have a way to tell whether a ticket was created. I will extend the other ticket to include this AC. I don't see it fulfilled, but I understand that it is currently infeasible to implement with this approach.
Updated by dheidler about 3 years ago
- Status changed from Feedback to Resolved
That should change with https://progress.opensuse.org/issues/97244; until then, I guess we can close this one.