action #97382

Updated by nicksinger over 2 years ago

The most recent recovery attempt for openqaworker-arm-3 triggered a pipeline which failed but is shown as "succeeded": https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/534098#L34 

 ~~A quick look at https://gitlab.suse.de/openqa/grafana-webhook-actions/-/blob/master/ipmi-recover-worker shows we have "set -e" already in place. So it is not clear why the exit code of the failing ipmitool did not reach the pipeline runner.~~ 

 **AC1**: Let the user know that "Error: Unable to establish IPMI v2 / RMCP+ session" is not the final reason why the job ended - e.g.: "IPMI tool failed after x retries. creating Infra service ticket now" 
 **AC2**: Check whether the ticket creation was successful. Make the pipeline status depend on that final step so one can clearly see whether the pipeline did something or not. This also helps with monitoring the situation, as subscribed people would receive a mail if everything fails (meaning manual investigation from our side is needed).
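
A minimal sketch of how AC1 and AC2 could fit together in a POSIX shell script, under stated assumptions: the function names `retry` and `recover_or_escalate` are hypothetical and not taken from `ipmi-recover-worker`, and the recovery and ticket-creation steps are passed in as command names so that the real `ipmitool` call and the real ticket creation can be plugged in. The exit status of `recover_or_escalate` is what the pipeline job would report.

```shell
#!/bin/sh
# Hypothetical sketch: names below are illustrative, not from ipmi-recover-worker.

# retry MAX CMD [ARGS...]
# Run CMD up to MAX times; return 0 on the first success, 1 if all attempts fail.
# A real script would probably sleep between attempts.
retry() {
    retry_max=$1
    shift
    retry_attempt=1
    while [ "$retry_attempt" -le "$retry_max" ]; do
        if "$@"; then
            return 0
        fi
        retry_attempt=$((retry_attempt + 1))
    done
    return 1
}

# recover_or_escalate MAX RECOVER_CMD TICKET_CMD
# AC1: say explicitly that the IPMI error is not the end of the job.
# AC2: once recovery has failed, the return code is that of the ticket creation.
recover_or_escalate() {
    roe_max=$1
    roe_recover=$2
    roe_ticket=$3

    if retry "$roe_max" "$roe_recover"; then
        echo "Recovery succeeded"
        return 0
    fi

    echo "IPMI tool failed after $roe_max retries. Creating infra service ticket now"

    if "$roe_ticket"; then
        echo "Infra service ticket created"
        return 0
    fi

    echo "Ticket creation failed, manual investigation needed" >&2
    return 1
}
```

A caller could end the script with something like `recover_or_escalate 3 run_ipmi_recovery create_infra_ticket`, where both command names are placeholders for whatever the script already does. Because that call is the last command, its status becomes the job status even without relying on `set -e`.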
