action #160343
Updated by livdywan 7 months ago
## Observation

From email

opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details
WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]).

```
% sudo journalctl -u openqa-gru --since '2024-05-13'
May 13 19:14:28 new-ariel openqa-gru[8840]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:28 new-ariel openqa-gru[8838]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:28 new-ariel openqa-gru[8838]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o>
May 13 19:14:28 new-ariel openqa-gru[8838]: 000
May 13 19:14:28 new-ariel openqa-gru[8837]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:28 new-ariel openqa-gru[8832]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:38 new-ariel openqa-gru[8881]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:38 new-ariel openqa-gru[8879]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:38 new-ariel openqa-gru[8879]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o>
May 13 19:14:38 new-ariel openqa-gru[8879]: 000
May 13 19:14:38 new-ariel openqa-gru[8878]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:38 new-ariel openqa-gru[8873]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:15:07 new-ariel openqa-gru[8988]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:15:07 new-ariel openqa-gru[8986]: /opt/os-autoinst-scripts/_common: ERROR: line 78
...
```

## Acceptance criteria

* **AC1**: Temporary outage of poo is tolerated e.g. < 1h

## Suggestions

* Add an exponential retry
  * https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues#L172
* Look into retrying via Minion
  * We *should* have support for returning a special code to get a new minion job somewhere? Research what we have and use that
* Do we have a timeout for the whole hook script? Check where that is defined OPENQA_JOB_DONE_HOOK_KILL_TIMEOUT
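
To make the "exponential retry" suggestion concrete, here is a minimal sketch of what such a helper could look like in bash. It is not taken from os-autoinst/scripts: the function name `retry_with_backoff`, its argument order, and the `SLEEP_CMD` override (only there so the waiting can be stubbed out in tests) are all assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of os-autoinst/scripts.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay between attempts. Returns the exit code of the last failed attempt.
retry_with_backoff() {
    local max_attempts=$1 delay=$2 attempt=1 rc=0
    shift 2
    while true; do
        if "$@"; then
            return 0
        else
            rc=$?   # exit code of the failed command
        fi
        if (( attempt >= max_attempts )); then
            echo "giving up after $attempt attempts (last exit code $rc): $*" >&2
            return "$rc"
        fi
        echo "attempt $attempt failed (exit code $rc), retrying in ${delay}s" >&2
        # SLEEP_CMD is an assumed override so tests do not have to wait.
        "${SLEEP_CMD:-sleep}" "$delay"
        delay=$(( delay * 2 ))
        attempt=$(( attempt + 1 ))
    done
}

# Example call (commented out; "$url" stands in for the real ticket URL):
# retry_with_backoff 8 30 curl -L --user-agent openqa-label-known-issues -sS "$url"
```

With 8 attempts and a base delay of 30 s the waits add up to 30 + 60 + ... + 1920 = 3810 s, a bit over one hour, which would roughly match AC1. Any waiting done inside the script also has to fit within whatever timeout applies to the whole hook script, which is one reason the Minion-based retry may turn out to be the better fit.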