action #160343

Updated by livdywan 7 months ago

## Observation 
 From the email alert "opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details":

 `WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]).`
 ``` 
 % sudo journalctl -u openqa-gru --since '2024-05-13' 
 May 13 19:14:28 new-ariel openqa-gru[8840]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 May 13 19:14:28 new-ariel openqa-gru[8838]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 May 13 19:14:28 new-ariel openqa-gru[8838]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o> 
 May 13 19:14:28 new-ariel openqa-gru[8838]: 000 
 May 13 19:14:28 new-ariel openqa-gru[8837]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172 
 May 13 19:14:28 new-ariel openqa-gru[8832]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172 
 May 13 19:14:38 new-ariel openqa-gru[8881]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 May 13 19:14:38 new-ariel openqa-gru[8879]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 May 13 19:14:38 new-ariel openqa-gru[8879]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o> 
 May 13 19:14:38 new-ariel openqa-gru[8879]: 000 
 May 13 19:14:38 new-ariel openqa-gru[8878]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172 
 May 13 19:14:38 new-ariel openqa-gru[8873]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172 
 May 13 19:15:07 new-ariel openqa-gru[8988]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 May 13 19:15:07 new-ariel openqa-gru[8986]: /opt/os-autoinst-scripts/_common: ERROR: line 78 
 ... 
 ``` 

 ## Acceptance criteria 
 * **AC1**: A temporary outage of poo (progress.opensuse.org) is tolerated, e.g. one lasting less than 1h
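
A rough time budget for AC1, assuming a hypothetical retry schedule whose initial delay $d$ doubles after every failed attempt and ignoring the time each request itself takes: $n$ retries wait in total

$$T(n) = \sum_{k=0}^{n-1} d \cdot 2^{k} = d\,(2^{n} - 1)$$

With an example value of $d = 10\,\text{s}$, $T(8) = 2550\,\text{s}$ (42.5 min) and $T(9) = 5110\,\text{s}$ (about 85 min), so 9 retries would be the smallest count covering a 1h outage. Such a schedule only helps if the timeout for the whole hook script allows it.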

 ## Suggestions 
 * Add a retry with exponential backoff to the failing curl call
   * https://github.com/os-autoinst/scripts/blob/master/openqa-label-known-issues#L172 
 * Look into retrying via Minion
   * We *should* have support somewhere for returning a special exit code that gets a new Minion job; research what we have and use that
   * Do we have a timeout for the whole hook script? Check where it is defined
     * `OPENQA_JOB_DONE_HOOK_KILL_TIMEOUT`
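
A minimal sketch of the exponential retry, written as a shell function in the style of os-autoinst-scripts. `retry_with_backoff` is a hypothetical helper, not an existing function in that repository; it assumes the caller passes the complete command to run (e.g. the curl invocation at line 172) and that any non-zero exit status is worth retrying.

```bash
# Hypothetical helper (not part of os-autoinst-scripts): run a command and,
# on failure, retry it with exponentially growing delays.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
    local max_attempts=$1 delay=$2
    shift 2
    local attempt=1 rc
    while true; do
        # Success on any attempt ends the loop immediately
        "$@" && return 0
        rc=$?
        if [ "$attempt" -ge "$max_attempts" ]; then
            # Out of attempts: propagate the last exit status to the caller
            echo "retry_with_backoff: giving up after $attempt attempts (rc=$rc): $*" >&2
            return "$rc"
        fi
        echo "retry_with_backoff: attempt $attempt failed (rc=$rc), retrying in ${delay}s" >&2
        sleep "$delay"
        # Double the delay for the next round: d, 2d, 4d, ...
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}
```

For example, `retry_with_backoff 10 10 curl --fail -sS "$url"` would make up to 10 attempts and sleep at most 10 + 20 + ... + 2560 = 5110 s in total, where `$url` is a placeholder for the (truncated) issue URL. curl's own `--retry` option, combined with `--retry-all-errors` and `--retry-max-time`, might be an alternative to a shell loop; either way a schedule this long is only useful if the hook timeout leaves enough time for it.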
