action #160343
closed
[alert][o3] opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]). size:S
Added by okurz 7 months ago.
Updated 7 months ago.
Category: Regressions/Crashes
Description
Observation
From email opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details
WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]).
% sudo journalctl -u openqa-gru --since '2024-05-13'
May 13 19:14:28 new-ariel openqa-gru[8840]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:28 new-ariel openqa-gru[8838]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:28 new-ariel openqa-gru[8838]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o>
May 13 19:14:28 new-ariel openqa-gru[8838]: 000
May 13 19:14:28 new-ariel openqa-gru[8837]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:28 new-ariel openqa-gru[8832]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:38 new-ariel openqa-gru[8881]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:38 new-ariel openqa-gru[8879]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:14:38 new-ariel openqa-gru[8879]: curl (172 /opt/os-autoinst-scripts/openqa-label-known-issues): Error fetching (-L --user-agent openqa-label-known-issues -sS https://progress.opensuse.org/projects/o>
May 13 19:14:38 new-ariel openqa-gru[8879]: 000
May 13 19:14:38 new-ariel openqa-gru[8878]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:14:38 new-ariel openqa-gru[8873]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 13 19:15:07 new-ariel openqa-gru[8988]: /opt/os-autoinst-scripts/_common: ERROR: line 78
May 13 19:15:07 new-ariel openqa-gru[8986]: /opt/os-autoinst-scripts/_common: ERROR: line 78
...
Acceptance criteria
- AC1: A temporary outage of progress.opensuse.org (poo), e.g. < 1h, is tolerated
Suggestions
- Add a retry with exponential backoff
- Look into retrying via Minion
- We should already have support somewhere for returning a special exit code to get a new Minion job. Research what we have and use that
- Do we have a timeout for the whole hook script? Check where that is defined: OPENQA_JOB_DONE_HOOK_KILL_TIMEOUT
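The exponential retry suggested above could look roughly like the following sketch. The function names `retry_exp` and `total_wait` are hypothetical and are not taken from os-autoinst-scripts; the sketch only assumes bash. It doubles the sleep after every failed attempt, and `total_wait` shows how long the whole retry chain sleeps in the worst case, which is the number to compare against AC1 and against any timeout on the hook script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, NOT the actual code in /opt/os-autoinst-scripts/_common.

# retry_exp RETRIES DELAY CMD...
# Runs CMD; on failure sleeps DELAY seconds, doubles DELAY and tries again,
# up to RETRIES additional attempts. Returns 0 on the first success, 1 otherwise.
retry_exp() {
    local retries=$1 delay=$2
    shift 2
    local attempt
    for ((attempt = 0; attempt <= retries; attempt++)); do
        "$@" && return 0
        # No sleep after the final attempt
        [[ $attempt -lt $retries ]] || break
        echo "attempt $((attempt + 1)) failed, retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
    done
    return 1
}

# total_wait RETRIES DELAY
# Prints the seconds slept if every attempt fails: DELAY * (2^RETRIES - 1).
total_wait() {
    local retries=$1 delay=$2
    echo $((delay * ((1 << retries) - 1)))
}
```

A call such as `retry_exp 10 4 curl -L -sS --fail "$url"` (with `$url` standing in for the ticket query URL) would keep trying for up to `total_wait 10 4` = 4092 seconds, a bit over one hour. The loop uses bash arithmetic only, so it does not depend on any external binary besides `sleep`.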
- Description updated (diff)
- Description updated (diff)
- Priority changed from Urgent to High
We get this from time to time when progress.o.o is down,
so I don't consider it urgent.
However, I think we don't retry here yet.
The logs show that progress was unreachable for about 17 minutes.
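As a rough illustration of how such an outage window can be read off the journal, the sketch below measures the time between the first and the last `ERROR: line` entry. The function name `outage_seconds` is hypothetical; it assumes GNU `date` and the default journalctl timestamp prefix (`Mon DD HH:MM:SS`, 15 characters) as seen in the excerpt above, and it ignores the missing year.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads journal lines on stdin and prints the number of
# seconds between the first and the last line containing "ERROR: line".
# Assumes GNU date and the 15-character "Mon DD HH:MM:SS" timestamp prefix.
outage_seconds() {
    local stamps first last
    # Keep only the timestamp prefix of matching lines
    stamps=$(grep -F 'ERROR: line' | cut -c1-15) || true
    if [[ -z $stamps ]]; then
        echo 0
        return 0
    fi
    first=$(head -n1 <<<"$stamps")
    last=$(tail -n1 <<<"$stamps")
    # Parse both timestamps in UTC so DST changes cannot skew the difference
    echo $(($(TZ=UTC date -d "$last" +%s) - $(TZ=UTC date -d "$first" +%s)))
}
```

It could be fed from the command shown in the description, e.g. `sudo journalctl -u openqa-gru --since '2024-05-13' | outage_seconds`.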
- Subject changed from [alert][o3] opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]). to [alert][o3] opensuse.org :: openqa.opensuse.org :: hook failed - see openqa-gru service logs for details WARNINGs: rc_failed_per_5min is 10.00 (outside range [:5]). size:S
- Description updated (diff)
- Status changed from New to Workable
- Status changed from Workable to In Progress
- Assignee set to dheidler
- Status changed from In Progress to Feedback
On o3:
% sudo journalctl -u openqa-gru --since '2024-05-23'
May 23 13:12:50 new-ariel openqa-gru[15264]: /opt/os-autoinst-scripts/_common: line 90: /usr/bin/seq: Permission denied
May 23 13:12:50 new-ariel openqa-gru[15262]: /opt/os-autoinst-scripts/_common: ERROR: line 90
May 23 13:12:50 new-ariel openqa-gru[15259]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:12:50 new-ariel openqa-gru[15254]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
I guess we need to allow seq in AppArmor.
Now we see this without further error messages:
May 23 13:47:04 new-ariel openqa-gru[1022]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:47:04 new-ariel openqa-gru[1017]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:15 new-ariel openqa-gru[2531]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:15 new-ariel openqa-gru[2526]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:16 new-ariel openqa-gru[2582]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:16 new-ariel openqa-gru[2577]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:27 new-ariel openqa-gru[2705]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:27 new-ariel openqa-gru[2700]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:31 new-ariel openqa-gru[2761]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:49:31 new-ariel openqa-gru[2756]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
May 23 13:50:05 new-ariel openqa-gru[3286]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 172
- Status changed from Feedback to Resolved