action #135740
[alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org - only "label_known_issues" hook scripts size:M
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-07-16
Due date: 2023-10-05
% Done: 0%
Estimated time:
Tags:
Description
Observation
Date: Thu, 14 Sep 2023 01:05:11 +0000
opensuse.org :: openqa.opensuse.org :: hook failed
WARNINGs: rc_failed_per_5min is 7.00 (outside range [:5]).
opensuse.org :: openqa.opensuse.org :: hook failed
OKs: rc_failed_per_5min is 1.00.
This seems to be coming from:
Received: from ariel.suse-dmz.opensuse.org (openqa.infra.opensuse.org
[192.168.47.13])
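The alert compares rc_failed_per_5min against a warning range. Here is a minimal sketch of that kind of check. It assumes Munin's "min:max" range notation, in which either bound may be left empty, so [:5] means "at most 5". The function name in_munin_range is hypothetical, and this is not Munin's own code:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if VALUE lies inside a Munin-style range.
# "min:max", "min:" and ":max" are accepted; a bare number is treated as max.
in_munin_range() {
    local value=$1 range=$2
    [[ $range == *:* ]] || range=":$range"
    local min=${range%%:*} max=${range#*:}
    # awk does the floating-point comparison that bash arithmetic cannot.
    awk -v v="$value" -v lo="$min" -v hi="$max" 'BEGIN {
        if (lo != "" && v + 0 < lo + 0) exit 1
        if (hi != "" && v + 0 > hi + 0) exit 1
        exit 0
    }'
}

# The two values reported in the alert mails above.
for value in 7.00 1.00; do
    if in_munin_range "$value" ":5"; then
        echo "OK: rc_failed_per_5min is $value"
    else
        echo "WARNING: rc_failed_per_5min is $value (outside range [:5])"
    fi
done
```

With the range [:5], the 7.00 reading is reported as a warning and the later 1.00 reading as OK.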
Acceptance criteria
- AC1: No more fatal error messages
- AC2: Investigation jobs do not fail silently in large numbers
Suggestions
- Consider adding retries, since this looks like a temporary network issue

journalctl --since="2023-09-13 00:00:00" -u openqa-gru
Sep 14 01:02:12 new-ariel openqa-gru[30441]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 111
Sep 14 01:02:12 new-ariel openqa-gru[30418]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 111
Sep 14 01:02:13 new-ariel openqa-gru[30510]: Connect timeout

Line 111 of openqa-label-known-issues:
job_data=$(openqa-cli "${client_args[@]}" jobs/"$id")
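The fix discussed in this ticket relies on openqa-cli's own --retries option. As a generic alternative for commands without such an option, a retry wrapper could look like the following minimal sketch. The function retry, its parameters, and the flaky test command are all hypothetical and are not taken from the os-autoinst scripts:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: retry ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between failures.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n
    for ((n = 1; n <= attempts; n++)); do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        # Do not sleep after the final failed attempt.
        if ((n < attempts)); then
            echo "attempt $n/$attempts failed, retrying in ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return 1
}

# Hypothetical flaky command: fails on the first two calls, succeeds on the third.
calls=0
flaky() {
    calls=$((calls + 1))
    ((calls >= 3))
}

if retry 3 0 flaky; then
    echo "flaky succeeded after $calls calls"  # prints "flaky succeeded after 3 calls"
fi
```

In the script this would wrap the call on line 111, for example job_data=$(retry 3 5 openqa-cli "${client_args[@]}" jobs/"$id"). Because the wrapper then runs inside a command substitution, anything a failed attempt writes to stdout also ends up in job_data. That is one reason to prefer a retry option built into the client.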
Updated by livdywan over 1 year ago
- Copied from action #135737: [alert] Munin - network eth errors - opensuse.org :: openqa.opensuse.org size:M added
Updated by livdywan over 1 year ago
- Subject changed from [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org to [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:M
- Description updated
- Status changed from New to Workable
Updated by livdywan over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to livdywan
Setting --retries seems like it would help here: https://github.com/os-autoinst/scripts/pull/260
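Applying --retries comes down to passing one more option through the client_args array that line 111 expands. Here is a minimal sketch. It assumes client_args holds openqa-cli api options such as --host. The host URL, job ID, and retry count are illustrative only. openqa-cli is replaced by a stub that prints its arguments, so the example runs without an openQA instance:

```shell
#!/usr/bin/env bash
# Stub standing in for the real openqa-cli: prints each argument on its own line.
openqa-cli() { printf '%s\n' "$@"; }

host=https://openqa.opensuse.org  # assumed host URL
id=12345                          # hypothetical job ID

# Assumed shape of client_args; the real script builds its own array.
client_args=(api --host "$host" --retries 3)

# Quoting "${client_args[@]}" passes each element as a separate argument,
# so "--retries" and "3" reach openqa-cli intact.
job_data=$(openqa-cli "${client_args[@]}" jobs/"$id")
printf '%s\n' "$job_data"
```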
Updated by openqa_review over 1 year ago
- Due date set to 2023-10-05
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
New problem, same symptom: #136274 also shows up as failed minion hook scripts.
Updated by okurz over 1 year ago
- Related to action #136274: Failing DNS resolution on o3 for hosts like github.com added
Updated by livdywan over 1 year ago
- Status changed from In Progress to Feedback
livdywan wrote in #note-3:
> Setting --retries seems like it would help here: https://github.com/os-autoinst/scripts/pull/260

Let's see whether it helps at all.
Updated by okurz over 1 year ago
- Subject changed from [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:M to [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org - only "label_known_issues" hook scripts size:M
Updated by livdywan over 1 year ago
- Status changed from Feedback to Resolved
Looks fine. The alert hasn't come back.