action #135740

closed

[alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org - only "label_known_issues" hook scripts size:M

Added by livdywan 8 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-07-16
Due date: 2023-10-05
% Done: 0%
Estimated time:
Tags:

Description

Observation

Date: Thu, 14 Sep 2023 01:05:11 +0000
opensuse.org :: openqa.opensuse.org :: hook failed
        WARNINGs: rc_failed_per_5min is 7.00 (outside range [:5]).

opensuse.org :: openqa.opensuse.org :: hook failed
        OKs: rc_failed_per_5min is 1.00.

This seems to be coming from:

Received: from ariel.suse-dmz.opensuse.org (openqa.infra.opensuse.org [192.168.47.13])
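
The WARNING in the alert above fires because the value 7.00 lies outside the Munin warning range `[:5]`, which has no lower bound and an upper bound of 5. A minimal sketch of that check follows, assuming the range is always written in `min:max` form with either side optionally empty. The helper name `in_range` is hypothetical and is not part of Munin or of the openQA scripts.

```shell
#!/usr/bin/env bash
# in_range VALUE RANGE (hypothetical helper, for illustration only)
# Succeeds if VALUE lies within a "min:max" range. An empty bound means
# "unbounded" on that side. Assumes RANGE always contains a colon.
in_range() {
    local value=$1 range=$2
    local min=${range%%:*} max=${range#*:}
    # awk handles the floating-point comparison that bash arithmetic lacks
    awk -v v="$value" -v lo="$min" -v hi="$max" 'BEGIN {
        if (lo != "" && v + 0 < lo + 0) exit 1
        if (hi != "" && v + 0 > hi + 0) exit 1
        exit 0
    }'
}

# Values taken from the alert mail quoted in this ticket
in_range 7.00 ":5" || echo "WARNING: 7.00 is outside range [:5]"
in_range 1.00 ":5" && echo "OK: 1.00"
```

With the two values from the alert, the first call reports the warning and the second reports OK, matching the two states in the mail.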

Acceptance criteria

  • AC1: No more fatal error messages
  • AC2: No large number of investigation jobs fail silently

Suggestions

  • journalctl --since="2023-09-13 00:00:00" -u openqa-gru
  • Consider adding retries, since this looks like a temporary network issue

Sep 14 01:02:12 new-ariel openqa-gru[30441]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 111
Sep 14 01:02:12 new-ariel openqa-gru[30418]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 111
Sep 14 01:02:13 new-ariel openqa-gru[30510]: Connect timeout

line 111: job_data=$(openqa-cli "${client_args[@]}" jobs/"$id")
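
The retry suggestion above could be sketched as a small shell wrapper around the failing call. This is only an illustration under stated assumptions: the function name `retry`, the attempt count, and the delay are hypothetical, and this is not necessarily how the issue was fixed in os-autoinst/scripts.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY COMMAND [ARGS...] (hypothetical helper)
# Runs COMMAND until it succeeds, at most ATTEMPTS times, sleeping DELAY
# seconds between attempts. Diagnostics go to stderr so that stdout can
# still be captured with $(...).
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if (( n >= attempts )); then
            echo "retry: giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "retry: attempt $n failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Usage sketch mirroring line 111 of the script (not executed here;
# the values 3 and 5 are arbitrary):
# job_data=$(retry 3 5 openqa-cli "${client_args[@]}" jobs/"$id")
```

One caveat of this design: if a failed attempt writes partial output to stdout, that output is captured together with the output of the successful attempt, so the wrapper fits best when failures such as a connect timeout only write to stderr.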


Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #136274: Failing DNS resolution on o3 for hosts like github.com (Resolved, okurz, 2023-09-21)
Copied from openQA Infrastructure - action #135737: [alert] Munin - network eth errors - opensuse.org :: openqa.opensuse.org size:M (Resolved, tinita, 2023-07-16)

Actions #1

Updated by livdywan 8 months ago

  • Copied from action #135737: [alert] Munin - network eth errors - opensuse.org :: openqa.opensuse.org size:M added
Actions #2

Updated by livdywan 8 months ago

  • Subject changed from [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org to [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by livdywan 8 months ago

  • Status changed from Workable to In Progress
  • Assignee set to livdywan

Setting --retries seems like it would help here: https://github.com/os-autoinst/scripts/pull/260

Actions #4

Updated by openqa_review 8 months ago

  • Due date set to 2023-10-05

Setting due date based on mean cycle time of SUSE QE Tools

Actions #5

Updated by okurz 8 months ago

New problem, same symptom: #136274 also shows up in failed minion hook scripts.

Actions #6

Updated by okurz 8 months ago

  • Related to action #136274: Failing DNS resolution on o3 for hosts like github.com added
Actions #7

Updated by livdywan 8 months ago

  • Status changed from In Progress to Feedback

livdywan wrote in #note-3:

Setting --retries seems like it would help here: https://github.com/os-autoinst/scripts/pull/260

Let's see if it helps at all

Actions #8

Updated by okurz 8 months ago

  • Subject changed from [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:M to [alert] Munin - minion hook failed - opensuse.org :: openqa.opensuse.org - only "label_known_issues" hook scripts size:M
Actions #9

Updated by livdywan 7 months ago

  • Status changed from Feedback to Resolved

Looks to be fine. Hasn't come back.
