action #138545

closed

Munin - minion hook failed - opensuse.org :: openqa.opensuse.org size:S

Added by livdywan about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date:
Due date: 2023-11-28
% Done: 0%
Estimated time:
Tags:

Description

Observation

Email from 2023-10-24 21:45 CEST

opensuse.org :: openqa.opensuse.org :: hook failed
   CRITICALs: rc_failed_per_5min is 17.00 (outside range [:10]).
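
For readers unfamiliar with the notation, the alert text can be read as "the value exceeded an upper bound of 10, with no lower bound". The following is a minimal sketch of that check, assuming the range follows the `min:max` form with either side optional; the function names are hypothetical and this is not Munin's own code.

```python
def parse_range(spec):
    """Parse a 'min:max' range such as '[:10]' into (low, high).

    Assumption: either side may be empty, meaning "no bound".
    Only the colon form is handled in this sketch.
    """
    low, _, high = spec.strip("[]").partition(":")
    return (float(low) if low else None, float(high) if high else None)


def outside_range(value, spec):
    """Return True when value falls outside the given range."""
    low, high = parse_range(spec)
    if low is not None and value < low:
        return True
    if high is not None and value > high:
        return True
    return False


# Values taken from the alert email above.
print(outside_range(17.0, "[:10]"))
```

With the values from the email, 17.00 is above the upper bound of 10, which is why the alert was raised as critical.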

Apparently the journal contains these messages (sudo journalctl -u openqa-gru):

Oct 24 19:37:51 new-ariel openqa-gru[13130]: Connect timeout
Oct 24 19:37:51 new-ariel openqa-gru[13130]: 
Oct 24 19:37:51 new-ariel openqa-gru[13128]: Connect timeout
Oct 24 19:37:51 new-ariel openqa-gru[13128]: 
Oct 24 19:37:51 new-ariel openqa-gru[13129]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13066]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13127]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13077]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117

Suggestions

  • Include the source of the alert in the email, e.g. the systemd journal for openqa-gru
  • Investigate what's causing openqa-label-known-issues to abort
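
As a starting point for both suggestions, the journal excerpt can be summarised per 5-minute window so that the failing script and the time of the burst are visible at a glance. This is only a sketch: the helper name `failures_per_window` and the regular expression are assumptions, the sample lines are copied from the observation above, and counting `ERROR: line` entries is not necessarily how `rc_failed_per_5min` is computed.

```python
import re
from collections import Counter
from datetime import datetime

# Sample lines copied from the journal excerpt in this ticket.
JOURNAL = """\
Oct 24 19:37:51 new-ariel openqa-gru[13130]: Connect timeout
Oct 24 19:37:51 new-ariel openqa-gru[13128]: Connect timeout
Oct 24 19:37:51 new-ariel openqa-gru[13129]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13066]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13127]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
Oct 24 19:37:51 new-ariel openqa-gru[13077]: /opt/os-autoinst-scripts/openqa-label-known-issues: ERROR: line 117
"""

# Timestamp, host, unit[pid], message (assumed short-format journal output).
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ openqa-gru\[(\d+)\]: (.*)$")


def failures_per_window(text, minutes=5):
    """Count 'ERROR: line' journal entries per time window."""
    counts = Counter()
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match or "ERROR: line" not in match.group(3):
            continue
        # The journal's short format carries no year; only the time of day matters here.
        ts = datetime.strptime(match.group(1), "%b %d %H:%M:%S")
        window = ts.replace(minute=ts.minute - ts.minute % minutes, second=0)
        counts[window.strftime("%b %d %H:%M")] += 1
    return counts


for window, count in failures_per_window(JOURNAL).items():
    print(window, count)
```

In practice the input would come from `sudo journalctl -u openqa-gru` rather than an embedded string, and a summary like this could be attached to the alert email alongside the raw lines.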

Files

munin-load.png (23.3 KB), tinita, 2023-10-27 12:31
munin-hook-failed.png (16.9 KB), tinita, 2023-10-27 12:31

Related issues 4 (2 open, 2 closed)

Related to openQA Infrastructure (public) - action #138527: Zabbix agent on ariel.dmz-prg2.suse.org reported no data for 30m and there is nothing in the journal size:S (Resolved, livdywan, 2023-07-07)

Related to openQA Infrastructure (public) - action #138551: DNS outage of 2023-10-25, e.g. Cron <root@openqa-service> (date; fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log Max retries exceeded with url size:S (Resolved, livdywan, 2023-10-23)

Related to openQA Infrastructure (public) - action #138683: https://metrics.opensuse.org/d/osrt_openqa/osrt-openqa?orgId=1&from=now-7d&to=now should show current results … which apparently it doesn't (New, 2023-10-27)

Related to openQA Infrastructure (public) - action #162521: Reconsider the global job limit on o3, try higher than 170 (New, 2024-06-19)
