action #107515

closed

[Alerting] web UI: Too many Minion job failures alert size:S

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2022-02-24
Due date:
% Done: 0%
Estimated time:

Description

Observation

Too many Minion jobs have failed on openqa.suse.de. Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there is not already one (see https://progress.opensuse.org/issues/96263 and related tickets) and the failed jobs are not just a symptom of a bigger problem (e.g. a database outage). After the investigation, remove the failed jobs (possibly keeping one instance of each failure kind). For the general log of the Minion job queue, check out journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
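The cleanup step above (remove the failed jobs but keep one instance of each failure kind) can be sketched as follows. This is a minimal illustration, not openQA or Minion code. It assumes the failed jobs have already been exported as dicts with hypothetical "id", "task" and "result" keys, treats the task name plus the first line of the error as the "failure kind", and assumes a higher id means a newer job. The example data is made up.

```python
from collections import defaultdict

# Hypothetical shape of a failed Minion job: {"id": int, "task": str, "result": str or None}.
# This mirrors what the Minion dashboard shows; it is not a real API client.


def failure_kind(job):
    """Classify a failed job by its task name plus the first line of its error result."""
    result = str(job.get("result") or "").strip()
    first_line = result.splitlines()[0] if result else ""
    return (job["task"], first_line)


def jobs_to_remove(failed_jobs):
    """Return ids of failed jobs to remove, keeping the newest job of each failure kind."""
    by_kind = defaultdict(list)
    for job in failed_jobs:
        by_kind[failure_kind(job)].append(job)
    remove = []
    for jobs in by_kind.values():
        # Assumption: the highest id is the most recent job, so it is the one kept.
        jobs.sort(key=lambda j: j["id"])
        remove.extend(j["id"] for j in jobs[:-1])
    return sorted(remove)


# Made-up example: two save_needle failures share the same first error line.
failed = [
    {"id": 1, "task": "save_needle", "result": "example error A\ndetails"},
    {"id": 2, "task": "save_needle", "result": "example error A\nother details"},
    {"id": 3, "task": "example_task", "result": "example error B"},
]
print(jobs_to_remove(failed))  # [1]
```

Keeping one job per kind leaves a reference instance to link from a ticket while clearing the count that drives the alert.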

Suggestion

  • Clean up the failed jobs once the underlying issues are known and tracked in tickets; otherwise create new tickets for them first
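The decision rule in the suggestion (clean up what is already tracked, otherwise file a ticket first) could look like the hypothetical sketch below. The (task, error) tuples and the known_tickets mapping from task name to ticket number are assumptions for illustration. Matching on the task name alone is a simplification, since in practice a ticket usually covers one specific error.

```python
# Hypothetical triage helper; not part of openQA or Minion.


def triage(failure_kinds, known_tickets):
    """Split failure kinds into (clean_up, needs_ticket).

    failure_kinds: iterable of (task, error) tuples.
    known_tickets: dict mapping a task name to the ticket number tracking it.
    clean_up holds ((task, error), ticket) pairs that can be removed right away;
    needs_ticket holds the kinds that should get a new ticket before cleanup.
    """
    clean_up, needs_ticket = [], []
    for kind in failure_kinds:
        task, _error = kind
        if task in known_tickets:
            clean_up.append((kind, known_tickets[task]))
        else:
            needs_ticket.append(kind)
    return clean_up, needs_ticket


# Made-up example data; the mapping to #96263 is illustrative only.
kinds = [("save_needle", "example error A"), ("example_task", "example error B")]
known = {"save_needle": 96263}
print(triage(kinds, known))
```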

Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M (Resolved, okurz, 2022-02-23)

#1

Updated by okurz almost 3 years ago

  • Related to action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M added
#2

Updated by mkittler almost 3 years ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
#3

Updated by mkittler almost 3 years ago

  • Status changed from In Progress to Resolved

Recent save_needle tasks are OK again and all remaining issues are covered by #96263, so I only cleaned up the failed jobs in the Minion dashboard. The alert is OK again.
