action #112193

closed

[alert][osd] web UI: Too many Minion job failures alert size:S

Added by okurz about 2 years ago. Updated about 2 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2022-06-08
Due date:
% Done: 0%
Estimated time:

Description

Observation

Too many Minion jobs have failed on openqa.suse.de

Acceptance criteria

  • AC1: No more alerts
  • AC2: All issues are resolved or reported in currently open tickets

Suggestions

  • Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there's not already one (see #96263 and related tickets) and the failed jobs aren't just a symptom of a bigger problem (e.g. database outage).
  • After investigation, remove the failed jobs (possibly keeping one instance of each kind of failure). For the general log of the Minion job queue, check out journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
  • Probably file a ticket for the rsync issue (after narrowing it down a bit)
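
The cleanup suggested above (remove failed jobs but keep one instance of each failure kind) could be planned with a small script. This is only a sketch under assumptions: it expects the failed jobs to already be available as a list of dicts with "id", "task" and "result" keys, it treats the first line of the result as the "failure kind", and it assumes the highest id is the most recent job. The function name plan_cleanup and the sample records are hypothetical and not taken from openqa.suse.de.

```python
from collections import defaultdict


def plan_cleanup(failed_jobs):
    """Split failed job ids into (keep, remove), keeping one job per failure kind.

    A "failure kind" is assumed here to be the pair of task name and the
    first line of the job result; adjust the key if that is too coarse.
    """
    groups = defaultdict(list)
    for job in failed_jobs:
        result = str(job.get("result") or "").strip()
        first_line = result.splitlines()[0] if result else ""
        groups[(job["task"], first_line)].append(job["id"])

    keep, remove = [], []
    for ids in groups.values():
        ids.sort()
        keep.append(ids[-1])      # keep the highest id as the reference instance
        remove.extend(ids[:-1])   # everything else is a candidate for removal
    return sorted(keep), sorted(remove)


if __name__ == "__main__":
    # Hypothetical sample records, for illustration only.
    sample = [
        {"id": 101, "task": "sync_tests", "result": "rsync error: timeout\ndetails A"},
        {"id": 102, "task": "sync_tests", "result": "rsync error: timeout\ndetails B"},
        {"id": 103, "task": "limit_assets", "result": "database connection lost"},
    ]
    keep, remove = plan_cleanup(sample)
    print("keep:", keep)
    print("remove:", remove)
```

The script only produces a plan; actually removing the jobs would still be done by hand in the Minion dashboard after the investigation.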

Rollback steps

Unpause alert "Minion Jobs"
