action #107515
closed [Alerting] web UI: Too many Minion job failures alert size:S
Status: Resolved
Priority: Urgent
Start date: 2022-02-24
% Done: 0%
Description
Observation
Too many Minion jobs have failed on openqa.suse.de. Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed and create a ticket if there is not already one (see https://progress.opensuse.org/issues/96263 and related tickets) and if the failed jobs aren't just a symptom of a bigger problem (e.g. a database outage). After the investigation, remove the failed jobs (possibly keeping one instance of each failure kind around). For the general log of the Minion job queue, check journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
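The cleanup rule above (remove the failed jobs, but keep one instance of each failure kind) can be sketched in Python. This is a minimal illustration under stated assumptions, not openQA's or Minion's actual cleanup code: FailedJob and plan_cleanup are hypothetical names, the sample jobs and error strings are made up, and treating "task name plus error message" as the failure kind is an assumption. The sketch only decides which jobs to keep and which to remove; the removal itself would still be done through the Minion dashboard.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedJob:
    """Hypothetical, simplified view of one failed Minion job."""
    job_id: int
    task: str   # Minion task name, e.g. "save_needle"
    error: str  # failure message; assumed to identify the failure kind


def plan_cleanup(jobs):
    """Group failed jobs by (task, error) and keep the newest job of each kind.

    Returns (keep, remove), both sorted by job id. Higher job ids are
    assumed to be newer jobs.
    """
    by_kind = defaultdict(list)
    for job in jobs:
        by_kind[(job.task, job.error)].append(job)

    keep, remove = [], []
    for kind_jobs in by_kind.values():
        # Newest first: keep one example, mark the rest for removal.
        kind_jobs.sort(key=lambda j: j.job_id, reverse=True)
        keep.append(kind_jobs[0])
        remove.extend(kind_jobs[1:])

    keep.sort(key=lambda j: j.job_id)
    remove.sort(key=lambda j: j.job_id)
    return keep, remove


if __name__ == "__main__":
    # Hypothetical sample data, not real failures from openqa.suse.de.
    failed = [
        FailedJob(101, "save_needle", "example error A"),
        FailedJob(102, "save_needle", "example error A"),
        FailedJob(103, "example_task", "example error B"),
        FailedJob(104, "save_needle", "example error A"),
    ]
    keep, remove = plan_cleanup(failed)
    print([j.job_id for j in keep])    # [103, 104]
    print([j.job_id for j in remove])  # [101, 102]
```

Keeping the newest job of each kind, rather than the oldest, leaves the most recent error details available for whoever picks up the related ticket.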
Suggestion
- Clean up the failed jobs once the underlying issues are known and covered by tickets; otherwise create new tickets for them first
Updated by okurz almost 3 years ago
- Related to action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M added
Updated by mkittler almost 3 years ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
Updated by mkittler almost 3 years ago
- Status changed from In Progress to Resolved
Recent save_needle tasks are OK again and all remaining issues are covered by #96263, so I only cleaned up the dashboard. The alert is good again.