action #112193
[alert][osd] web UI: Too many Minion job failures alert size:S
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2022-06-08
Due date:
% Done: 0%
Estimated time:
Description
Observation
Too many Minion jobs have failed on openqa.suse.de
Acceptance criteria
- AC1: No more alerts
- AC2: All issues are resolved or reported in currently open tickets
Suggestions
- Review the failed jobs on https://openqa.suse.de/minion/jobs?state=failed. Create a ticket for each kind of failure unless one already exists (see #96263 and related tickets) or the failed jobs are just a symptom of a bigger problem (e.g. a database outage).
- After investigation, remove the failed jobs (possibly keeping one instance of each failure kind around). For the general log of the Minion job queue, check out journalctl -fu openqa-gru.service and /var/log/openqa_gru on openqa.suse.de.
- Probably file a ticket for the rsync issue (after narrowing it down a bit)
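The clean-up suggestion above (remove failed jobs but keep one instance of each failure kind) could be sketched roughly as follows. This is only an illustration and not how openQA or Minion implement it: the job records are plain dictionaries, the field names `id`, `task` and `error` are assumptions, the sample task names and error strings are made up, and the helper `select_jobs_to_remove` is hypothetical. The actual removal would still be done on the Minion dashboard.

```python
from collections import defaultdict


def select_jobs_to_remove(failed_jobs, keep_per_kind=1):
    """Return the IDs of failed jobs that can be removed.

    A "failure kind" is assumed here to be the pair (task, error).
    For each kind, the jobs with the highest IDs are kept as examples.
    """
    ids_by_kind = defaultdict(list)
    for job in failed_jobs:
        kind = (job["task"], job["error"])
        ids_by_kind[kind].append(job["id"])

    removable = []
    for ids in ids_by_kind.values():
        ids.sort(reverse=True)                  # highest (newest) IDs first
        removable.extend(ids[keep_per_kind:])   # everything after the kept ones
    return sorted(removable)


# Made-up sample data, for illustration only.
sample_jobs = [
    {"id": 1, "task": "sync_task", "error": "rsync failed"},
    {"id": 2, "task": "sync_task", "error": "rsync failed"},
    {"id": 3, "task": "cleanup_task", "error": "timeout"},
    {"id": 4, "task": "sync_task", "error": "rsync failed"},
]

to_remove = select_jobs_to_remove(sample_jobs)
```

With the sample data, job 4 stays as the example of the rsync failure and job 3 as the only example of the timeout, so only the older duplicates are selected for removal.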
Rollback steps
Updated by livdywan over 2 years ago
- Subject changed from [alert][osd] web UI: Too many Minion job failures alert to [alert][osd] web UI: Too many Minion job failures alert size:S
- Description updated
- Status changed from New to Workable
Updated by mkittler over 2 years ago
- Status changed from Workable to Resolved
Most failures were covered by #96263 but there are also two new cases. I extended the ticket description for these new cases. I also cleaned up the Minion dashboard and resumed the alert.