action #98499
[alert] web UI: Too many Minion job failures alert size:S
Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2021-09-13
Due date:
% Done: 0%
Estimated time:
Description
Observation
Alert received on 2021-09-13, around the time OSD was deployed. The panel at https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=19&orgId=1&from=now-7d&to=now shows that 20 Minion jobs had already failed. During the deployment (possibly a coincidence) one more Minion job failed, bringing the total to 21 failed Minion jobs.
Acceptance criteria
- AC1: At least one ticket exists for each distinct issue
- AC2: The alert description mentions the tickets for all known issues that could explain the failures
Suggestions
- Review the current failures and ensure that a ticket exists for each failure type (see related tickets)
- Remove all failed Minion jobs once the problem is recorded in a ticket
- Unpause the alert
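To make the first two suggestions concrete, failed jobs can be bucketed into rough "types" before they are removed, so that each bucket can be checked against a ticket (AC1). The Python sketch below assumes job records shaped like Minion's job info (`task`, `state`, `result`). The grouping rule (task plus first line of the result), the helper names, the sample task names and the ticket mapping are all hypothetical illustrations, not part of openQA or Minion.

```python
from collections import Counter
from typing import Iterable


def failure_signature(job: dict) -> tuple[str, str]:
    """Hypothetical rule: reduce a failed job to (task, first line of result)."""
    result = str(job.get("result") or "").strip()
    first_line = result.splitlines()[0] if result else "<no result>"
    return job["task"], first_line


def group_failures(jobs: Iterable[dict]) -> Counter:
    """Count failed jobs per signature; jobs in any other state are ignored."""
    return Counter(failure_signature(j) for j in jobs if j.get("state") == "failed")


def untracked(groups: Counter, known: dict) -> list:
    """Return signatures that have no ticket recorded yet (gaps for AC1)."""
    return sorted(sig for sig in groups if sig not in known)


# Made-up sample records; real ones would come from the Minion backend.
jobs = [
    {"task": "example_task_a", "state": "failed", "result": "Job terminated unexpectedly"},
    {"task": "example_task_a", "state": "failed", "result": "Job terminated unexpectedly"},
    {"task": "example_task_b", "state": "failed", "result": "fatal: could not read\ndetails"},
    {"task": "example_task_b", "state": "finished", "result": None},
]

# Hypothetical mapping from failure signature to an existing ticket.
known_tickets = {("example_task_a", "Job terminated unexpectedly"): "ticket-1"}

groups = group_failures(jobs)
for (task, error), count in groups.most_common():
    print(f"{count:3d}  {task}: {error}  -> {known_tickets.get((task, error), 'NO TICKET')}")
```

Only once `untracked(groups, known_tickets)` is empty would it be safe, under this sketch's assumptions, to remove the failed jobs and unpause the alert.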
Rollback measures
- Unpause the alert