action #93612
Several unhandled alerts triggered regarding incompletes and running out of space
Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2021-06-08
Due date:
% Done: 0%
Estimated time:
Description
These alerts are not necessarily related, but to start with I'd like to preserve them here because this is a lot of unhandled alerts in succession. It may make sense to split some of them out or move them into separate issues.
Note: As far as I can tell, all of these eventually went back to OK, but I couldn't determine why.
File systems
http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1
[No Data] File systems alert
One of the file systems is too full
Metric name | Value
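The alert text only says that one of the file systems is too full, without naming it. To check a host by hand, a minimal Python sketch using the standard library's `shutil.disk_usage` could look like the following. The 90 % threshold and the mount point `/` are assumptions for illustration, not the values or file systems used by the Grafana alert.

```python
import shutil

# Assumed threshold for illustration; the real alert limit is not stated in the alert text.
USAGE_THRESHOLD_PERCENT = 90.0


def usage_percent(path: str) -> float:
    """Return how much of the file system containing `path` is used, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def too_full(paths: list[str], threshold: float = USAGE_THRESHOLD_PERCENT) -> list[str]:
    """Return the subset of `paths` whose file system usage exceeds `threshold`."""
    return [p for p in paths if usage_percent(p) > threshold]


if __name__ == "__main__":
    # "/" is only an example mount point; the monitored file systems are not named in the alert.
    for mount in too_full(["/"]):
        print(f"{mount} is above {USAGE_THRESHOLD_PERCENT}% usage")
```

Note that `used / total` can differ slightly from what `df` reports because of blocks reserved for root.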
Job age
[Alerting] Job age (scheduled) (median) alert
Check for overall decrease of "time to start". Possible reasons for regression:
* Not enough resources
* Too many tests scheduled due to misconfiguration
2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 about the decision.
Related progress issue: https://progress.opensuse.org/issues/65975
Metric name | Value
50% percentile (median) | 566107.500
[Alerting] Job age (scheduled) (max) alert
Jobs not scheduled for 4 days (345600s). Possible reasons:
* There are no online workers for the selected scheduled jobs; a misconfiguration on the side of the tests is likely.
See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection of the specific value.
Metric name | Value
50% percentile (max) | 525289.000
[Alerting] Job age (scheduled) (max) alert
Jobs not scheduled for 4 days (345600s). Possible reasons:
* There are no online workers for the selected scheduled jobs; a misconfiguration on the side of the tests is likely.
See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection of the specific value.
Metric name | Value
50% percentile (max) | 556986.500
[Alerting] Job age (scheduled) (median) alert
Check for overall decrease of "time to start". Possible reasons for regression:
* Not enough resources
* Too many tests scheduled due to misconfiguration
2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 about the decision.
Related progress issue: https://progress.opensuse.org/issues/65975
Metric name | Value
50% percentile (median) | 620407.500
http://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?tab=alert&viewPanel=5&orgId=1
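As a quick sanity check on the job age numbers above, the following Python sketch converts each reported value to days and compares it with the limit quoted in the matching alert description (259200 s = 3 d for the median, 345600 s = 4 d for the max). It assumes the reported values are in seconds, like the limits; the names `FIRINGS` and `exceeds` are purely illustrative.

```python
# Alert limits quoted in the alert descriptions (seconds).
MEDIAN_LIMIT_S = 259_200  # 3 days
MAX_LIMIT_S = 345_600     # 4 days
SECONDS_PER_DAY = 86_400

# Values reported by the four job age notifications, assumed to be in seconds.
FIRINGS = [
    ("median", 566_107.5),
    ("max", 525_289.0),
    ("max", 556_986.5),
    ("median", 620_407.5),
]


def exceeds(kind: str, value_s: float) -> bool:
    """True if `value_s` is above the limit for the given alert kind."""
    limit = MEDIAN_LIMIT_S if kind == "median" else MAX_LIMIT_S
    return value_s > limit


if __name__ == "__main__":
    for kind, value in FIRINGS:
        days = value / SECONDS_PER_DAY
        print(f"{kind:6} {value:>10.1f} s = {days:.2f} d, over limit: {exceeds(kind, value)}")
```

With these numbers, the median firings correspond to roughly 6.55 and 7.18 days and the max firings to roughly 6.08 and 6.45 days, so every notification was well above its limit.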
Incomplete jobs
[No Data] Incomplete jobs (not restarted) of last 24h alert
Metric name | Value
http://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?tab=alert&viewPanel=17&orgId=1
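The panel name suggests the metric counts jobs that ended as incomplete within the last 24 hours and were not restarted. A rough Python sketch of that count follows. The `Job` record and its fields (`result`, `finished_at`, `clone_id`) are hypothetical, loosely modeled on openQA job data, and this is not how the actual Grafana query is written.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Job:
    # Hypothetical job record; field names are assumptions for illustration.
    id: int
    result: str
    finished_at: datetime
    clone_id: Optional[int]  # assumed to be set once the job has been restarted (cloned)


def incompletes_not_restarted(jobs, now, window=timedelta(hours=24)):
    """Count jobs that ended 'incomplete' within `window` before `now` and were never restarted."""
    cutoff = now - window
    return sum(
        1
        for job in jobs
        if job.result == "incomplete" and job.finished_at >= cutoff and job.clone_id is None
    )


if __name__ == "__main__":
    now = datetime(2021, 6, 8, 12, 0, tzinfo=timezone.utc)
    jobs = [
        Job(1, "incomplete", now - timedelta(hours=2), None),   # counted
        Job(2, "incomplete", now - timedelta(hours=3), 5),      # restarted, not counted
        Job(3, "passed", now - timedelta(hours=1), None),       # not incomplete
        Job(4, "incomplete", now - timedelta(hours=30), None),  # outside the 24h window
    ]
    print(incompletes_not_restarted(jobs, now))  # prints 1
```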