
action #93612

closed

Several unhandled alerts triggered regarding incompletes and running out of space

Added by livdywan over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2021-06-08
Due date:
% Done: 0%
Estimated time:

Description

These alerts are not necessarily related, but to start with I'd like to preserve them in one place because this is a long run of unhandled alerts in succession. It may make sense to split some of them out or move them to separate tickets.

Note: As far as I can tell, all of these alerts eventually went back to OK, but I couldn't determine why.

File systems

http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=74&orgId=1

[No Data] File systems alert

One of the file systems is too full
Metric name | Value

Job age

[Alerting] Job age (scheduled) (median) alert

Check for overall decrease of "time to start". Possible reasons for regression:
* Not enough resources
* Too many tests scheduled due to misconfiguration

2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 about the decision. Related progress issue: https://progress.opensuse.org/issues/65975
Metric name | Value
50% percentile (median) | 566107.500

[Alerting] Job age (scheduled) (max) alert

Jobs not scheduled for 4 days (345600s). Possible reasons:
* There are no online workers for the selected scheduled jobs; a misconfiguration on the side of the tests is likely

See https://progress.opensuse.org/issues/73174#note-2 for an explanation of why this specific value was selected.
Metric name | Value
50% percentile (max) | 525289.000

[Alerting] Job age (scheduled) (max) alert

Jobs not scheduled for 4 days (345600s). Possible reasons:
* There are no online workers for the selected scheduled jobs; a misconfiguration on the side of the tests is likely

See https://progress.opensuse.org/issues/73174#note-2 for an explanation of why this specific value was selected.
Metric name | Value
50% percentile (max) | 556986.500

[Alerting] Job age (scheduled) (median) alert

Check for overall decrease of "time to start". Possible reasons for regression:
* Not enough resources
* Too many tests scheduled due to misconfiguration

2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 about the decision. Related progress issue: https://progress.opensuse.org/issues/65975
Metric name | Value
50% percentile (median) | 620407.500

http://stats.openqa-monitor.qa.suse.de/d/7W06NBWGk/job-age?tab=alert&viewPanel=5&orgId=1
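To get a feel for how far past the limits these job age alerts were, the following minimal sketch converts the four reported values to days and compares them with the thresholds quoted in the alert descriptions (259200s = 3d for the median, 345600s = 4d for the max). It assumes the reported values are in seconds, like the thresholds; the variable and function names are illustrative only and are not part of the actual Grafana alert definitions.

```python
# Thresholds as quoted in the alert descriptions in this ticket.
MEDIAN_LIMIT_S = 259200  # 3 days, "Job age (scheduled) (median)" alert
MAX_LIMIT_S = 345600     # 4 days, "Job age (scheduled) (max)" alert
SECONDS_PER_DAY = 86400

# (alert kind, reported value, limit); ASSUMPTION: values are in seconds.
observations = [
    ("median", 566107.500, MEDIAN_LIMIT_S),
    ("max", 525289.000, MAX_LIMIT_S),
    ("max", 556986.500, MAX_LIMIT_S),
    ("median", 620407.500, MEDIAN_LIMIT_S),
]


def to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def summarize(obs):
    """Return (kind, value in days, limit in days, above limit?) per alert."""
    rows = []
    for kind, value, limit in obs:
        rows.append(
            (kind, round(to_days(value), 2), round(to_days(limit), 2), value > limit)
        )
    return rows


if __name__ == "__main__":
    for kind, days, limit_days, firing in summarize(observations):
        state = "alerting" if firing else "ok"
        print(f"{kind:>6}: {days:5.2f} d (limit {limit_days:.0f} d) -> {state}")
```

Under that assumption, all four values land between roughly 6.1 and 7.2 days, so each alert was well above its 3d or 4d limit rather than hovering just over it.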

Incomplete jobs

[No Data] Incomplete jobs (not restarted) of last 24h alert

Metric name | Value

http://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?tab=alert&viewPanel=17&orgId=1
