action #95443

closed

Variants of Job age (scheduled) alerts on Grafana on Sunday and Monday size:S

Added by livdywan almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2021-07-13
Due date:
% Done: 0%
Estimated time:
Description

Observation

I observed several unhandled alerts on Grafana on Sunday and Monday.

[Alerting] Job age (scheduled) (max) alert

Jobs not scheduled for 4 days (345600s). Possible reasons:
  • There are no online workers for selected scheduled jobs, misconfiguration on the side of tests likely

See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection of the specific value.

Metric name: 50% percentile (max)
Value: 501773.500

[Alerting] Job age (scheduled) (median) alert

Check for overall decrease of "time to start". Possible reasons for regression:
  • Not enough resources
  • Too many tests scheduled due to misconfiguration

2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 about the decision.
Related progress issue: https://progress.opensuse.org/issues/65975

Metric name: 50% percentile (median)
Value: 501113.500

[Alerting] Job age (scheduled) (max) alert

Jobs not scheduled for 4 days (345600s). Possible reasons:
  • There are no online workers for selected scheduled jobs, misconfiguration on the side of tests likely

See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection of the specific value.

Metric name: 50% percentile (max)
Value: 954811.000

[No Data] Incomplete jobs (not restarted) of last 24h alert
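
To put the reported job-age values in context, the arithmetic can be sketched as below. This is a minimal illustration, not the Grafana alert rule itself: the limits (345600s and 259200s) and the three values are taken from the alert texts above, while the function and variable names are hypothetical and the assumption that an alert fires when the value exceeds its limit is inferred from the alert descriptions.

```python
# Limits quoted in the alert descriptions (seconds).
MAX_AGE_LIMIT_S = 345600     # 4 days, "Job age (scheduled) (max)"
MEDIAN_AGE_LIMIT_S = 259200  # 3 days, "Job age (scheduled) (median)"

SECONDS_PER_DAY = 24 * 60 * 60


def to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def exceeds_limit(value_s: float, limit_s: float) -> bool:
    """Assumption: the alert fires when the value is above the limit."""
    return value_s > limit_s


# (alert kind, observed value in seconds, limit in seconds), as listed in this ticket.
observed = [
    ("max", 501773.5, MAX_AGE_LIMIT_S),
    ("median", 501113.5, MEDIAN_AGE_LIMIT_S),
    ("max", 954811.0, MAX_AGE_LIMIT_S),
]

for kind, value_s, limit_s in observed:
    print(
        f"{kind}: {to_days(value_s):.2f} d "
        f"(limit {to_days(limit_s):.0f} d), "
        f"over limit: {exceeds_limit(value_s, limit_s)}"
    )
```

Under these assumptions, the three observed values correspond to roughly 5.81, 5.80 and 11.05 days, all above their respective 4-day and 3-day limits.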

Acceptance criteria

  • AC1: The cause of the alerts is clear or a follow-up ticket is filed with a feature request to have the necessary details next time

Suggestions

  • Look at the alert history in Grafana
  • Look at all tests and check for cancelled jobs or removed workers
  • Other issues handling these alerts recently: #93612 and #92110 according to a quick search.