action #92110

closed

Several Job age (scheduled) alerts on Sunday

Added by livdywan over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2021-05-04
Due date: 2021-05-18
% Done: 0%
Estimated time:

Description

Grafana osd-admins@suse.de[OK] Job age (scheduled) (max) alert
[OK] Job age (scheduled) (max) alert Jobs not scheduled for 4 days (345600s). Possible reasons: * There are no online workers for selected scheduled jobs, misconfiguration on the side of tests likely See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection of th…
Grafana osd-admins@suse.de[OK] Job age (scheduled) (median) alert
[OK] Job age (scheduled) (median) alert Check for overall decrease of "time to start". Possible reasons for regression: * Not enough ressources * Too many tests scheduled due to misconfiguration 2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 a…
Grafana osd-admins@suse.de[Alerting] Job age (scheduled) (max) alert
[Alerting] Job age (scheduled) (max) alert Jobs not scheduled for 4 days (345600s). Possible reasons: * There are no online workers for selected scheduled jobs, misconfiguration on the side of tests likely See https://progress.opensuse.org/issues/73174#note-2 for an explanation of the selection…
Grafana osd-admins@suse.de[Alerting] Job age (scheduled) (median) alert
[Alerting] Job age (scheduled) (median) alert Check for overall decrease of "time to start". Possible reasons for regression: * Not enough ressources * Too many tests scheduled due to misconfiguration 2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#no…
Grafana osd-admins@suse.de[OK] Job age (scheduled) (median) alert
[OK] Job age (scheduled) (median) alert Check for overall decrease of "time to start". Possible reasons for regression: * Not enough ressources * Too many tests scheduled due to misconfiguration 2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#note-2 a…
Grafana osd-admins@suse.de[Alerting] Job age (scheduled) (median) alert
[Alerting] Job age (scheduled) (median) alert Check for overall decrease of "time to start". Possible reasons for regression: * Not enough ressources * Too many tests scheduled due to misconfiguration 2020-11-27: Alert limit set to 259200s = 3d, see https://progress.opensuse.org/issues/73174#no…

Actions #1

Updated by okurz over 3 years ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Due date set to 2021-05-18
  • Status changed from New to Feedback
  • Assignee set to mkittler
  • Priority changed from Normal to High
  • Target version set to Ready

@mkittler you already did something about these cases, related to pc_gce, didn't you?

Actions #2

Updated by mkittler over 3 years ago

I did nothing besides informing users. I suppose some of the tests did get cancelled. However, there are still a few scheduled jobs that are 7 days old. Apparently not enough to trigger the job age alerts, though.

Actions #3

Updated by okurz over 3 years ago

  • Status changed from Feedback to New
  • Assignee deleted (mkittler)

ok, unassigning you again then.

Actions #4

Updated by okurz over 3 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

Actions #5

Updated by okurz over 3 years ago

  • Status changed from In Progress to Resolved

https://openqa.suse.de/tests/ currently shows 4 jobs that have been scheduled for 8 days, e.g. https://openqa.suse.de/tests/5918513 with worker class "WORKER_CLASS=qemu_x86_64,pc_gce". No worker instance fulfills this worker class anymore after my MR https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/308, which I worked on for #91458. I checked the job templates in the job group https://openqa.suse.de/admin/job_templates/275 and it looks like someone has fixed that problem in the meantime. So this explains the alerts. I cancelled all four remaining jobs, pointed to the current ticket in the openQA jobs and pinged @jlausuch in https://chat.suse.de/channel/testing. Currently no alert is failing.
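
For illustration, a minimal sketch of why such jobs stay scheduled forever. It assumes that a worker can only pick up a job if it offers every class listed in the job's WORKER_CLASS; that rule, the helper names and the example worker list are assumptions made for this sketch, not taken from the openQA code or the actual salt pillars.

```python
from __future__ import annotations


def parse_worker_class(value: str) -> frozenset[str]:
    """Split a comma-separated WORKER_CLASS string into a set of class names."""
    return frozenset(part.strip() for part in value.split(",") if part.strip())


def can_run(job_worker_class: str, worker_worker_class: str) -> bool:
    """Assumed rule: the worker must offer every class the job asks for."""
    return parse_worker_class(job_worker_class) <= parse_worker_class(worker_worker_class)


def is_unschedulable(job_worker_class: str, workers: list[str]) -> bool:
    """True if no configured worker can ever pick up the job."""
    return not any(can_run(job_worker_class, w) for w in workers)


# Hypothetical worker configuration in which no instance offers pc_gce anymore.
workers = ["qemu_x86_64", "qemu_x86_64,tap"]

# The job from this ticket asks for both qemu_x86_64 and pc_gce.
stuck = is_unschedulable("qemu_x86_64,pc_gce", workers)
# A job that only asks for qemu_x86_64 can still be assigned.
fine = is_unschedulable("qemu_x86_64", workers)
```

Under these assumptions `stuck` is true and `fine` is false: the job keeps ageing in the scheduled state until either a matching worker appears or the job is cancelled.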
