action #94237

closed

No alert about too many scheduled tests size:S

Added by okurz almost 3 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-06-18
Due date: 2021-08-30
% Done: 0%
Estimated time:

Description

Observation

https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?tab=alert&editPanel=9&viewPanel=9&orgId=1&from=1623621600000&to=1623967199000 shows that there were more than 3k scheduled tests for an extended period, but Grafana gives no indication that it sent any alert messages.

Acceptance criteria

  • AC1: Alert messages are sent when scheduled jobs exceed a defined alert threshold
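
To make AC1 concrete, here is a minimal sketch of the kind of check the alert is expected to perform: fire once the scheduled-job count has stayed above a threshold for a sustained period. This is not the Grafana alert definition or the fix from the merge request. The names `AlertRule` and `should_alert`, and the threshold and duration values used below, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class AlertRule:
    threshold: int    # hypothetical: highest acceptable number of scheduled jobs
    for_seconds: int  # hypothetical: how long the breach must last before alerting


def should_alert(samples: Iterable[Tuple[int, int]], rule: AlertRule) -> bool:
    """Return True if the count stays above the threshold for rule.for_seconds.

    samples are (unix_timestamp, scheduled_job_count) pairs ordered by time.
    """
    breach_start: Optional[int] = None
    for timestamp, count in samples:
        if count > rule.threshold:
            if breach_start is None:
                # First sample of a new breach
                breach_start = timestamp
            if timestamp - breach_start >= rule.for_seconds:
                return True
        else:
            # Count dropped back to or below the threshold, so the breach ends
            breach_start = None
    return False
```

With an assumed rule of `AlertRule(threshold=3000, for_seconds=600)`, a series that stays above 3000 scheduled jobs for ten minutes would alert, while a series that dips below the threshold in between would not.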

Suggestions

  • Investigate why there was a sudden surge to nearly 9k blocked jobs on 2021-06-14. Was that because openQABot was offline in the days before that? -> yes, we assume that was the case
  • Investigate if there was an alert message which maybe just does not show in grafana but was still sent -> according to http://mailman.suse.de/mlarch/SuSE/osd-admins/2021/osd-admins.2021.06/maillist.html there was no such email received by osd-admins@suse.de
  • Crosscheck system logs on the grafana instance
  • Ensure that alert messages are sent

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #97043: job queue hitting new record 14k jobs (Resolved, okurz, 2021-08-17)

Actions #2

Updated by jbaier_cz almost 3 years ago

The openQABot had been offline since June 10th without anyone noticing. It was re-enabled on Monday and all the missing jobs were scheduled during the afternoon.

Actions #3

Updated by okurz almost 3 years ago

  • Priority changed from High to Normal

Actions #4

Updated by okurz almost 3 years ago

  • Subject changed from No alert about too many scheduled tests to No alert about too many scheduled tests size:S
  • Description updated (diff)

Actions #5

Updated by okurz almost 3 years ago

  • Target version changed from Ready to future

Actions #6

Updated by okurz over 2 years ago

  • Due date set to 2021-08-30
  • Status changed from Workable to Feedback
  • Assignee set to okurz
  • Target version changed from future to Ready

Just stumbled over the reason for the non-working alert. I have a fix ready: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/553

Actions #7

Updated by okurz over 2 years ago

  • Status changed from Feedback to Resolved

MR merged. The alert triggered because the number of scheduled jobs is currently above the alert threshold. Continuing in #97043

Actions #8

Updated by okurz over 2 years ago

  • Related to action #97043: job queue hitting new record 14k jobs added