action #133397

closed

openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project - coordination #108209: [epic] Reduce load on OSD

HTTP Response alert Salt alerting and autoresolving shortly size:M

Added by livdywan 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2023-07-26
Due date:
% Done: 0%
Estimated time:

Description

Observation

From Grafana / osd-admins@suse.de

Values
B0=19.585438379 
Labels
alertname     HTTP Response alert
grafana_folder     Salt
rule_uid     tm0h5mf4k

see https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=78&orgId=1&from=1690139276867&to=1690449757191

Acceptance criteria

  • AC1: No overly strict alerts for HTTP responses are observed

Steps to reproduce

  • Bump the sensitivity of the alert
  • Investigate what underlying problem exists, if any

Suggestions

  • Do not conclude merely that OSD is sometimes overloaded. We already know that! That is what our alerts need to account for.
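
One common way for an alert to account for short load spikes is to fire only when the threshold is exceeded for several consecutive evaluations. The following is a minimal sketch of that idea, not the change actually made for this ticket. The function name, the sample values, the threshold of 15.0 and the evaluation counts are all hypothetical, and the unit of the B0 value above is assumed rather than known.

```python
from typing import Iterable, List


def sustained_breaches(
    samples: Iterable[float], threshold: float, min_consecutive: int
) -> List[int]:
    """Return the indices at which an alert would fire.

    An alert fires at the sample that completes a run of
    `min_consecutive` consecutive values above `threshold`.
    It fires at most once per uninterrupted run.
    """
    fired: List[int] = []
    run = 0  # length of the current run of breaching samples
    for i, value in enumerate(samples):
        if value > threshold:
            run += 1
            if run == min_consecutive:
                fired.append(i)
        else:
            run = 0  # any sample at or below the threshold resets the run
    return fired


# Hypothetical measurements, one per evaluation interval (values are made up).
samples = [2.0, 19.6, 3.1, 2.5, 21.0, 22.4, 25.3, 4.0]

# Strict rule: a single breaching sample fires the alert.
print(sustained_breaches(samples, threshold=15.0, min_consecutive=1))  # [1, 4]

# More forgiving rule: only a run of three breaching samples fires.
print(sustained_breaches(samples, threshold=15.0, min_consecutive=3))  # [6]
```

With the stricter rule the isolated spike at index 1 fires and then resolves on the next sample, which matches the "alerting and autoresolving shortly" pattern. The more forgiving rule ignores that spike and reports only the sustained breach.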

Related issues 2 (0 open, 2 closed)

Related to openQA Infrastructure - action #133325: osd http response alerts - bump threshold further up (Rejected, okurz, 2023-07-25)
Copied to openQA Infrastructure - action #154426: HTTP Response alert Salt alerting and autoresolving shortly size:M (Resolved, jbaier_cz)

Actions #1

Updated by okurz 9 months ago

  • Tags set to alert, osd, grafana, http response, infra
  • Description updated (diff)
  • Target version set to Ready
  • Parent task set to #108209
Actions #2

Updated by livdywan 9 months ago

The alert was active for another few minutes this morning.

Actions #3

Updated by okurz 9 months ago

  • Subject changed from HTTP Response alert Salt alerting and autoresolving shortly to HTTP Response alert Salt alerting and autoresolving shortly size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #4

Updated by okurz 9 months ago

  • Related to action #133325: osd http response alerts - bump threshold further up added
Actions #5

Updated by okurz 9 months ago

  • Priority changed from High to Immediate
Actions #6

Updated by mkittler 9 months ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #7

Updated by mkittler 9 months ago

  • Status changed from In Progress to Feedback
Actions #9

Updated by okurz 9 months ago

  • Priority changed from Immediate to Urgent

thx, with your change the alert should be a bit more forgiving.

Actions #10

Updated by okurz 9 months ago

  • Status changed from Feedback to Resolved

We checked responsiveness and OSD feels snappy today, so this is not a persisting new problem. Also, we have not received any related alerts up to today, so we are good.

Actions #11

Updated by jbaier_cz 3 months ago

  • Copied to action #154426: HTTP Response alert Salt alerting and autoresolving shortly size:M added