action #133397
openQA Project (public) - coordination #110833 (closed): [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances
openQA Project (public) - coordination #108209: [epic] Reduce load on OSD
HTTP Response alert Salt alerting and autoresolving shortly size:M
Status: Resolved
Priority: Urgent
Assignee:
Category: -
Target version:
Start date: 2023-07-26
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
From Grafana/ osd-admins@suse.de
Values
B0=19.585438379
Labels
alertname HTTP Response alert
grafana_folder Salt
rule_uid tm0h5mf4k
Acceptance criteria
- AC1: No overly strict alerts for HTTP responses are observed anymore
Steps to reproduce
- Bump the sensitivity of the alert
- Investigate what underlying problem exists, if any
Suggestions
- Do not conclude that OSD is sometimes overloaded. We already know that! That is what our alerts need to account for.
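The subject describes an alert that fires and then resolves again shortly afterwards. Besides raising the threshold, one common way to make such an alert more forgiving is to require the value to stay above the threshold for a minimum duration before firing. The following is a minimal sketch of that idea in Python. It is not the actual Grafana rule or the change made for this ticket; the function name, threshold, durations and sample values are all hypothetical.

```python
from typing import Iterable, List, Tuple


def firing_intervals(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    pending_for: float,
) -> List[Tuple[float, float]]:
    """Return (start, end) timestamps during which the alert would fire.

    samples     -- (timestamp_seconds, value) pairs in chronological order
    threshold   -- hypothetical limit; a value above it counts as a breach
    pending_for -- seconds a breach must persist before the alert fires
    """
    intervals = []
    breach_start = None   # when the current breach began
    firing_since = None   # when the alert actually started firing
    last_ts = None

    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            # Fire only once the breach has lasted long enough.
            if firing_since is None and ts - breach_start >= pending_for:
                firing_since = ts
        else:
            # Value recovered: close a firing interval, if any, and reset.
            if firing_since is not None:
                intervals.append((firing_since, ts))
            breach_start = None
            firing_since = None
        last_ts = ts

    # The alert is still firing at the end of the data.
    if firing_since is not None:
        intervals.append((firing_since, last_ts))
    return intervals


# Hypothetical samples, one per minute: a single spike, then a sustained breach.
values = [1, 20, 1, 1, 20, 20, 20, 20, 1]
samples = [(i * 60.0, float(v)) for i, v in enumerate(values)]

# Without a pending period, the single spike also triggers an alert.
print(firing_intervals(samples, threshold=10.0, pending_for=0.0))
# With a two-minute pending period, only the sustained breach does.
print(firing_intervals(samples, threshold=10.0, pending_for=120.0))
```

With these made-up samples, the short spike produces its own alert when no pending period is used, while the two-minute pending period suppresses it and keeps only the sustained breach.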
Updated by okurz over 1 year ago
- Tags set to alert, osd, grafana, http response, infra
- Description updated (diff)
- Target version set to Ready
- Parent task set to #108209
Updated by okurz over 1 year ago
- Subject changed from HTTP Response alert Salt alerting and autoresolving shortly to HTTP Response alert Salt alerting and autoresolving shortly size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz over 1 year ago
- Related to action #133325: osd http response alerts - bump threshold further up added
Updated by mkittler over 1 year ago
- Status changed from Workable to In Progress
- Assignee set to mkittler
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Updated by okurz over 1 year ago
Likely related:
https://suse.slack.com/archives/C02CGKBCGT1/p1690468821341979?thread_ts=1690468821.341979&cid=C02CGKBCGT1
openQA is slow as molasses today :snail:
Updated by okurz over 1 year ago
- Priority changed from Immediate to Urgent
thx, with your change the alert should be a bit more forgiving.
Updated by okurz over 1 year ago
- Status changed from Feedback to Resolved
We checked responsiveness and OSD feels snappy today, so this is not a persisting new problem. Also, we have not received any related alerts up to today, so we are good.
Updated by jbaier_cz 11 months ago
- Copied to action #154426: HTTP Response alert Salt alerting and autoresolving shortly size:M added