action #158556
closed
openQA Project (public) - coordination #127031: [saga][epic] openQA for SUSE customers
openQA Project (public) - coordination #152955: [epic] Metric-driven project management in SUSE QE Tools team
Single-value SLI of OSD HTTP response code successful vs. all size:S
Description
Motivation
See https://progress.opensuse.org/issues/158550
Acceptance criteria
- AC1: https://monitor.qa.suse.de/ shows a single-value, response-code-based OSD "availability" as a percentage
- AC2: A corresponding alert exists
Suggestions
- Understand which status codes we care about
- Calculate a ratio between them, nr(!5xx)/nr(all)?
- Put a single value panel on monitor.qa.suse.de, similar to already existing availability
- Define a threshold to visualize based on current data, e.g. if we have 98% then mark everything below 95% as red
- Define a corresponding alert with the same or a similar threshold as the visual one
- Out-of-scope: traffic between workers or OSD<->workers
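
The suggested ratio and threshold can be sketched as follows. This is only an illustration in Python, not the actual panel query: the function names, the sample counts and the 95% threshold default are assumptions taken from the examples in the suggestions above, and the real calculation would live in the monitoring dashboard definition.

```python
from collections import Counter


def availability(status_counts):
    """Return the percentage of responses that are not 5xx.

    status_counts maps an HTTP status code to the number of responses
    seen with that code. Returns None when there were no responses.
    """
    total = sum(status_counts.values())
    if total == 0:
        return None
    server_errors = sum(
        n for code, n in status_counts.items() if 500 <= code <= 599
    )
    return 100.0 * (total - server_errors) / total


def is_below_threshold(value, threshold=95.0):
    """True if the SLI exists and is below the (assumed) 95% threshold."""
    return value is not None and value < threshold


# Hypothetical sample data, not taken from OSD.
counts = Counter({200: 9700, 304: 150, 404: 50, 500: 80, 502: 20})
sli = availability(counts)
print(f"{sli:.1f}%", "ALERT" if is_below_threshold(sli) else "ok")
```

Note that 4xx responses count as successful here, because they indicate a client-side problem rather than a failure of OSD itself; whether that is the desired behaviour is exactly the "which status codes we care about" question.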
Updated by okurz 10 months ago
- Copied to action #158559: Single-value SLI of OSD HTTP response time size:S added
Updated by nicksinger 10 months ago
- Subject changed from Single-value SLI of OSD HTTP response code successful vs. all to Single-value SLI of OSD HTTP response code successful vs. all size:S
- Description updated
Updated by okurz 10 months ago
- Due date set to 2024-04-23
- Status changed from Workable to In Progress
- Assignee set to okurz
When I originally defined the "availability" in https://gitlab.suse.de/openqa/salt-states-openqa commit fc3b5d8, I already defined it more or less correctly as "response_openqa_200/response_openqa", not, as I had assumed, as just "any HTTP response" vs. "no response". Maybe we should just update the panel to replace the deprecated Angular-based "singlestat math", update the description, and potentially change the calculation to "!500/all".
Updated by okurz 10 months ago · Edited
The calculation was wrong; fixed with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1147 (merged)
Updated by okurz 10 months ago
- Copied to action #158808: Prevent HTTP response codes 500 as observed in OSD monitoring size:M added