action #158556

closed

openQA Project (public) - coordination #127031: [saga][epic] openQA for SUSE customers

openQA Project (public) - coordination #152955: [epic] Metric-driven project management in SUSE QE Tools team

Single-value SLI of OSD HTTP response code successful vs. all size:S

Added by okurz 10 months ago. Updated 9 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Start date: 2024-04-07
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

See https://progress.opensuse.org/issues/158550

Acceptance criteria

  • AC1: https://monitor.qa.suse.de/ shows a single-value, response-code-based OSD "availability" as a percentage
  • AC2: A corresponding alert exists

Suggestions

  • Understand which status codes we care about
    • Calculate a ratio between them, nr(!5xx)/nr(all)?
  • Put a single-value panel on monitor.qa.suse.de, similar to the already existing availability panel
  • Define a threshold to visualize based on current data, e.g. if we have 98% then mark everything below 95% as red
  • Define a corresponding alert, same as or similar to the visual threshold
  • Out of scope: traffic between workers or OSD<->workers
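The ratio and threshold suggested above could be sketched roughly as follows. This is only an illustration in Python, not the query used on monitor.qa.suse.de: the function names `availability` and `is_red`, the sample counts, and the 95% default are assumptions made up for the example.

```python
from typing import Mapping


def availability(counts: Mapping[int, int]) -> float:
    """Return nr(!5xx)/nr(all) as a percentage.

    `counts` maps an HTTP status code to the number of responses seen
    with that code (hypothetical input format).
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    server_errors = sum(n for code, n in counts.items() if 500 <= code <= 599)
    return 100.0 * (total - server_errors) / total


def is_red(percentage: float, threshold: float = 95.0) -> bool:
    """True if the value falls below the visual/alert threshold (assumed 95%)."""
    return percentage < threshold


# Made-up sample data: 10000 responses, 100 of them 5xx.
sample = {200: 9700, 302: 150, 404: 50, 500: 80, 502: 20}
sli = availability(sample)
print(f"availability: {sli:.1f}%, red: {is_red(sli)}")
```

Using the same threshold for the panel colour and the alert keeps both in sync, which matches the "same or similar" suggestion.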

Related issues 2 (0 open, 2 closed)

Copied to openQA Infrastructure (public) - action #158559: Single-value SLI of OSD HTTP response time size:S (Resolved, okurz, 2024-04-07)
Copied to openQA Project (public) - action #158808: Prevent HTTP response codes 500 as observed in OSD monitoring size:M (Resolved, mkittler)

Actions #1

Updated by okurz 10 months ago

  • Copied to action #158559: Single-value SLI of OSD HTTP response time size:S added
Actions #2

Updated by nicksinger 10 months ago

  • Subject changed from Single-value SLI of OSD HTTP response code successful vs. all to Single-value SLI of OSD HTTP response code successful vs. all size:S
  • Description updated (diff)
Actions #3

Updated by nicksinger 10 months ago

  • Status changed from New to Workable
Actions #4

Updated by okurz 10 months ago

  • Parent task changed from #158550 to #152955
Actions #5

Updated by okurz 10 months ago

  • Due date set to 2024-04-23
  • Status changed from Workable to In Progress
  • Assignee set to okurz

When originally defining the "availability" in https://gitlab.suse.de/openqa/salt-states-openqa commit fc3b5d8, I already defined it quite correctly as "response_openqa_200/response_openqa", not as what I had in mind, which was just "any HTTP response" vs. "no response". Maybe we should just update the panel to replace the deprecated Angular-based "singlestat math", update the description, and potentially update the calculation to "!500/all".
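To illustrate how the two candidate definitions can diverge, here is a small Python sketch. The helper `ratio` and the sample counts are hypothetical and only serve the comparison; the real panel works on the monitoring data, not on a dictionary like this.

```python
from typing import Callable, Mapping


def ratio(counts: Mapping[int, int], keep: Callable[[int], bool]) -> float:
    """Share of responses whose status code satisfies `keep`."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no responses recorded")
    return sum(n for code, n in counts.items() if keep(code)) / total


# Made-up counts: redirects and client errors are not server failures.
sample = {200: 900, 301: 50, 404: 40, 500: 10}

only_200 = ratio(sample, lambda code: code == 200)  # "200/all"
not_500 = ratio(sample, lambda code: code != 500)   # "!500/all"
print(f"200/all = {only_200:.2%}, !500/all = {not_500:.2%}")
```

With these numbers "200/all" counts redirects and 404s against availability, while "!500/all" only penalizes the server errors.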

Actions #7

Updated by okurz 10 months ago

  • Status changed from In Progress to Feedback
Actions #8

Updated by okurz 10 months ago · Edited

Merged. The calculation was wrong, fixed with https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1147 (merged)

Actions #9

Updated by okurz 10 months ago

  • Copied to action #158808: Prevent HTTP response codes 500 as observed in OSD monitoring size:M added
Actions #10

Updated by nicksinger 9 months ago

  • Status changed from Feedback to Workable
Actions #11

Updated by okurz 9 months ago

  • Due date deleted (2024-04-23)
  • Status changed from Workable to Resolved

I actually think this is resolved.
