action #99654

closed

coordination #80142: [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes

coordination #99579: [epic][retro] Follow-up to "Published QCOW images appear to be uncompressed"

Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date: 2021-10-01
Due date:
% Done: 0%
Estimated time:

Description

Motivation

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 removed the alert level of 8s for disk I/O time, but in #99579 we learned that the alerts raised at that time were legitimate, so we need to make the alerts stricter again. We should likely revert that decision, but ensure that we end up with proper, well-described alert levels, e.g. 8s for 5m, 100s for 1m, or similar.
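To illustrate what a two-level rule such as "8s for 5m, 100s for 1m" could mean, here is a minimal sketch. It assumes that "X for Y" means the value must stay above X on every sample for Y minutes, with one sample per minute; the function names and the sample series are hypothetical and are not taken from the actual alert definitions in salt-states-openqa.

```python
from typing import Sequence


def fires_for(samples: Sequence[float], threshold: float, for_minutes: int) -> bool:
    """Return True if the last `for_minutes` samples all exceed `threshold`.

    Assumes one sample per minute (hypothetical resolution). This models
    "value above X for Y minutes", not an average over the window.
    """
    if len(samples) < for_minutes:
        return False
    return all(s > threshold for s in samples[-for_minutes:])


def proposed_alert(samples: Sequence[float]) -> bool:
    # Levels from the example above: 8s for 5m OR 100s for 1m.
    return fires_for(samples, 8.0, 5) or fires_for(samples, 100.0, 1)


# Made-up I/O time series in seconds, one value per minute (illustrative only).
sustained = [9.0, 9.5, 10.0, 9.2, 8.7]  # moderately high for 5 minutes
spike = [1.0, 1.0, 1.0, 1.0, 150.0]     # one short but severe spike
normal = [2.0, 3.0, 1.5, 2.5, 2.0]

print(proposed_alert(sustained))  # True, via the 8s/5m level
print(proposed_alert(spike))      # True, via the 100s/1m level
print(proposed_alert(normal))     # False
```

With these semantics the two levels complement each other: the lower threshold catches sustained load, the higher one reacts quickly to short, severe spikes.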

Suggestions

  • Look at current alert developments
  • Provide an MR with good descriptions and adjusted values
Actions #1

Updated by livdywan over 2 years ago

  • Subject changed from Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts to Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #2

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler over 2 years ago

  • Status changed from Workable to In Progress

Before my SR there were actually two alert evaluators, one for 8 seconds and one for 100 seconds. The condition was written so that the 100 seconds were effectively overruled by the 8 seconds and hence had no effect. To me this looked like a mistake, so I removed the 8 seconds and made the 100 seconds effective.

Note that this is only about write I/O time. The threshold for read I/O time is, and was, 8 seconds (without further ineffective values).
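Why the 100-second evaluator had no effect can be sketched as follows, assuming the two evaluators checked the same write I/O time query and were combined with OR (an assumption; the exact condition structure is not shown here). The function names are hypothetical.

```python
def evaluators_fired(write_io_time: float) -> dict[str, bool]:
    # Hypothetical: two evaluators on the same write I/O time value,
    # assumed to be combined with OR.
    return {"8s": write_io_time > 8.0, "100s": write_io_time > 100.0}


def alert_fires(write_io_time: float) -> bool:
    return any(evaluators_fired(write_io_time).values())


# Every value above 100 s is also above 8 s, so the 100 s evaluator can never
# be the reason the alert fires: the OR result equals the 8 s check alone.
for value in [0.5, 8.5, 42.0, 100.5, 250.0]:
    assert alert_fires(value) == (value > 8.0)
print("the 100 s evaluator never changes the outcome")
```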


Judging by the history, 8 seconds is a bit too low for writes. 10 seconds is sometimes slightly exceeded, but considering that we're looking at the average over 5 minutes, it would be ok to set 10 seconds. Maybe that was the intention when someone set 100 seconds before: it was just one zero too many, which went unnoticed because the setting wasn't effective. So I'll set it to 10 seconds. SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/605
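A minimal sketch of why an average over 5 minutes tolerates brief excursions above 10 seconds. It assumes one sample per minute so that the last 5 samples form the 5-minute window; the function name and the sample values are made up for illustration and do not come from the actual history.

```python
from statistics import mean
from typing import Sequence

WRITE_IO_THRESHOLD_S = 10.0  # value chosen in the SR
WINDOW = 5  # samples; assumes one sample per minute, i.e. a 5-minute average


def average_exceeds(samples: Sequence[float],
                    threshold: float = WRITE_IO_THRESHOLD_S,
                    window: int = WINDOW) -> bool:
    """Compare the mean of the last `window` samples against `threshold`."""
    if len(samples) < window:
        return False
    return mean(samples[-window:]) > threshold


# Made-up data: 10 s is briefly exceeded, but the 5-minute mean stays below it.
occasional_spike = [7.0, 11.0, 8.0, 10.5, 9.0]   # mean 9.1
sustained_load = [12.0, 11.0, 13.0, 10.5, 12.5]  # mean 11.8

print(average_exceeds(occasional_spike))                 # False
print(average_exceeds(sustained_load))                   # True
print(average_exceeds(occasional_spike, threshold=8.0))  # True: 8 s would alert
```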

Actions #4

Updated by openqa_review over 2 years ago

  • Due date set to 2021-10-28

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Resolved

The follow-up worked.

Actions #7

Updated by okurz over 2 years ago

  • Due date deleted (2021-10-28)