action #99654

coordination #64746: [saga][epic] Scale up: Efficient handling of large storage to be able to run current tests efficiently but keep big archives of old results

coordination #99579: [epic][retro] Follow-up to "Published QCOW images appear to be uncompressed"

Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S

Added by okurz 2 months ago. Updated about 1 month ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Organisational
Target version:
Start date:
2021-10-01
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Motivation

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 removed an alert level of 8s of disk I/O time, but in #99579 we learned that the alerts at the time were legitimate, so we need to make the alerts stricter again. We should likely revert that decision while ensuring that we have proper, well-described alert levels, e.g. 8s for 5m, 100s for 1m, or similar.

Suggestions

  • Look at current alert developments
  • Provide an MR with good descriptions and adjusted values

History

#1 Updated by cdywan about 2 months ago

  • Subject changed from Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts to Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S
  • Description updated (diff)
  • Status changed from New to Workable

#2 Updated by mkittler about 2 months ago

  • Assignee set to mkittler

#3 Updated by mkittler about 2 months ago

  • Status changed from Workable to In Progress

Before my SR there were actually two alert evaluators, one for 8 seconds and one for 100 seconds, and the condition was written so that the 100-second evaluator was effectively overruled by the 8-second one and hence had no effect. To me this looked like a mistake, so I removed the 8-second evaluator and made the 100-second one effective.

Note that this is only about write I/O time. The threshold for read I/O time is and was at 8 seconds (without further ineffective values).
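As a rough illustration of why the 100-second evaluator had no effect, here is a minimal Python sketch. It assumes the two evaluators were joined with a logical OR; the actual alert definition is not reproduced in this ticket, and the function names are hypothetical.

```python
def old_write_alert_fires(avg_write_io_time_s: float) -> bool:
    """Hypothetical old condition: two evaluators joined with OR (assumption)."""
    return avg_write_io_time_s > 8 or avg_write_io_time_s > 100


def low_threshold_only(avg_write_io_time_s: float) -> bool:
    """The same condition with the 100-second evaluator dropped."""
    return avg_write_io_time_s > 8


# Any value above 100 is also above 8, so the second evaluator never
# changes the outcome: both functions agree on every sample.
samples = [0.5, 7.9, 8.1, 50.0, 100.0, 150.0]
assert all(old_write_alert_fires(s) == low_threshold_only(s) for s in samples)
```

Under that assumption, removing the 8-second evaluator is the only way for the 100-second value to take effect.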


Judging by the history, 8 seconds for writes is a bit too low. Even 10 seconds is sometimes slightly exceeded, but considering that we're looking at the average over 5 minutes, it would be OK to set 10 seconds. Maybe that was the intention when someone set 100 seconds before: it was just one zero too many, and the setting wasn't effective anyway. So I'll set it to 10 seconds. SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/605
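The effect of evaluating the average over 5 minutes can be sketched in Python as follows. The sample values are made up for illustration, one sample per minute is an assumption, and the function name is hypothetical; only the 8-second and 10-second thresholds come from this ticket.

```python
from statistics import mean

WRITE_IO_THRESHOLD_S = 10  # threshold proposed in this comment


def write_alert_fires(samples_s, threshold_s=WRITE_IO_THRESHOLD_S):
    """Fire only if the average over the window exceeds the threshold."""
    return mean(samples_s) > threshold_s


# Made-up 5-minute window: two samples exceed 10 seconds,
# but the average over the whole window is 9.2 seconds.
window = [6.0, 9.0, 12.0, 11.0, 8.0]

assert not write_alert_fires(window)              # quiet at 10 seconds
assert write_alert_fires(window, threshold_s=8)   # would fire at 8 seconds
```

In this made-up window, short spikes above 10 seconds do not trigger the alert, whereas a threshold of 8 seconds would.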

#4 Updated by openqa_review about 2 months ago

  • Due date set to 2021-10-28

Setting due date based on mean cycle time of SUSE QE Tools

#6 Updated by mkittler about 2 months ago

  • Status changed from In Progress to Resolved

The follow-up worked.

#7 Updated by okurz about 1 month ago

  • Due date deleted (2021-10-28)
