action #99654
coordination #80142 (closed): [saga][epic] Scale out: Redundant/load-balancing deployments of openQA, easy containers, containers on kubernetes
coordination #99579: [epic][retro] Follow-up to "Published QCOW images appear to be uncompressed"
Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S
Description
Motivation
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 removed the 8s alert level for disk I/O time, but in #99579 we learned that the alerts raised at that time were legitimate, so the alerts need to become stricter again. We should likely revert that decision, while making sure the alert levels are sensible and well described, e.g. 8s for 5m, 100s for 1m or similar.
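The multi-level idea ("8s for 5m, 100s for 1m") can be sketched as follows. This is a minimal illustration, not the actual Grafana configuration in salt-states-openqa: the names `Tier`, `tier_breached` and `should_alert` and the 30-second sample interval are hypothetical assumptions, and the "for" semantics are modeled simply as "every sample in the last N seconds exceeds the threshold".

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One alert level: fire if the metric stays above `threshold_s` for `for_s` seconds."""
    threshold_s: float  # disk I/O time threshold, in seconds
    for_s: int          # how long the threshold must be exceeded, in seconds


def tier_breached(samples: Sequence[float], tier: Tier, interval_s: int) -> bool:
    """True if every sample in the last `tier.for_s` seconds exceeds the threshold.

    `samples` are ordered oldest to newest and assumed to be `interval_s` apart.
    """
    needed = max(1, tier.for_s // interval_s)
    if len(samples) < needed:
        return False  # not enough history to judge this tier yet
    return all(v > tier.threshold_s for v in samples[-needed:])


def should_alert(samples: Sequence[float], tiers: Sequence[Tier], interval_s: int) -> bool:
    """Fire if any tier is breached; each tier covers a different failure mode."""
    return any(tier_breached(samples, t, interval_s) for t in tiers)


# Hypothetical levels taken from the example in the ticket: 8s for 5m, 100s for 1m.
TIERS = (Tier(threshold_s=8.0, for_s=300), Tier(threshold_s=100.0, for_s=60))
INTERVAL_S = 30  # assumed sample interval (hypothetical)

sustained = [9.0] * 10           # moderately high for the full 5 minutes
spike = [1.0] * 8 + [150.0] * 2  # quiet, then very high for the last minute
quiet = [5.0] * 10

print(should_alert(sustained, TIERS, INTERVAL_S))  # True  (first tier)
print(should_alert(spike, TIERS, INTERVAL_S))      # True  (second tier only)
print(should_alert(quiet, TIERS, INTERVAL_S))      # False
```

Because the two levels use different durations, neither makes the other redundant: a short, very high spike is only caught by the 1-minute level, while a long, moderately high period is only caught by the 5-minute level.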
Suggestions
- Look at current alert developments
- Provide an MR with good descriptions and adjusted values
Updated by livdywan over 3 years ago
- Subject changed from Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts to Revisit decision in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/545 regarding I/O alerts size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 3 years ago
- Status changed from Workable to In Progress
Before my SR there were actually two alert evaluators, one at 8 seconds and one at 100 seconds, and the condition was set up so that the 8-second evaluator overruled the 100-second one, which was therefore never effective. This looked like a mistake to me, so I removed the 8-second evaluator and made the 100-second one effective.
Note that this only concerns write I/O time. The threshold for read I/O time is, and always was, 8 seconds (without any additional, ineffective values).
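Why a 100-second evaluator can be "overruled" by an 8-second one can be shown with a small sketch. It assumes, which the comment only implies, that both evaluators were "is above" conditions on the same write I/O time query and combined with OR; the helper names `fires` and `effective_threshold` are hypothetical.

```python
from collections.abc import Iterable


def fires(value: float, thresholds: Iterable[float]) -> bool:
    """OR-combined 'is above' conditions: fire if the value exceeds any threshold."""
    return any(value > t for t in thresholds)


def effective_threshold(thresholds: Iterable[float]) -> float:
    """For OR-combined 'is above' conditions, only the lowest threshold ever decides."""
    return min(thresholds)


before = [8.0, 100.0]  # both evaluators on the same write I/O time query (assumed)
for value in (5.0, 9.0, 50.0, 150.0):
    # The 100 s condition never changes the outcome: it is implied by the 8 s one.
    assert fires(value, before) == (value > effective_threshold(before))

print(effective_threshold(before))                # 8.0
print(fires(50.0, before), fires(50.0, [100.0]))  # True False
```

Under this assumption, removing the 8-second condition is what turns 100 seconds into the effective limit, which also means a value such as 50 seconds no longer alerts at all.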
Judging by the history, 8 seconds is a bit too low for writes. 10 seconds is sometimes slightly exceeded, but since we're looking at the average over 5 minutes, a threshold of 10 seconds would be acceptable. Maybe that was the intention when someone set 100 seconds before: the value simply had one zero too many, and the setting wasn't effective anyway. So I'll set it to 10 seconds. SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/605
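The argument that occasional values slightly above 10 seconds are fine when averaging over 5 minutes can be sketched like this. The function `avg_alert` and the sample values are hypothetical and purely illustrative (they are not taken from the real history), and the window is assumed to already contain only the samples from the last 5 minutes.

```python
from collections.abc import Sequence
from statistics import fmean

WRITE_IO_THRESHOLD_S = 10.0  # write I/O time threshold chosen in the comment above


def avg_alert(window: Sequence[float], threshold_s: float = WRITE_IO_THRESHOLD_S) -> bool:
    """Fire if the mean of the samples in the window exceeds the threshold.

    `window` is assumed to hold the samples from the last 5 minutes.
    """
    if not window:
        return False  # no data in the window; real no-data handling is configured separately
    return fmean(window) > threshold_s


# Illustrative numbers only, not real history.
brief_peak = [6.0, 7.0, 10.5, 8.0, 7.0]     # one sample slightly above 10 s, mean 7.7
sustained = [11.0, 12.0, 10.5, 11.0, 12.0]  # consistently above 10 s, mean 11.3

print(avg_alert(brief_peak))  # False
print(avg_alert(sustained))   # True
```

Averaging smooths out single samples that slightly exceed the threshold, so only a consistently elevated write I/O time raises the alert.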
Updated by openqa_review over 3 years ago
- Due date set to 2021-10-28
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 3 years ago
The first SR didn't work. Follow-up: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/606
Updated by mkittler over 3 years ago
- Status changed from In Progress to Resolved
The follow-up worked.