
coordination #99579

Updated by okurz over 2 years ago

## Motivation 
 In #99246, mdoucha fortunately identified a big performance regression caused by https://github.com/os-autoinst/os-autoinst/pull/1699/commits/eb207de0a372d832a60a081dd08dc674c90ef950 . After that very specific bug report we were able to deploy a fix to openqa.suse.de within 2 hours, which is very quick. Before that, however, we had nearly two months of vague issues: user reports about reduced performance, multiple alerts related to high CPU time and high I/O pressure, long test runtimes and long test schedule queues. 

 For example: 

 * Looking at https://monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=47&orgId=1&from=1626435701542&to=1632687629034 it indeed looks like the disk I/O times in 2021-07 were significantly lower than in 2021-08. 
 * In late 2021-08 and 2021-09 there were multiple disk I/O related alerts, but no relevant follow-up was conducted. For me this is another reminder that we should diligently act on alerts and try really hard to understand the reason for any failure. 

 ## Acceptance criteria 
 * **AC1:** A [Five-Whys](https://en.wikipedia.org/wiki/Five_whys) analysis has been conducted and the results documented 
 * **AC2:** Improvements are planned 

 ## Suggestions 
 * Bring up in retro 
 * Conduct "Five-Whys" analysis for the topic 
 * Identify follow-up tasks in tickets 
   * e.g. increase code coverage in https://app.codecov.io/gh/os-autoinst/os-autoinst/ , especially https://app.codecov.io/gh/os-autoinst/os-autoinst/tree/master/OpenQA/Qemu
