coordination #158110

Updated by okurz 8 months ago

## Motivation 
Overloaded openQA workers have long been known to cause "typing issues" and other random problems that lead to annoying sporadic test failures. These failures are avoided only if openQA worker machines are carefully configured not to become overloaded. This is normally done by running only a limited number of worker instances, sized for the case where all instances are working on rather resource-heavy jobs at the same time. As a consequence, openQA hardware is in most cases severely underused. To make more efficient use of hardware resources while keeping openQA jobs as stable as possible, openQA itself must ensure that resources are not exhausted.

## Ideas

* Only pick up (or start) new jobs if the CPU load is below a configured threshold -> #158125
* Overload openQA systems on purpose to find out which system parameters are critical, and define a corresponding feature request for each relevant parameter so that openQA handles it automatically, e.g. by checking CPU load, available memory, I/O rate, storage space, etc.
  * before starting jobs
  * while running jobs
  * after jobs have failed
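A pre-start check combining the first idea with some of the resources named in the second could look roughly like the following Python sketch. It is not openQA's actual code. The `Thresholds` class, its field names and default values, and the functions `mem_available_bytes`, `load_per_cpu`, `reasons_to_defer` and `may_pick_up_job` are all hypothetical. The sketch assumes a Linux worker. It reads `MemAvailable` from `/proc/meminfo`, takes the 1-minute load average from `os.getloadavg()` normalized by `os.cpu_count()`, and gets the free space of the worker's pool directory from `shutil.disk_usage()`. It leaves out the I/O rate, as well as checks while jobs run or after they fail.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass
class Thresholds:
    # Hypothetical limits for illustration; these are not actual openQA settings.
    max_load_per_cpu: float = 1.0              # 1-minute load average / CPU count
    min_free_mem_bytes: int = 2 * 1024**3      # 2 GiB of MemAvailable
    min_free_disk_bytes: int = 10 * 1024**3    # 10 GiB free in the pool directory


def mem_available_bytes(meminfo_text: str) -> int:
    """Parse the MemAvailable field (reported in kB) from /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) * 1024
    raise ValueError("MemAvailable not found in meminfo text")


def load_per_cpu() -> float:
    """Return the 1-minute load average divided by the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


def reasons_to_defer(load: float, free_mem: int, free_disk: int,
                     limits: Thresholds) -> list[str]:
    """Return why a new job should not be picked up now; empty means go ahead."""
    reasons = []
    if load >= limits.max_load_per_cpu:
        reasons.append(f"load per CPU {load:.2f} >= {limits.max_load_per_cpu}")
    if free_mem < limits.min_free_mem_bytes:
        reasons.append(f"available memory {free_mem} B < {limits.min_free_mem_bytes} B")
    if free_disk < limits.min_free_disk_bytes:
        reasons.append(f"free disk {free_disk} B < {limits.min_free_disk_bytes} B")
    return reasons


def may_pick_up_job(pool_dir: str, limits: Thresholds) -> bool:
    """Sample the current system state (Linux only) and decide on a new job."""
    with open("/proc/meminfo") as f:
        free_mem = mem_available_bytes(f.read())
    free_disk = shutil.disk_usage(pool_dir).free
    reasons = reasons_to_defer(load_per_cpu(), free_mem, free_disk, limits)
    for reason in reasons:
        print(f"deferring new job: {reason}")
    return not reasons
```

A worker loop could call `may_pick_up_job(pool_dir, Thresholds())` before accepting each job and retry later when it returns `False`. Keeping the decision in the pure function `reasons_to_defer` makes it easy to test. Returning reasons rather than a bare boolean makes it possible to log which parameter caused a deferral, which fits the idea of finding out empirically which parameters are critical.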