action #160598

Updated by livdywan 3 months ago

## Observation 

 Could this be related to #158170? Did we allow too many instances? 

 ``` 
 Summary 
 System Load too high for a longer time, see https://progress.opensuse.org/issues/150983 
 Description 
 System Load is considered too high for a longer time. Machine possibly overloaded. Especially when there are too many openQA worker instances configured openQA tests would become flaky and showing lost characters or repeated characters in VNC typing. 

 Take a look which processes make the machine busy and look for corresponding openQA tests failing due to this situation and handle accordingly, e.g. retrigger the openQA tests after mitigating the root cause. 

 See 
 https://progress.opensuse.org/issues/150983 
 for details. 
 Values 
 B=79.57516129032257    C=1  
 Labels 
 alertname            	 s390zl12: CPU load alert 
 grafana_folder            	 openQA 
 host            	 s390zl12 
 hostname            	 s390zl12 
 origin            	 salt 
 rule_uid            	 cpu_load_alert_s390zl12 
 type            	 worker 
 ``` 
 https://stats.openqa-monitor.qa.suse.de/d/WDs390zl12/worker-dashboard-s390zl12?orgId=1&from=1715990201255&to=1716033674370 

 The issue is resolved at the moment, so no rollback steps are needed and normal priority is fine for now. 

 ## Suggestions 
 * Consider reducing the worker slots 
 * Check whether the alert threshold is appropriate and adjust it if not 
 * Take a look at the logs from the timeframe of the alert firing
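
 As a starting point for the threshold check, here is a minimal Python sketch that compares a load value against a threshold and normalises it by CPU count. The threshold of `40.0` and the helper names are hypothetical and not taken from the actual alert rule; the `B` value from the alert above is used only as sample input, since the alert text does not state exactly what it measures.

```python
import os


def load_per_cpu(load: float, cpus: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def is_overloaded(load: float, threshold: float) -> bool:
    """Return True if the load is strictly above the threshold."""
    return load > threshold


def check_current_host(threshold: float) -> bool:
    """Compare this host's 15-minute load average with the threshold.

    Uses os.getloadavg(), which is only available on Unix-like systems.
    """
    _load1, _load5, load15 = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load15={load15:.2f} cpus={cpus} per_cpu={load_per_cpu(load15, cpus):.2f}")
    return is_overloaded(load15, threshold)


# Hypothetical threshold, NOT the value configured in the real alert rule.
HYPOTHETICAL_THRESHOLD = 40.0

# Sample input copied from the "Values" section of the alert.
ALERT_VALUE_B = 79.57516129032257
```

 Running `check_current_host(HYPOTHETICAL_THRESHOLD)` on the worker during a busy period would show whether the load per CPU is plausible for the configured number of worker slots, which may help decide between reducing slots and adjusting the threshold.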
