action #158377

Updated by okurz about 1 month ago

https://progress.opensuse.org/issues/158377 
 Detect from monitoring data which monitored machines show a too low CPU usage over time size:M 

 ## Motivation 
 Among the machines that we *do* monitor, we can also look for effectively "unused" ones. 

 ## Acceptance criteria 
 * **AC1:** An alert exists for physical machines whose system usage stays too low over a longer period of time 

 ## Suggestions 
 * Experiment with an alert for multiple machines 
 * A reference could be https://monitor.qa.suse.de/d/WDopenqaworker1/worker-dashboard-openqaworker1 which shows a load that is not too high but certainly high enough from time to time 
 * Virtual machines like tumblesle https://monitor.qa.suse.de/d/GDtumblesle/dashboard-for-tumblesle?orgId=1&refresh=1m&viewPanel=54694&from=now-7d&to=now can have a very low load without that being a problem, so do not include those 
 * As necessary, adjust the telegraf config to be able to distinguish between physical and virtual machines. A simple shortcut could be to select only machines with more than N CPUs, like 2 or 4
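
The selection logic suggested above could be prototyped offline before turning it into an alert rule. The following is only a minimal sketch in Python: the function name `find_underused_hosts`, the input shapes (per-host lists of CPU usage percentages and a per-host CPU count) and all threshold values are assumptions made for illustration and are not taken from the existing monitoring setup. It applies the "more than N CPUs" shortcut to skip small (likely virtual) machines and additionally requires a low peak, so that a machine with occasional real load, like the reference worker, is not flagged.

```python
from statistics import mean


def find_underused_hosts(samples, cpu_counts, cpu_threshold=4,
                         max_mean_usage=5.0, max_peak_usage=20.0):
    """Return a sorted list of hosts that look effectively unused.

    samples: dict mapping hostname to a list of CPU usage percentages
             collected over the observation window (hypothetical format).
    cpu_counts: dict mapping hostname to its number of CPUs.
    All thresholds are illustrative assumptions, not tuned values.
    """
    underused = []
    for host, usage in samples.items():
        if not usage:
            continue  # no data: cannot judge, leave to a separate alert
        # Shortcut for "physical machine": more than cpu_threshold CPUs
        if cpu_counts.get(host, 0) <= cpu_threshold:
            continue
        # Low on average AND never reaching a meaningful peak
        if mean(usage) < max_mean_usage and max(usage) < max_peak_usage:
            underused.append(host)
    return sorted(underused)


# Made-up example data, not real measurements
samples = {
    "openqaworker1": [2.0, 3.0, 60.0, 4.0],  # mostly idle, real load at times
    "idle-box": [1.0, 2.0, 1.0, 3.0],        # large machine, never busy
    "tumblesle": [0.5, 0.4, 0.6, 0.5],       # small VM, low load is fine
}
cpu_counts = {"openqaworker1": 32, "idle-box": 16, "tumblesle": 2}
result = find_underused_hosts(samples, cpu_counts)
print(result)
```

With this made-up data only `idle-box` is reported: `tumblesle` is excluded by the CPU count shortcut and `openqaworker1` by its occasional peak. The same two conditions would still need to be expressed in the actual alerting backend.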
