action #158377
openQA Project (public) - coordination #155485: [saga][epic] Efficient openQA worker pool resource handling in datacenters
coordination #158374: [epic] Prevention of inefficient hardware resource use
Detect from monitoring data which monitored machines show a too low system usage over time size:M
Status:
Workable
Priority:
Normal
Assignee:
-
Category:
Feature requests
Target version:
QA (public, currently private due to #173521) - Tools - Next
Start date:
2024-04-01
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Motivation
Among the machines that we already monitor, we can also look for those that are effectively "unused".
Acceptance criteria
- AC1: An alert exists for physical machines whose system usage stays too low over a longer period of time
Suggestions
- Experiment with an alert for multiple machines
- A reference could be https://monitor.qa.suse.de/d/WDopenqaworker1/worker-dashboard-openqaworker1 which shows a load that is not too high but is certainly sufficient from time to time
- Virtual machines like tumblesle https://monitor.qa.suse.de/d/GDtumblesle/dashboard-for-tumblesle?orgId=1&refresh=1m&viewPanel=54694&from=now-7d&to=now can have a very low load but are not a problem, so do not include those
- As necessary, adjust the telegraf config to be able to distinguish between physical and virtual machines. A simple shortcut could be to select only machines with more than N CPUs, like 2 or 4
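The selection logic suggested above could be sketched roughly as follows. This is only an illustration in plain Python, not the actual alert definition: the `HostSamples` type, the function name, the host names and all threshold values (`cpu_threshold`, `max_mean_usage`, `min_samples`) are hypothetical, and a real alert would more likely be expressed as a query over the monitoring data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HostSamples:
    """Hypothetical container for one monitored machine."""
    name: str
    cpu_count: int
    usage_percent: list  # system usage samples collected over the observation window


def underused_hosts(hosts, cpu_threshold=4, max_mean_usage=5.0, min_samples=3):
    """Return names of hosts with more than cpu_threshold CPUs and a low mean usage.

    All default values are assumptions chosen for illustration only.
    """
    flagged = []
    for host in hosts:
        # Shortcut from the suggestions: machines with few CPUs are
        # treated as virtual machines and skipped.
        if host.cpu_count <= cpu_threshold:
            continue
        # Without enough samples no statement about "over time" is possible.
        if len(host.usage_percent) < min_samples:
            continue
        if mean(host.usage_percent) < max_mean_usage:
            flagged.append(host.name)
    return flagged


# Made-up example data, not real machines
example_hosts = [
    HostSamples("big-idle", 64, [1.0, 2.0, 1.5]),
    HostSamples("big-busy", 64, [40.0, 5.0, 60.0]),
    HostSamples("small-vm", 2, [0.5, 0.2, 0.1]),
]
print(underused_hosts(example_hosts))
```

Using the mean over the whole window instead of single samples means that a machine which is busy only from time to time, like the reference dashboard, is not flagged as long as its average stays above the threshold.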
Updated by okurz 8 months ago
- Copied to action #158380: Detect and switch off from monitoring data which monitored machines show a too low CPU usage over time added