coordination #158374
openQA Project (public) - coordination #155485: [saga][epic] Efficient openQA worker pool resource handling in datacenters
[epic] Prevention of inefficient hardware resource use
Start date: 2022-07-25
Due date:
% Done: 28%
Estimated time: (Total: 0.00 h)
Description
Ideas
- 1. Detect from monitoring data which monitored machines show consistently low CPU usage over time
- 1.1. Detection only -> #158377
- 1.2. Automatically switch off the machines identified in 1.1. -> #158380
- 2. Identify "unused" machines that are still powered on
- 2.1. Cross-check which machines marked as "unused" in racktables are still pingable (they should not be powered on at all) -> #158383
- 2.2. Cross-check which machines marked as "unused" in racktables still draw power according to ePDU data (they should be powered off and not draw significant power) -> #158386
- 3. Identify underused machines from low network traffic, e.g. from switch bandwidth measurements -> #114622
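Ideas 1.1 and 2.1 could be prototyped roughly as follows. This is only a sketch under assumptions, not the implementation tracked in the linked tickets: the function names, the 5% threshold, the minimum sample count and the example hostnames are all hypothetical, and the CPU samples and the "unused" host list are assumed to have already been exported from the monitoring system and racktables. The reachability check is passed in as a callable so that a real ping, or a stub, can be used.

```python
from statistics import mean
from typing import Callable, Iterable, List, Mapping, Sequence


def find_low_cpu_hosts(
    cpu_usage: Mapping[str, Sequence[float]],
    threshold_percent: float = 5.0,  # hypothetical threshold, tune per pool
    min_samples: int = 3,            # hypothetical minimum window size
) -> List[str]:
    """Idea 1.1: return hosts whose mean CPU usage is below the threshold.

    Hosts with fewer than min_samples data points are skipped, because a gap
    in monitoring data is not evidence of an idle machine.
    """
    return sorted(
        host
        for host, samples in cpu_usage.items()
        if len(samples) >= min_samples and mean(samples) < threshold_percent
    )


def find_unused_but_reachable(
    unused_hosts: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> List[str]:
    """Idea 2.1: return hosts marked "unused" that still answer a probe."""
    return sorted(host for host in unused_hosts if is_reachable(host))


# Example data; hostnames and values are made up for illustration.
usage = {
    "worker-a.example": [1.0, 2.5, 0.5],
    "worker-b.example": [40.0, 55.0, 62.0],
    "worker-c.example": [0.0],  # too few samples, skipped
}
print(find_low_cpu_hosts(usage))

# Stub probe standing in for a real ping.
print(find_unused_but_reachable(
    ["old-1.example", "old-2.example"],
    lambda host: host == "old-2.example",
))
```

Keeping detection separate from any power-off action mirrors the split between 1.1 and 1.2: the output is just a list that a human or a later automated step can review.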
Updated by okurz 9 months ago
- Related to action #133700: Network bandwidth graphs per switch, like https://mrtg.suse.de/qanet13nue, for all current top-of-rack switches (TORs) that we are connected to size:M added
Updated by ggardet_arm 9 months ago
Related script available: https://github.com/os-autoinst/scripts/pull/188