coordination #158374

open

openQA Project - coordination #155485: [saga][epic] Efficient openQA worker pool resource handling in datacenters

[epic] Prevention of inefficient hardware resource use

Added by okurz 4 months ago. Updated 3 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date: 2022-07-25
Due date:
% Done: 28%
Estimated time: (Total: 0.00 h)

Description

Ideas

  • 1. Detect from monitoring data which monitored machines show too low a CPU usage over time
  • 1.1. Detection only -> #158377
  • 1.2. Automatically switch off the machines identified in 1.1. -> #158380
  • 2. Identify "unused" but still powered machines
  • 2.1. Crosscheck which machines marked as "unused" in racktables are still pingable (they should not be powered on at all) -> #158383
  • 2.2. Crosscheck which machines marked as "unused" in racktables still draw power according to ePDU data (they should not be powered on at all, let alone waste significant power) -> #158386
  • 3. Identify underused machines from too little network traffic, e.g. from switch bandwidth measurements -> #114622
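
Idea 1.1 (detection only) could be sketched roughly as follows. This is a minimal illustration, not the implementation planned in #158377: the input format (host name mapped to CPU usage readings in percent), the 5% threshold, the minimum sample count and the host names are all assumptions made for the example. How the readings are fetched from the monitoring system is deliberately left out.

```python
from statistics import mean


def find_underused(samples, threshold_percent=5.0, min_samples=3):
    """Return hosts whose mean CPU usage is below threshold_percent.

    samples maps a host name to a list of CPU usage readings in percent
    (hypothetical format, not taken from any real monitoring API).
    Hosts with fewer than min_samples readings are skipped, because too
    little data should not lead to a report or a power-off.
    """
    underused = []
    for host, readings in sorted(samples.items()):
        if len(readings) < min_samples:
            continue  # not enough data to judge this host
        if mean(readings) < threshold_percent:
            underused.append(host)
    return underused


if __name__ == "__main__":
    # Hypothetical readings collected over the observation window.
    example = {
        "worker-a": [1.0, 2.5, 0.5, 1.5],
        "worker-b": [40.0, 75.0, 60.0, 55.0],
        "worker-c": [0.2],
    }
    print(find_underused(example))
```

Keeping detection a pure function over already collected readings mirrors the split between 1.1 and 1.2: the resulting list can be reviewed by a human before any automatic power-off is built on top of it.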

Subtasks 7 (5 open, 2 closed)

  • action #114622: Identification of unused/idle machines by alarming when there is no traffic on the corresponding switch ports for some time (New, 2022-07-25)
  • action #133700: Network bandwidth graphs per switch, like https://mrtg.suse.de/qanet13nue, for all current top-of-rack switches (TORs) that we are connected to size:M (Blocked, okurz, 2023-08-02)
  • action #158377: Detect from monitoring data which monitored machines show a too low system usage over time size:M (Workable, 2024-04-01)
  • action #158380: Detect and switch off from monitoring data which monitored machines show a too low CPU usage over time (New, 2024-04-01)
  • action #158383: Crosscheck which machines marked as "unused" in racktables are still pingable (as they should not be powered on at all) size:M (Resolved, nicksinger, 2024-04-01)
  • action #158386: Crosscheck which machines marked as "unused" in racktables still draw power according to ePDU data (as they should not be powered on and wasting significant power at all) - NUE2 size:M (Workable, 2024-04-01)
  • action #158907: Automated check for machines marked as "unused" in racktables but still pingable (as they should not be powered on at all) size:M (Resolved, nicksinger)
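
The two crosschecks for machines marked as "unused" (still pingable, still drawing power) could be sketched along these lines. This is only an illustration under stated assumptions, not what #158383, #158386 or #158907 actually implemented: the helper names, the 5 W idle limit and the host names are hypothetical, the list of "unused" hosts is assumed to be exported from racktables by some other means, and the power readings are assumed to be already available as a plain mapping. The default probe calls the Linux iputils `ping` with `-c` (count) and `-W` (timeout in seconds).

```python
import subprocess


def ping_once(host, timeout_s=2):
    """Return True if host answers a single ICMP echo request.

    Assumes the Linux iputils ping command is installed.
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def unused_but_reachable(unused_hosts, probe=ping_once):
    """Return the hosts marked as unused that still answer the probe."""
    return sorted(host for host in unused_hosts if probe(host))


def unused_but_drawing_power(unused_hosts, watts_by_host, max_idle_watts=5.0):
    """Return the hosts marked as unused that draw more than max_idle_watts.

    watts_by_host is a hypothetical mapping of host name to measured power
    in watts; hosts without a reading are treated as drawing no power.
    """
    return sorted(
        host
        for host in unused_hosts
        if watts_by_host.get(host, 0.0) > max_idle_watts
    )


if __name__ == "__main__":
    unused = {"old-worker-1", "old-worker-2"}  # hypothetical host names
    readings = {"old-worker-1": 120.0, "old-worker-2": 0.0}
    print(unused_but_drawing_power(unused, readings))
```

Passing the probe in as a parameter keeps the crosscheck testable without network access, for example `unused_but_reachable(unused, probe=lambda host: host == "old-worker-1")`.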
History

#1

Updated by okurz 4 months ago

  • Description updated

#2

Updated by okurz 4 months ago

  • Subtask #158377 added

#3

Updated by okurz 4 months ago

  • Subtask #158380 added

#4

Updated by okurz 4 months ago

  • Subtask #158383 added

#5

Updated by okurz 4 months ago

  • Subtask #158386 added

#6

Updated by okurz 4 months ago

  • Description updated

#7

Updated by okurz 4 months ago

  • Related to action #133700: Network bandwidth graphs per switch, like https://mrtg.suse.de/qanet13nue, for all current top-of-rack switches (TORs) that we are connected to size:M added

#8

Updated by okurz 4 months ago

  • Subtask #133700 added

#9

Updated by okurz 4 months ago

  • Subtask #114622 added

#10

Updated by ggardet_arm 4 months ago

#11

Updated by okurz 3 months ago

  • Subtask #158907 added

#12

Updated by okurz 3 months ago

  • Target version changed from Tools - Next to future