action #162605

closed

openQA Project - coordination #112862: [saga][epic] Future ideas for easy multi-machine handling: MM-tests as first-class citizens

openQA Project - coordination #111929: [epic] Stable multi-machine tests covering multiple physical workers

[FIRING:1] CPU load alert, should be "system load"

Added by okurz 27 days ago. Updated 15 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2024-06-20
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://stats.openqa-monitor.qa.suse.de/d/WDworker40/worker-dashboard-worker40?orgId=1&editPanel=54694&tab=alert says "CPU load alert", but it should say "system load".

Acceptance criteria

Suggestions


Related issues 1 (1 open, 0 closed)

Copied from openQA Infrastructure - action #162602: [FIRING:1] worker40 (worker40: CPU load alert openQA worker40 salt cpu_load_alert_worker40 worker) size:S | Blocked | okurz | 2024-06-20

Actions #1

Updated by okurz 27 days ago

  • Copied from action #162602: [FIRING:1] worker40 (worker40: CPU load alert openQA worker40 salt cpu_load_alert_worker40 worker) size:S added
Actions #2

Updated by okurz 26 days ago

  • Due date set to 2024-07-05
  • Status changed from New to Feedback
  • Assignee set to okurz
  • Target version changed from Tools - Next to Ready
Actions #4

Updated by okurz 15 days ago

  • Due date deleted (2024-07-05)
  • Status changed from Feedback to Resolved

https://monitor.qa.suse.de/alerting/list?search=load%20alert shows both the old "CPU load alert" and the new "system load alert" definitions.
To remove the stale ones I am now following https://gitlab.suse.de/openqa/salt-states-openqa#removing-stale-provisioned-alerts

On monitor.qe.nue2.suse.org

root@monitor:/etc/grafana/provisioning/alerting # grep -Ri 'cpu.*load' *
# sudo -u grafana sqlite3 /var/lib/grafana/grafana.db "   select uid from alert_rule where uid regexp 'cpu_load_alert.*';"
cpu_load_alert_diesel
cpu_load_alert_grenache-1
cpu_load_alert_imagetester
cpu_load_alert_mania
cpu_load_alert_openqaworker-arm-1
cpu_load_alert_openqaworker1
cpu_load_alert_openqaworker14
cpu_load_alert_openqaworker16
cpu_load_alert_openqaworker17
cpu_load_alert_openqaworker18
cpu_load_alert_petrol
cpu_load_alert_qesapworker-prg4
cpu_load_alert_qesapworker-prg5
cpu_load_alert_qesapworker-prg6
cpu_load_alert_qesapworker-prg7
cpu_load_alert_s390zl12
cpu_load_alert_sapworker1
cpu_load_alert_sapworker2
cpu_load_alert_sapworker3
cpu_load_alert_worker-arm1
cpu_load_alert_worker-arm2
cpu_load_alert_worker29
cpu_load_alert_worker30
cpu_load_alert_worker31
cpu_load_alert_worker32
cpu_load_alert_worker33
cpu_load_alert_worker34
cpu_load_alert_worker35
cpu_load_alert_worker36
cpu_load_alert_worker37
cpu_load_alert_worker38
cpu_load_alert_worker39
cpu_load_alert_worker40
systemctl stop grafana-server; sudo -u grafana sqlite3 /var/lib/grafana/grafana.db "
  delete from alert_rule where uid regexp 'cpu_load_alert.*';
  delete from alert_rule_version where rule_uid regexp 'cpu_load_alert.*';
  delete from provenance_type where record_key regexp 'cpu_load_alert.*';"; systemctl start grafana-server

Confirmed that in Grafana the CPU load alerts are now gone and the system load alerts are still there.
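
For reference, the list-then-delete steps above could also be scripted. The following is only a minimal sketch, not the procedure from the salt-states-openqa README: the function name `purge_alerts` and its `dry_run` flag are hypothetical, and it matches UIDs by plain prefix instead of the `regexp` operator used in the commands above. The table and column names are taken from those commands. It wraps the three deletes in one transaction so a failure leaves the database unchanged.

```python
import sqlite3

# Table/column pairs taken from the delete statements in the comment above.
TARGETS = [
    ("alert_rule", "uid"),
    ("alert_rule_version", "rule_uid"),
    ("provenance_type", "record_key"),
]


def purge_alerts(db_path, prefix, dry_run=True):
    """Return alert_rule UIDs starting with prefix; delete them unless dry_run.

    Hypothetical helper: prefix matching is an assumption made for this
    sketch, the original commands used "regexp 'cpu_load_alert.*'".
    """
    conn = sqlite3.connect(db_path)
    try:
        # Step 1: list what would be removed (same idea as the select above).
        uids = [
            row[0]
            for row in conn.execute(
                "SELECT uid FROM alert_rule"
                " WHERE substr(uid, 1, ?) = ? ORDER BY uid",
                (len(prefix), prefix),
            )
        ]
        if not dry_run:
            # Step 2: "with conn" commits on success and rolls back on any
            # exception, so the three tables are never left half-cleaned.
            with conn:
                for table, column in TARGETS:
                    conn.execute(
                        f"DELETE FROM {table}"
                        f" WHERE substr({column}, 1, ?) = ?",
                        (len(prefix), prefix),
                    )
        return uids
    finally:
        conn.close()
```

A cautious run would first call `purge_alerts("/var/lib/grafana/grafana.db", "cpu_load_alert")` to review the returned UIDs, then repeat with `dry_run=False`. As in the commands above, grafana-server should be stopped while the database is modified and started again afterwards.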
