action #97139

closed

[alert] multiple unhandled alerts about "malbec: Memory usage alert" size:M

Added by okurz over 3 years ago. Updated over 3 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
-
Start date:
2021-08-18
Due date:
2021-09-09
% Done:

0%

Estimated time:

Description

Observation

See http://stats.openqa-monitor.qa.suse.de/d/WDmalbec/worker-dashboard-malbec?tab=alert&viewPanel=12054&orgId=1

Acceptance criteria

  • AC1: We have understood the reason why the alert triggered
  • AC2: Actions have been applied to prevent a similar situation in the future

Suggestion

  • Check history of monitoring data
  • Check the logs for anything that we still need to fix
  • Check whether it was a one-off experiment by a person; if so, talk to the person and tell them "don't do it again!"
  • Consider adapting alerting thresholds

Related issues 2 (1 open, 1 closed)

Related to openQA Infrastructure (public) - action #98682: jobs run powerqaworker-qam-1 fail with auto_review:"(?s)powerqaworker-qam-1.*Can't write to file (.*): No space left on device at .*":retry size:M (Workable, 2021-09-15)

Copied from openQA Infrastructure (public) - action #97136: [alert] multiple unhandled alerts about "broken workers" size:M (Resolved, dheidler, 2021-08-18)
