action #132998

[alert] [FIRING:1] openqaworker-arm-3: Memory usage alert openQA (openqaworker-arm-3 memory_usage_alert_openqaworker-arm-3 worker) size:M

Added by okurz 10 months ago. Updated 7 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-07-19
Due date:
% Done: 0%
Estimated time:

Description

Observation

See https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&viewPanel=12054&from=1689743130960&to=1689746327640 and the corresponding alert email.
The graph shows that the system exhausted all available memory.

Acceptance criteria

  • AC1: Measures have been applied to prevent memory exhaustion
  • AC2: It is safe to schedule jobs with excessive memory requirements

Acceptance Tests

  • AT1-1: A job with QEMURAM=999999999 aborts cleanly without alerts being raised
  • AT1-2: A worker without the mitigation kills processes due to memory exhaustion

Suggestions

  • Look into the logs and the corresponding openQA jobs that ran on that host to find out what exhausted the memory, likely too many openQA jobs that were too big
  • Ask people to not do that!
  • As necessary, adapt the number of worker instances or use different worker classes like "big mem"
  • As necessary, adapt job scenarios so that they do not overcommit memory
  • If it is not openQA jobs then look into what else it is
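
To support the first and third suggestions, here is a minimal sketch of how one could check whether a set of concurrently running jobs overcommits a host and how many worker instances would fit. It is not part of openQA: the `Job` class, the function names and the 4096 MiB reserve are hypothetical, and it assumes that QEMURAM is given in MiB and that the host's total memory is read from the `MemTotal` line of `/proc/meminfo`.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record of a job that ran on the host."""
    job_id: int
    qemuram_mib: int  # value of the job's QEMURAM setting, assumed to be in MiB


def parse_mem_total_mib(meminfo_text: str) -> int:
    """Extract MemTotal from the contents of /proc/meminfo and convert kB to MiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       268435456 kB"
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found in meminfo text")


def overcommit_mib(jobs, host_total_mib, reserve_mib=4096):
    """Return by how many MiB the jobs exceed the usable host memory (0 if they fit).

    reserve_mib is an assumed amount kept free for the OS and worker processes.
    """
    requested = sum(job.qemuram_mib for job in jobs)
    usable = host_total_mib - reserve_mib
    return max(0, requested - usable)


def max_instances(host_total_mib, per_job_mib, reserve_mib=4096):
    """Return how many jobs of a given size fit into the usable host memory."""
    usable = host_total_mib - reserve_mib
    return max(0, usable // per_job_mib)
```

On the worker itself the input could come from `open("/proc/meminfo").read()`; the list of jobs would have to be collected manually from the openQA job settings of the jobs that ran during the alert window.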

Out of scope

  • Preventing over-commit in the openQA worker itself; see #133511 for this

Related issues: 1 (0 open, 1 closed)

  • Copied to openQA Infrastructure - action #133511: [spike solution][timeboxed:10h] Prevent memory over-commits in openQA worker service definitions size:S (Resolved, jbaier_cz, 2023-07-19)
