action #132998

open

[alert] [FIRING:1] openqaworker-arm-3: Memory usage alert openQA (openqaworker-arm-3 memory_usage_alert_openqaworker-arm-3 worker) size:M

Added by okurz 11 months ago. Updated 8 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: -
Target version:
Start date: 2023-07-19
Due date:
% Done: 0%
Estimated time:

Description

Observation

See https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&viewPanel=12054&from=1689743130960&to=1689746327640 and the corresponding alert email.
The graph shows that the system exhausted all available memory.

Acceptance criteria

  • AC1: Measures have been applied to prevent memory exhaustion
  • AC2: It's safe to schedule jobs with excessive memory requirements

Acceptance Tests

  • AT1-1: A job with QEMURAM=999999999 aborts cleanly without alerts being raised
  • AT1-2: A worker without the mitigation kills processes due to memory exhaustion

Suggestions

  • Look into the logs and the corresponding openQA jobs that ran on that host to find out what exhausted the memory, likely too many openQA jobs that were too big
  • Ask people to not do that!
  • If necessary, adapt the number of worker instances or use different worker classes like "big mem"
  • If necessary, adapt job scenarios so they do not overcommit memory
  • If it is not openQA jobs then look into what else it is
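
To support the first suggestions, the following is a minimal sketch of how one could estimate whether the jobs running on a host overcommit its memory. It is not part of openQA: the function names and the 2048 MiB reserve are hypothetical, and it assumes that QEMURAM values are given in MiB and that the host total is read from /proc/meminfo (MemTotal, reported in kB).

```python
def parse_mem_total_mib(meminfo_text: str) -> int:
    """Return MemTotal in MiB from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:   <value> kB"
            return int(line.split()[1]) // 1024
    raise ValueError("MemTotal not found")


def overcommit_mib(qemuram_mib, mem_total_mib, reserve_mib=2048):
    """Return by how many MiB the requested guest RAM exceeds the host budget.

    qemuram_mib: QEMURAM values (assumed MiB) of the jobs running in parallel.
    reserve_mib: hypothetical headroom kept for the host OS and worker
                 processes; this is an assumption, not a measured value.
    A positive result means the jobs together request more than is available.
    """
    return sum(qemuram_mib) - (mem_total_mib - reserve_mib)


if __name__ == "__main__":
    # On the worker host itself, read the real total from /proc/meminfo.
    with open("/proc/meminfo") as f:
        total = parse_mem_total_mib(f.read())
    # Example QEMURAM values; replace with those of the jobs found in the logs.
    print(overcommit_mib([4096, 4096, 2048], total))
```

A positive result for the jobs that ran during the alert window would point to too many or too big jobs, in which case reducing the number of worker instances or moving scenarios to a dedicated worker class are the options listed above.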

Out of scope

  • Preventing the over-commit in openQA worker, see #133511 for this

Related issues 1 (0 open, 1 closed)

Copied to openQA Infrastructure - action #133511: [spike solution][timeboxed:10h] Prevent memory over-commits in openQA worker service definitions size:S (Resolved, jbaier_cz, 2023-07-19)

Actions #1

Updated by okurz 11 months ago

  • Copied to action #133511: [spike solution][timeboxed:10h] Prevent memory over-commits in openQA worker service definitions size:S added
Actions #2

Updated by okurz 11 months ago

  • Subject changed from [alert] [FIRING:1] openqaworker-arm-3: Memory usage alert openQA (openqaworker-arm-3 memory_usage_alert_openqaworker-arm-3 worker) to [alert] [FIRING:1] openqaworker-arm-3: Memory usage alert openQA (openqaworker-arm-3 memory_usage_alert_openqaworker-arm-3 worker) size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by okurz 10 months ago

  • Target version changed from Ready to future
Actions #4

Updated by livdywan 8 months ago

  • Description updated (diff)