action #133511

closed

[spike solution][timeboxed:10h] Prevent memory over-commits in openQA worker service definitions size:S

Added by okurz 10 months ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-07-19
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See what happened in #132998. Assuming the memory exhaustion is caused by over-committing in job settings, i.e. the qemu memory assigned to machines, we can prevent memory over-commits with cgroup memory limits in the openQA worker systemd units, sized according to the available memory and a fair division among all worker instances.

Suggestions

  • Research in https://www.freedesktop.org/software/systemd/man/systemd.resource-control.html how to limit memory with cgroup settings in systemd unit files
  • Figure out how to derive a good limit value from the number of openQA worker instances, the memory reserved for other services, etc.
  • Draft a pull request, doc change or similar with an example implementation on one of our workers to demonstrate the concept
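
As a starting point for the second and third suggestion, here is a minimal sketch in Python of how a fair per-instance limit could be computed and rendered as a systemd drop-in. It is an illustration under stated assumptions, not a tested implementation: the function names, the 90% soft-limit ratio and the example figures (64 GiB host, 8 GiB reserved, 14 instances) are hypothetical. `MemoryHigh=` and `MemoryMax=` are the directives described in systemd.resource-control; they need the unified cgroup hierarchy (cgroup v2), and the limit should cover the whole unit, including the qemu processes a worker starts, which would need to be verified on a real worker.

```python
import os

GIB = 1024 ** 3


def per_instance_memory_max(total_bytes: int, reserved_bytes: int, instances: int) -> int:
    """Divide the memory left after the reservation evenly among worker instances."""
    if instances < 1:
        raise ValueError("need at least one worker instance")
    usable = total_bytes - reserved_bytes
    if usable <= 0:
        raise ValueError("reserved memory exceeds total memory")
    return usable // instances


def render_dropin(memory_max_bytes: int, high_percent: int = 90) -> str:
    """Render a [Service] drop-in with a soft (MemoryHigh) and a hard (MemoryMax) limit.

    The 90% default for the soft limit is an assumption, not a recommendation.
    """
    memory_high = memory_max_bytes * high_percent // 100
    return (
        "[Service]\n"
        f"MemoryHigh={memory_high}\n"
        f"MemoryMax={memory_max_bytes}\n"
    )


def host_total_memory() -> int:
    """Physical memory of this host in bytes (Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    # Hypothetical example figures: 8 GiB reserved for other services, 14 instances.
    limit = per_instance_memory_max(host_total_memory(), 8 * GIB, 14)
    print(render_dropin(limit), end="")
```

The output could be saved as a drop-in for the worker template unit, for example in a directory such as /etc/systemd/system/openqa-worker@.service.d/ (path and unit name assumed here), followed by `systemctl daemon-reload`. Using `MemoryHigh=` below `MemoryMax=` throttles a worker before the hard limit triggers the OOM killer; whether that behaviour is desirable for qemu jobs is an open question for the spike.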

Related issues 2 (2 open, 0 closed)

Copied from openQA Infrastructure - action #132998: [alert] [FIRING:1] openqaworker-arm-3: Memory usage alert openQA (openqaworker-arm-3 memory_usage_alert_openqaworker-arm-3 worker) size:M (Workable, 2023-07-19)

Copied to openQA Infrastructure - action #150986: [timeboxed:10h] Prevent cpu over-allocation in openQA worker service definitions (New, 2023-07-19)
