action #132998
Updated by livdywan 7 months ago
## Observation
See https://stats.openqa-monitor.qa.suse.de/d/WDopenqaworker-arm-3/worker-dashboard-openqaworker-arm-3?orgId=1&viewPanel=12054&from=1689743130960&to=1689746327640 and the corresponding email.
The graph shows that the system exhausted all available memory.
## Acceptance criteria
* **AC1**: Measures have been applied to prevent memory exhaustion
* **AC2**: It's safe to schedule jobs whose memory requirements exceed what the worker can provide
## Acceptance Tests
* **AT1-1**: A job with QEMURAM=999999999 aborts cleanly without alerts being raised
* **AT1-2**: On a worker without the mitigation, the same job causes processes to be killed due to memory exhaustion
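To check AT1-1 and AT1-2 after a test run, one could look for OOM-killer entries in the worker's kernel log (for example the output of `journalctl -k`). The following is a minimal Python sketch, assuming the kernel's usual "Out of memory: Killed process <pid> (<name>)" wording (older kernels print "Kill process"). The function name `find_oom_kills` and the sample log line are hypothetical and only for illustration.

```python
import re

# Matches both the newer "Killed process" and the older "Kill process" kernel wording.
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]*)\)")


def find_oom_kills(kernel_log: str) -> list[tuple[int, str]]:
    """Return (pid, command name) for every OOM-killer entry in the kernel log text."""
    return [(int(pid), name) for pid, name in OOM_RE.findall(kernel_log)]


if __name__ == "__main__":
    # Made-up sample input; the kernel truncates command names to 15 characters.
    sample = (
        "worker kernel: Out of memory: Killed process 4242 (qemu-system-aar) "
        "total-vm:33554432kB, anon-rss:30000000kB\n"
        "worker kernel: something unrelated\n"
    )
    # AT1-1 expects an empty list on a mitigated worker, AT1-2 at least one entry.
    print(find_oom_kills(sample))  # [(4242, 'qemu-system-aar')]
```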
## Suggestions
* Check the logs and the openQA jobs that were running on that host to find out what exhausted the memory; most likely too many openQA jobs with too high memory requirements
* Ask people to not do that!
* As necessary, adapt the number of worker instances or introduce different worker classes such as "big mem"
* As necessary, adapt job scenarios so that they do not overcommit memory
* If it is not openQA jobs, find out what else is consuming the memory
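To judge whether a host's worker instances can overcommit, one could compare the memory all concurrently running jobs may request with the host's total memory. A minimal Python sketch, assuming QEMURAM values are in MiB (the default unit of QEMU's `-m` option) and assuming a hypothetical fixed reserve for the host itself; `parse_meminfo`, `overcommit_mib` and `reserve_mib` are illustrative names, not part of openQA.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value} (most fields are in KiB)."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])
    return fields


def overcommit_mib(qemuram_mib: list[int], mem_total_kib: int, reserve_mib: int = 2048) -> int:
    """Return by how many MiB the planned VMs exceed usable memory, 0 if they fit.

    QEMU needs some memory on top of QEMURAM, so this underestimates real usage.
    reserve_mib is a hypothetical allowance for the host OS and worker processes.
    """
    usable_mib = mem_total_kib // 1024 - reserve_mib
    return max(0, sum(qemuram_mib) - usable_mib)


if __name__ == "__main__":
    # Made-up sample; on a real worker one would read open("/proc/meminfo").read().
    sample = "MemTotal:       65536000 kB\nMemAvailable:   60000000 kB\n"
    total_kib = parse_meminfo(sample)["MemTotal"]
    # Hypothetical: 10 worker instances, each running a job with QEMURAM=8192.
    print(overcommit_mib([8192] * 10, total_kib))  # 19968
```

A positive result suggests reducing the number of worker instances on that host or moving such jobs to a dedicated worker class.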
## Out of scope
* Preventing the overcommit within the openQA worker itself, see #133511