action #158125

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #158110: [epic] Prevent worker overload

typing issue on ppc64 worker - only pick up (or start) new jobs if CPU load is below configured threshold size:M

Added by okurz about 1 month ago. Updated 19 days ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

In #158104 we observed typing issues caused by mania being overloaded. mania was configured to run 30 openQA worker instances, which was mostly fine as shown in #139271-24. The recent overload was likely triggered by enabling video again as part of #157636. I have already reduced the number of worker instances, but the drawback is that the long test backlog again takes longer to finish. We should be more flexible in using the available resources. I suggest implementing a check in the worker so that it only picks up new jobs if the CPU load is below a configured threshold.

Acceptance criteria

  • AC1: An openQA worker does not start an openQA job if the CPU load is higher than the configured threshold
  • AC2: By default the worker still picks up jobs if the load is not too high
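As a rough illustration of AC1 and AC2, the check could look like the following minimal sketch. It is written in Python for brevity and is not the openQA worker implementation. The names `DEFAULT_LOAD_THRESHOLD`, `load_allows_new_job`, `current_load_avg` and `may_pick_up_job` are hypothetical, the default value is made up for the example, and using the 1-minute load average as "CPU load" is an assumption.

```python
import os
from typing import Optional

# Hypothetical default threshold; this ticket does not specify a value.
DEFAULT_LOAD_THRESHOLD = 40.0


def load_allows_new_job(load_avg: float,
                        threshold: float = DEFAULT_LOAD_THRESHOLD) -> bool:
    """AC1: refuse a new job only if the load is higher than the threshold."""
    return load_avg <= threshold


def current_load_avg() -> Optional[float]:
    """Return the 1-minute system load average, or None if unavailable."""
    try:
        return os.getloadavg()[0]
    except (OSError, AttributeError):
        # OSError: load average unobtainable; AttributeError: unsupported platform.
        return None


def may_pick_up_job(threshold: float = DEFAULT_LOAD_THRESHOLD) -> bool:
    """AC2: with the default threshold, jobs are still accepted under normal load."""
    load = current_load_avg()
    if load is None:
        # Assumption: if the load cannot be read, keep accepting jobs.
        return True
    return load_allows_new_job(load, threshold)
```

A worker loop would call `may_pick_up_job()` before accepting a job and retry later when it returns `False`. Whether such a worker should also report itself as "broken" in the meantime is the open point listed under "Out of scope" below.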

Suggestions

Out of scope

  • Consider the existing Grafana monitoring for "broken workers" if we use the feature of declaring a worker as "broken" due to too high CPU load

Related issues 3 (2 open, 1 closed)

Copied from openQA Infrastructure - action #158104: typing issue on ppc64 worker size:S (Resolved, okurz, 2024-03-27)

Copied to openQA Infrastructure - action #158709: typing issue on ppc64 worker - with automatic CPU load based limiting in place let's increase the instances on mania again (New)

Copied to openQA Project - action #158910: typing issue on ppc64 worker - reconsider number of worker instances in particular on ppc64le kvm tests size:M (Feedback, okurz, 2024-06-07)
