action #182681

open

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #178243: [epic] More efficient handling of big job schedules, not executable jobs, never matching worker classes, etc.

Dynamic openQA worker(s) spinoff during high load

Added by gpathak 5 days ago. Updated 5 days ago.

Status: New
Priority: Normal
Assignee: -
Category: Feature requests
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Worker slots sometimes become unavailable with a message similar to the following:

Unavailable: The average load (28.34 27.85 21.15) is exceeding the configured threshold of 25. The worker will temporarily not accept new jobs until the load is lower again.
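To make the condition behind this message concrete, here is a minimal sketch in Python. It assumes a simplified rule in which a worker counts as overloaded when any of the 1-, 5-, or 15-minute load averages exceeds the threshold; the rule openQA actually applies may differ, and the function names are hypothetical.

```python
import os


def load_exceeds_threshold(load_avg, threshold):
    """Return True if any of the given load averages is above the threshold.

    Simplified assumption for illustration, not necessarily openQA's rule.
    """
    return any(value > threshold for value in load_avg)


def current_load_exceeds(threshold):
    """Check the load averages of the local machine (Unix only)."""
    # os.getloadavg() returns the 1-, 5-, and 15-minute load averages.
    return load_exceeds_threshold(os.getloadavg(), threshold)


# Values taken from the message above: two of the three averages are above 25.
overloaded = load_exceeds_threshold((28.34, 27.85, 21.15), 25)
```

With the values from the message, `overloaded` is `True`, because both the 1-minute and the 5-minute averages are above 25.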

While slots are unavailable, other tests can remain in the scheduler queue for a day or more, which further delays the overall deliverables.

We should come up with a solution in which a temporary, exact copy of the unavailable worker slot is created on a separate machine and registered with the openQA web UI automatically, and is later unregistered from the web UI and deleted.

Suggestions

  • A spare bare-metal machine with no openQA setup should be connected to openQA, possibly via IPMI or via HMC (for PPC workers)
  • The bare-metal machine should be restored to its original state once the tests finish and the openQA workers have been torn down
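The lifecycle suggested above (provision, register, run, unregister, restore) could be sketched as a Python context manager, so that cleanup runs even when a test fails. This is only an illustration of the control flow: `temporary_worker` and the four callbacks are hypothetical placeholders, and nothing here corresponds to an existing openQA, IPMI, or HMC API.

```python
from contextlib import contextmanager


@contextmanager
def temporary_worker(host, provision, register, unregister, restore):
    """Bring up a temporary worker on a spare host and always clean up.

    All four callbacks are hypothetical and take the host name as their
    only argument; real provisioning would go through IPMI, HMC, or similar.
    """
    provision(host)
    registered = False
    try:
        register(host)
        registered = True
        yield host
    finally:
        # Unregister only if registration succeeded, then restore the
        # machine to its original state in every case.
        if registered:
            unregister(host)
        restore(host)
```

A caller would wrap the job execution in `with temporary_worker(...) as host:`. The `finally` clause is what guarantees that the spare machine is returned to its original state regardless of how the tests end.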
Actions #1

Updated by okurz 5 days ago

  • Target version set to future
  • Parent task set to #178243