action #129619

closed

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

coordination #108209: [epic] Reduce load on OSD

high response times on osd - simple limit of jobs running concurrently in openQA size:M

Added by okurz 11 months ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2023-05-20
Due date:
% Done: 0%
Estimated time:

Description

Motivation

OSD suffers from high response times and alerts about HTTP responses. As this is likely caused by too many jobs trying to upload concurrently, we should introduce limits. The easiest limit is likely one on the number of jobs that the scheduler assigns to workers, which prevents too many jobs from running in parallel.

Acceptance criteria

  • AC1: openQA configuration options can limit the number of jobs that will be picked up at once
  • AC2: By default there is no limit
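As a rough illustration of AC1 and AC2, the following Python sketch reads an optional cap from INI-style text (openQA's own configuration lives in openqa.ini) and maps an absent or empty value to "no limit". The `[scheduler]` section, the `max_running_jobs` option and the "negative means unlimited" convention are hypothetical names chosen for this example, not necessarily what openQA ships; openQA itself is written in Perl.

```python
import configparser
from typing import Optional


def read_job_limit(ini_text: str) -> Optional[int]:
    """Return the cap on concurrently running jobs, or None for "no limit".

    The [scheduler] section and max_running_jobs option are hypothetical names.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=None covers both a missing section and a missing option (AC2: no limit by default)
    value = parser.get("scheduler", "max_running_jobs", fallback=None)
    if value is None or value.strip() == "":
        return None
    limit = int(value)
    # Assumed convention: any negative value also means "no limit"
    return None if limit < 0 else limit


# Usage example
print(read_job_limit(""))  # None
print(read_job_limit("[scheduler]\nmax_running_jobs = 250\n"))  # 250
```

Returning None rather than a large sentinel number keeps "never configured" distinguishable from "configured to a high value".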

Suggestions

  • Look into the scheduler code, likely in lib/OpenQA/Scheduler/Model/Jobs.pm. Maybe it is possible to simply not assign any jobs to workers based on a config setting, if one is defined.
  • Confirm in production, e.g. try it out on OSD.
  • Come up with a good limit for OSD.
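The idea of simply not assigning jobs once a configured limit is reached could look roughly like the following Python sketch of a single scheduling tick. The function `assign_jobs` and its parameters are hypothetical; the real scheduler in lib/OpenQA/Scheduler/Model/Jobs.pm is Perl and also matches worker classes and other settings, which this sketch ignores.

```python
from typing import List, Optional, Tuple


def assign_jobs(scheduled: List[int], free_workers: List[str],
                running: int, limit: Optional[int]) -> List[Tuple[int, str]]:
    """Pair scheduled job IDs with free workers, respecting an optional cap.

    Hypothetical helper: the real openQA scheduler also matches worker
    classes, machine settings, parallel clusters and more.
    """
    # Naive one-to-one pairing of scheduled jobs (in the given order) with idle workers
    candidates = list(zip(scheduled, free_workers))
    if limit is None:
        return candidates  # no limit configured: behave exactly as before
    # Every job counts the same, regardless of its type or number of modules
    remaining = max(0, limit - running)
    return candidates[:remaining]


# 8 jobs already running with a cap of 10: only 2 more are handed out this tick
print(assign_jobs([101, 102, 103], ["w1", "w2", "w3"], running=8, limit=10))
# [(101, 'w1'), (102, 'w2')]
```

If the running count is already at or above the cap (for example right after lowering it), no new jobs are assigned until enough running jobs finish.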

Further details

  • By default there is "no limit", because otherwise admins and users might be surprised to find jobs limited without ever having configured anything.

Out of scope

  • The type of worker or type of job does not matter. Of course, jobs with 10k job modules are heavier, but here we focus only on the number of jobs.

Related issues 6 (1 open, 5 closed)

Related to openQA Project - action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features (Resolved, kraih)

Related to QA - action #130312: [tools] URL listing TW snapshots (and the changes therein), has stopped working (Resolved, kraih, 2023-06-03)

Related to openQA Infrastructure - action #134927: OSD throws 503, unresponsive for some minutes size:M (Resolved, okurz, 2023-08-31)

Related to openQA Infrastructure - action #135632: "Mojo::File::spurt is deprecated in favor of Mojo::File::spew" breaking os-autoinst OBS build and osd-deployment size:M (Resolved, okurz, 2023-05-08)

Blocks openQA Project - action #129481: Try to *reduce* number of apache workers to limit concurrent requests causing high CPU usage (New)

Copied from openQA Project - action #129487: high response times on osd - Limit the number of concurrent job upload handling on webUI side. Can we use a semaphore or lock using the database? size:M (Rejected, okurz)
