coordination #108209 (open)

Parent task: coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

[epic] Reduce load on OSD

Added by okurz about 2 years ago. Updated 23 days ago.

Status: Blocked
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2023-04-01
Due date:
% Done: 83%
Estimated time: (Total: 0.00 h)

Description

Motivation

See #107875

Ideas

  • Look into cumulative CPU usage to decide where to optimize first
  • Look up the old ticket from kraih about a reverse proxy for postgres -> #55262
  • Experiment with using nginx instead of apache
  • Send logs (e.g. apache logs) to a remote target and evaluate them only there
  • Use a remote postgres database
  • Review the other intervals configured in telegraf

Subtasks 18 (3 open, 15 closed)

  • openQA Infrastructure - action #128789: [alert] Apache Response Time alert size:M (Resolved, assignee nicksinger, 2023-04-01)
  • action #129481: Try to *reduce* number of apache workers to limit concurrent requests causing high CPU usage (New)
  • openQA Infrastructure - action #129484: high response times on osd - Move OSD workers to o3 to prevent OSD overload size:M (Resolved, assignee okurz, 2023-05-17)
  • action #129487: high response times on osd - Limit the number of concurrent job upload handling on webUI side. Can we use a semaphore or lock using the database? size:M (Rejected, assignee okurz)
  • action #129490: high response times on osd - Try nginx on o3 with enabled load limiting or load balancing features (Resolved, assignee kraih)
  • openQA Infrastructure - action #129493: high response times on osd - better nice level for velociraptor (Resolved, assignee okurz)
  • action #129619: high response times on osd - simple limit of jobs running concurrently in openQA size:M (Resolved, assignee tinita, 2023-05-20)
  • action #129745: Enable apache response time alert and apache log alert again after we think it's good now size:M (Resolved, assignee okurz, 2023-05-23)
  • action #130477: [O3]http connection to O3 repo is broken sporadically in virtualization tests, likely due to systemd dependencies on apache/nginx size:M (Resolved, assignee mkittler, 2023-06-07)
  • action #130636: high response times on osd - Try nginx on osd with enabled load limiting or load balancing features (New)
  • action #131024: Ensure both nginx+apache are properly covered in packages+testing+documentation size:S (Resolved, assignee dheidler)
  • openQA Infrastructure - action #133325: osd http response alerts - bump threshold further up (Rejected, assignee okurz, 2023-07-25)
  • openQA Infrastructure - action #133397: HTTP Response alert Salt alerting and autoresolving shortly size:M (Resolved, assignee mkittler, 2023-07-26)
  • action #134114: Ensure to call OpenQA::Setup::read_config in unit tests (Resolved, assignee tinita)
  • openQA Infrastructure - action #157081: OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z (Resolved, assignee okurz, 2024-03-12)
  • openQA Infrastructure - action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 (Resolved, assignee okurz, 2024-03-12)
  • openQA Infrastructure - action #157726: osd-deployment | Failed pipeline for master (worker3[6-9].oqa.prg2.suse.org) (Blocked, assignee okurz, 2024-03-18)
  • openQA Infrastructure - action #158059: OSD unresponsive or significantly slow for some minutes 2024-03-26 13:34Z (Resolved, assignee okurz)

Related issues 3 (1 open, 2 closed)

  • Related to openQA Project - coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert (Resolved, assignee okurz, 2023-09-07)
  • Copied from openQA Infrastructure - action #107875: [alert][osd] Apache Response Time alert size:M (Resolved, assignee tinita, 2022-03-04 to 2022-03-24)
  • Copied to openQA Project - coordination #158167: [epic] Increase worker capacity (New, assignee okurz, 2024-03-27)

History

#1

Updated by okurz about 2 years ago

  • Copied from action #107875: [alert][osd] Apache Response Time alert size:M added

#2

Updated by okurz about 2 years ago

  • Description updated (diff)

#3

Updated by okurz 11 months ago

  • Tracker changed from action to coordination
  • Project changed from openQA Infrastructure to openQA Project
  • Subject changed from Reduce load on OSD to [epic] Reduce load on OSD
  • Category set to Feature requests

#4

Updated by okurz 11 months ago

  • Parent task set to #110833

#5

Updated by okurz 11 months ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Target version changed from future to Ready

#6

Updated by okurz 7 months ago

  • Related to coordination #135122: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert added

#7

Updated by okurz 7 months ago

  • Target version changed from Ready to Tools - Next

#8

Updated by okurz 5 months ago

  • Target version changed from Tools - Next to future

#9

Updated by okurz about 1 month ago

  • Subtask #157081 added

#10

Updated by okurz 29 days ago

  • Subtask #157666 added

#11

Updated by okurz 28 days ago

  • Subtask #157726 added

#12

Updated by okurz 24 days ago

  • Subtask #158059 added

#13

Updated by okurz 23 days ago
