coordination #108209

coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

[epic] Reduce load on OSD

Added by okurz about 1 year ago. Updated 1 day ago.

Status: New
Priority: High
Assignee: -
Category: Feature requests
Target version:
Start date: 2023-04-01
Due date: 2023-06-14
% Done: 25%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

Motivation

See #107875

Ideas

  • Look into cumulative CPU usage to decide where to optimize first
  • Look up old ticket from kraih about reverse proxy for postgres -> #55262
  • Experiment with using nginx instead of apache
  • Ship logs, e.g. apache logs, to a remote target and only evaluate them there
  • Use a remote postgres database
  • Review other intervals in telegraf
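
As a rough illustration of the first idea, the following sketch sums CPU time per process name and ranks the totals, so the largest consumer is looked at first. The function name and the input shape, pairs of (process name, CPU seconds), are assumptions made for this example; the ticket does not say how the samples would be collected.

```python
from collections import defaultdict


def rank_by_cumulative_cpu(samples):
    """Rank process names by their total CPU time, highest first.

    `samples` is an iterable of (process_name, cpu_seconds) pairs.
    This input shape is a hypothetical choice for this sketch.
    """
    totals = defaultdict(float)
    for name, cpu_seconds in samples:
        # Accumulate CPU seconds over all samples of the same process name.
        totals[name] += cpu_seconds
    # Sort by accumulated CPU time in descending order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Feeding it samples gathered over a longer window, rather than a single snapshot, should make short spikes matter less than sustained load.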

Subtasks

openQA Infrastructure - action #128789: [alert] Apache Response Time alert size:M (Status: Resolved, Assignee: nicksinger)

action #129481: Try to *reduce* number of apache workers to limit concurrent requests causing high CPU usage (Status: Blocked)

openQA Infrastructure - action #129484: high response times on osd - Move OSD workers to o3 to prevent OSD overload size:M (Status: Workable)

action #129487: high response times on osd - Limit the number of concurrent job upload handling on webUI side. Can we use a semaphore or lock using the database? size:M (Status: Workable, Assignee: kraih)

action #129490: high response times on osd - Try nginx with enabled load limiting or load balancing features (Status: New)

openQA Infrastructure - action #129493: high response times on osd - better nice level for velociraptor (Status: Resolved, Assignee: okurz)

action #129619: high response times on osd - simple limit of jobs running concurrently size:M (Status: In Progress, Assignee: kraih)

action #129745: Enable apache response time alert again after we think it's good now (Status: Blocked, Assignee: okurz)
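
The question raised in action #129487, whether a semaphore or lock can live in the database, could be approached roughly as sketched below. This is a minimal sketch and not the openQA implementation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table name `upload_slots` as well as the functions `init_slots`, `try_acquire` and `release` are hypothetical names introduced for this example. On PostgreSQL, advisory locks such as `pg_try_advisory_lock` might be an alternative worth evaluating.

```python
import sqlite3


def init_slots(conn, limit):
    """Create one row per allowed concurrent upload (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS upload_slots "
        "(slot INTEGER PRIMARY KEY, holder TEXT)"
    )
    conn.execute("DELETE FROM upload_slots")
    conn.executemany(
        "INSERT INTO upload_slots (slot, holder) VALUES (?, NULL)",
        [(i,) for i in range(limit)],
    )
    conn.commit()


def try_acquire(conn, holder):
    """Claim a free slot for `holder`; return False if all slots are taken."""
    # A single UPDATE picks and claims a free slot in one statement,
    # so two callers cannot both claim the same row.
    cur = conn.execute(
        "UPDATE upload_slots SET holder = ? WHERE slot = ("
        "SELECT slot FROM upload_slots WHERE holder IS NULL "
        "ORDER BY slot LIMIT 1)",
        (holder,),
    )
    conn.commit()
    return cur.rowcount == 1


def release(conn, holder):
    """Free every slot currently held by `holder`."""
    conn.execute(
        "UPDATE upload_slots SET holder = NULL WHERE holder = ?", (holder,)
    )
    conn.commit()
```

A caller that fails to acquire a slot would have to retry later or be told to back off; a real deployment would also need a way to reclaim slots from holders that crashed, which this sketch leaves out.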


Related issues

Copied from openQA Infrastructure - action #107875: [alert][osd] Apache Response Time alert size:M (Status: Resolved, Start date: 2022-03-04, Due date: 2022-03-24)

History

#1 Updated by okurz about 1 year ago

  • Copied from action #107875: [alert][osd] Apache Response Time alert size:M added

#2 Updated by okurz about 1 year ago

  • Description updated

#3 Updated by okurz 15 days ago

  • Tracker changed from action to coordination
  • Project changed from openQA Infrastructure to openQA Project
  • Subject changed from Reduce load on OSD to [epic] Reduce load on OSD
  • Category set to Feature requests

#4 Updated by okurz 15 days ago

  • Parent task set to #110833
