action #157081 (closed)

openQA Project - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project - coordination #108209: [epic] Reduce load on OSD

OSD unresponsive or significantly slow for some minutes 2024-03-12 08:30Z

Added by okurz 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version: -
Start date: 2024-03-12
Due date: -
% Done: 0%
Estimated time: -

Description

Observation

As seen in https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1710230445416&to=1710235845416

Screenshot_20240312_114038_osd_slow_2024-03-12-0830Z.png

Also reported in https://suse.slack.com/archives/C02CANHLANP/p1710232779791019 #eng-testing by Jose Gomez and in https://suse.slack.com/archives/C02CSAZLAR4/p1710233069225929 #team-lsg-qe-core by Richard Fan.

For the time period 083500Z-085100Z the HTTP response times of OSD are significantly above the expected range. During that timeframe CPU usage and CPU load are very low, and there is no significant change in memory, minion workers or most other graphs.
Storage I/O requests, bytes and times show a significant reduction around 083630Z.

Network usage shows a decrease from 2-8GB/s "out" and 50-800Mb/s "in" down to near-zero, especially for "out", starting 083630Z.

In the hour before the incident (073500Z-083500Z) I see an increase of the "Requests per second" on the Apache server, which peaks exactly at 083500Z before going down again after the unresponsiveness started.

Screenshot_20240312_115450_osd_slow_2024-03-12-0830Z_http_response_vs_apache_requests.png
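A requests-per-second spike like the one in the screenshot can be cross-checked directly against the Apache access log, independently of the Grafana dashboard. Below is a minimal sketch that buckets access-log lines by minute; the sample lines, the log path and the combined log format are assumptions for illustration, not taken from OSD's actual configuration:

```python
import re
from collections import Counter

# Hypothetical sample lines; on a real host these would be read from the
# Apache access log (e.g. /var/log/apache2/access_log -- path is an assumption).
SAMPLE = [
    '1.2.3.4 - - [12/Mar/2024:08:34:59 +0000] "GET /api/v1/jobs HTTP/1.1" 200 512 "-" "curl"',
    '1.2.3.4 - - [12/Mar/2024:08:35:00 +0000] "GET /api/v1/jobs HTTP/1.1" 200 512 "-" "curl"',
    '1.2.3.5 - - [12/Mar/2024:08:35:01 +0000] "GET /tests HTTP/1.1" 200 1024 "-" "curl"',
]

# Capture the timestamp up to minute precision from the bracketed date field.
TS_RE = re.compile(r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} ')

def requests_per_minute(lines):
    """Count access-log lines per minute, returning a Counter keyed by minute."""
    counts = Counter()
    for line in lines:
        m = TS_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    for minute, n in requests_per_minute(SAMPLE).most_common():
        print(minute, n)
```

Running this over the real log for 073500Z-085100Z would show whether the request volume seen in Grafana matches what Apache actually logged, which helps distinguish a genuine traffic surge from a dashboard or telemetry artifact.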

Related issues (2: 0 open, 2 closed)

Related to openQA Project - action #130636: high response times on osd - Try nginx on OSD size:S (Resolved, mkittler, 2024-05-17)
Copied to openQA Infrastructure - action #157666: OSD unresponsive and then not starting any more jobs on 2024-03-21 (Resolved, okurz, 2024-03-12)
