action #160239

closed

openQA Project (public) - coordination #110833: [saga][epic] Scale up: openQA can handle a schedule of 100k jobs with 1k worker instances

openQA Project (public) - coordination #108209: [epic] Reduce load on OSD

[alert] External http responses Salt (https://openqa.suse.de/health) due to "Too many open files" after switch to nginx

Added by tinita 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Regressions/Crashes
Start date: 2024-05-12
Due date:
% Done: 0%
Estimated time:

Description

Observation

1 firing alert instance

📁 SALT › EXTERNAL HTTP RESPONSES

Firing [stats.openqa-monitor.qa.suse.de]
http://stats.openqa-monitor.qa.suse.de/alerting/grafana/b3a53df8-b7ee-48dd-9325-8a541187737f/view?orgId=1
External http responses
Summary: HTTP endpoint does not properly work
Description: An HTTP endpoint we need for proper operation delivers an http status code which indicates an issue with the service or its reachability.
Values: B=500 C=1
Labels:
alertname: External http responses
grafana_folder: Salt
server: https://openqa.suse.de/health

Looking into the access log, we have had 4825 HTTP 500 server errors so far today, and not only for https://openqa.suse.de/health
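To see which endpoints are affected, the 500 responses can be grouped by request path. This is a minimal sketch, assuming the default nginx "combined" log format (status code in the 9th whitespace-separated field, request path in the 7th); the function name `count_status_by_path` and the log path in the usage note are hypothetical and not taken from the OSD setup.

```shell
# Count responses with a given status code, grouped by request path.
# Assumption: nginx "combined" log format, i.e.
#   addr - user [time zone] "METHOD path PROTO" status bytes "referer" "agent"
# so field 7 is the path and field 9 is the status code.
count_status_by_path() {
    status="$1"
    logfile="$2"
    awk -v s="$status" '$9 == s { n[$7]++ } END { for (p in n) print n[p], p }' "$logfile" \
        | sort -rn
}
```

Possible usage, with an assumed log location: `count_status_by_path 500 /var/log/nginx/access.log | head`.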

The error log shows many entries like this:

2024/05/12 00:06:06 [crit] 2563#2563: accept4() failed (24: Too many open files)

The first occurrence I can find was 2024/05/07 12:02:50.
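To check when the errors started and how they are distributed, the matching error log lines can be counted per day. A minimal sketch, assuming the error log lines start with a `YYYY/MM/DD` date as in the entry quoted above; the function name `count_emfile_per_day` and the log path in the usage note are hypothetical.

```shell
# Count "Too many open files" entries per day in an nginx error log.
# Assumption: each line starts with a YYYY/MM/DD date, so a plain
# lexicographic sort is also chronological and the first output line
# is the day of the first occurrence.
count_emfile_per_day() {
    grep -F 'Too many open files' "$1" \
        | awk '{ n[$1]++ } END { for (d in n) print d, n[d] }' \
        | sort
}
```

Possible usage, with an assumed log location: `count_emfile_per_day /var/log/nginx/error.log`. Rotated or compressed logs would need to be included separately.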

For comparison, the number of open files:

# o3
lsof | wc -l
18978
# osd
lsof | wc -l
35675
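Note that `lsof | wc -l` is only a rough system-wide figure: it also lists entries that are not file descriptors (such as memory-mapped files and working directories), while error 24 (EMFILE) is hit against a per-process limit. A more direct comparison might look at each nginx process individually. The following is a sketch for Linux using `/proc`; the function name `fd_usage` is hypothetical, and reading another user's `/proc/<pid>/fd` usually requires root.

```shell
# Print "<pid> <open fds> <soft limit>" for one process (Linux only).
# The open-fd count comes from /proc/<pid>/fd, the soft limit from the
# "Max open files" row of /proc/<pid>/limits (4th field).
fd_usage() {
    pid="$1"
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
    echo "$pid $open $limit"
}
```

Possible usage: `for pid in $(pgrep nginx); do fd_usage "$pid"; done`. A process whose open count is close to its soft limit would be a likely source of the `accept4()` failures.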

Rollback actions


Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #130636: high response times on osd - Try nginx on OSD size:S (Resolved, mkittler)
