action #152386: [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S - openQA Infrastructure (public) - openSUSE Project Management Tool

action #152386

closed

[alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S

Added by mkittler about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee: okurz
Category:
Target version: openQA Project (public) - Ready
Start date: 2023-12-11
Due date:
% Done:
Estimated time:
Tags: infra

Description

Observation

The "Apache Workers" and "HTTP Response" alerts fired at 2023-12-11 00:48:00 CET (+0100) and 2023-12-11 00:49:00 CET (+0100) respectively. Both alerts were resolved 5 minutes later, so the impact is likely low (if there was any impact at all).

Suggestions

- Check impact (so far there doesn't seem to be any); maybe have a look at jobs that were executed around that time
- Adjust the alert thresholds? We rarely seem to hit that limit, though
- Check the Grafana state and the systemd journal from the relevant time
  - https://stats.openqa-monitor.qa.suse.de/alerting/grafana/MW025mB4z/view?orgId=1
  - https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=15&orgId=1&from=1702249200000&to=1702421999000
  - See whether a lot of jobs were scheduled at that time
- Seems to have been a one-off so far
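
When checking the Grafana state for the relevant time, it can help to confirm that a dashboard link actually covers the alert. Grafana encodes absolute time ranges in the `from` and `to` URL parameters as epoch milliseconds. The following is a minimal sketch that decodes them; the function name `grafana_window` is made up for illustration, and relative ranges such as `now-6h` are not handled.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs


def grafana_window(url):
    """Return (start, end) of a Grafana URL's absolute from/to range in UTC.

    Hypothetical helper: assumes both parameters are epoch milliseconds.
    """
    query = parse_qs(urlparse(url).query)
    start_ms = int(query["from"][0])
    end_ms = int(query["to"][0])
    return (
        datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc),
        datetime.fromtimestamp(end_ms / 1000, tz=timezone.utc),
    )


url = (
    "https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary"
    "?viewPanel=15&orgId=1&from=1702249200000&to=1702421999000"
)
start, end = grafana_window(url)

# The first alert fired at 2023-12-11 00:48:00 +0100, i.e. 2023-12-10 23:48:00 UTC.
alert = datetime(2023, 12, 10, 23, 48, tzinfo=timezone.utc)
covers_alert = start <= alert <= end
print(start.isoformat(), end.isoformat(), covers_alert)
```

For the link above, the window runs from 2023-12-10 23:00:00 UTC to 2023-12-12 22:59:59 UTC, so it includes the time at which the alerts fired.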

Out of scope

Moving to nginx

Updated by mkittler about 1 year ago

Target version set to Ready

Updated by okurz about 1 year ago

Priority changed from Normal to High

Updated by livdywan about 1 year ago

Subject changed from [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 to [alert] "Apache Workers" and "HTTP Response" alerts fired shortly on 2023-12-11 size:S
Description updated (diff)
Status changed from New to Workable

Updated by okurz about 1 year ago

Status changed from Workable to In Progress
Assignee set to okurz

Updated by okurz about 1 year ago

Status changed from In Progress to Resolved

In https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=15&orgId=1&from=1701864914643&to=1702894968698 I could see only a single small dip, which did not recur. https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=15&orgId=1&from=1702243017076&to=1702261824564 shows the incident in detail.

In https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&from=1702243017076&to=1702261824564 I could see high CPU load, low CPU usage and near-zero I/O bandwidth on all monitored storage devices. I therefore suspect that the external hypervisor blocked the VM, which apparently does not happen often. I see no need for action right now.
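
The reasoning above can be sketched as a simple check. On Linux, the load average counts tasks that are runnable or in uninterruptible sleep, so high load with low CPU usage would normally point to tasks waiting on I/O; if I/O throughput is near zero as well, local resources do not explain the load, which is consistent with the VM itself not being scheduled. The names and thresholds below are purely illustrative assumptions and are not taken from the monitoring setup.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample (hypothetical structure for illustration)."""
    load_per_core: float  # load average divided by the number of cores
    cpu_busy_pct: float   # non-idle CPU time in percent
    io_mb_per_s: float    # summed throughput of all monitored storage devices


def looks_like_vm_stall(s, load_high=1.0, cpu_low=20.0, io_low=1.0):
    """Heuristic: high load that neither CPU usage nor disk I/O explains.

    The thresholds are assumed values, not tuned against real data.
    """
    return (
        s.load_per_core >= load_high
        and s.cpu_busy_pct <= cpu_low
        and s.io_mb_per_s <= io_low
    )


suspected_stall = looks_like_vm_stall(Sample(4.0, 5.0, 0.1))  # load without work
cpu_bound = looks_like_vm_stall(Sample(4.0, 90.0, 0.1))       # CPU explains the load
io_bound = looks_like_vm_stall(Sample(4.0, 5.0, 250.0))       # disk I/O explains the load
print(suspected_stall, cpu_bound, io_bound)
```

Such a check can only flag a pattern worth investigating; confirming that the hypervisor was the cause would need data from the hypervisor side.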
