action #107257

closed

[alert][osd] Apache Response Time alert size:M

Added by okurz almost 3 years ago. Updated almost 3 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: -
Start date: 2022-02-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

From grafana: [Alerting] Apache Response Time alert

The apache response time exceeded the alert threshold.

  • Check the load of the web UI host
  • Consider restarting the openQA web UI service and/or apache

Also see https://progress.opensuse.org/issues/73633

Metric name    Value
Min            2565671.000

view alert rule: http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84&orgId=1

Reproducible

Multiple alerts since at least 2022-02-22, likely also during the preceding days.

Suggestions

  • okurz already restarted the apache service because it had been running since before the labs were moved. Since then, however, we have had multiple further alerts
  • The problem is likely not apache itself but either the network or our openQA service
  • It seems we are smoothing over a rather short time window, so we may not have enough data because of the data outages. We should therefore look into #107437 first
  • Check again how the alert behaves after #107437 is resolved
  • Optional: Reconsider how we alert on response times when we actually do not have that many responses
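
The optional suggestion above could be sketched roughly as follows. This is a minimal Python illustration of the idea only, not the actual Grafana alert rule: the function name `should_alert`, the `min_samples` parameter, and all numeric values in the usage example are hypothetical. The idea is to suppress the alert when the smoothing window contains too few responses for the average to be meaningful.

```python
from statistics import mean


def should_alert(samples, threshold, min_samples=5):
    """Decide whether a response time alert should fire (hypothetical sketch).

    samples: response times observed inside the smoothing window
    threshold: alert threshold, in the same unit as the samples
    min_samples: assumed minimum number of responses needed for a
                 meaningful average
    """
    # With too few responses (for example after a data outage) the
    # average is dominated by single outliers, so do not alert.
    if len(samples) < min_samples:
        return False
    # Enough data: alert only if the windowed average exceeds the threshold.
    return mean(samples) > threshold
```

For example, with a hypothetical threshold of 500000, `should_alert([3000000.0, 3000000.0], 500000.0)` does not fire because only two responses are available, while the same values repeated ten times would fire.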

Rollback steps


Related issues 3 (0 open, 3 closed)

Related to openQA Infrastructure (public) - action #107437: [alert] Recurring "no data" alerts with only few minutes of outages since SUSE Nbg QA labs move size:M | Resolved | okurz | 2022-02-23

Related to openQA Infrastructure (public) - action #102650: Organize labs move to new building and SRV2 size:M | Resolved | nicksinger | 2021-11-18 | 2022-05-27

Related to openQA Infrastructure (public) - action #107875: [alert][osd] Apache Response Time alert size:M | Resolved | tinita | 2022-03-04

