action #125468
[alert] [FIRING:1] (Apache Response Time alert J5M8aX04z) then resolved itself so flaky? size:M
Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2023-03-06
Due date:
% Done: 0%
Estimated time:
Description
Observation
From email: 2023-03-05 23:31
*Firing: 1 alert *
Firing
_*Apache Response Time alert *_
*Value:* [ var='A0' metric='Min' labels={} value=1.111601e+06 ]
*message:* The apache response time exceeded the alert threshold.
* Check the load of the web UI host
* Consider restarting the openQA web UI service and/or apache

Also see https://progress.opensuse.org/issues/73633
*Labels:*
* alertname: Apache Response Time alert
* rule_uid: J5M8aX04z
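To put the alert value in perspective, here is a minimal sketch that converts it to seconds. The unit is an assumption: the ticket does not state it, and microseconds is assumed here only because that is what Apache's `%D` log field reports. The helper name `microseconds_to_seconds` is hypothetical.

```python
# Assumption: the metric is in microseconds (as Apache's %D log field is);
# the ticket itself does not state the unit.
raw_value = 1.111601e+06  # "value" from the alert email


def microseconds_to_seconds(us):
    """Convert a duration in microseconds to seconds."""
    return us / 1_000_000


print(f"{microseconds_to_seconds(raw_value):.3f} s")  # prints "1.112 s"
```

If the assumption holds, the minimum response time in the evaluated window was a bit over one second.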
Acceptance criteria
- AC1: Apache response time is consistently stable over several days
- AC2: OSD has been checked for possible causes of the alert firing during the original alert reporting period
Suggestions
- Check https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=84&from=1678050000000&to=1678057199000 and look at the other panels and the system logs for the same time range
- If no significant problem is found on OSD itself, compare with monitoring data from multiple other hosts; maybe there was a network problem at the time?
- Act accordingly to make the issue less likely to reappear
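When matching system logs against the dashboard, it helps to know which time window the link above covers. The following sketch decodes the `from` and `to` parameters of that URL, which Grafana encodes as epoch milliseconds. The function name `grafana_window` is hypothetical.

```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

URL = ("https://monitor.qa.suse.de/d/WebuiDb/webui-summary"
       "?orgId=1&viewPanel=84&from=1678050000000&to=1678057199000")


def grafana_window(url):
    """Return the (start, end) of a Grafana URL's absolute time range in UTC."""
    query = parse_qs(urlparse(url).query)

    def to_utc(ms):
        # `from` and `to` are epoch milliseconds for absolute ranges.
        return datetime.fromtimestamp(int(ms) / 1000, tz=timezone.utc)

    return to_utc(query["from"][0]), to_utc(query["to"][0])


start, end = grafana_window(URL)
print(start.isoformat(), end.isoformat())
# prints: 2023-03-05T21:00:00+00:00 2023-03-05T22:59:59+00:00
```

The window is 21:00 to 22:59:59 UTC on 2023-03-05. If the email timestamp in the observation is in CET (an assumption, the ticket does not give a timezone), the alert arrived at 22:31 UTC, inside this window.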