action #125468

Updated by okurz almost 2 years ago

## Observation 
 From email: 2023-03-05 2331 

 ``` 
 *Firing: 1 alert * 
 Firing 
 _*Apache Response Time alert *_ 
 *Value:* [ var='A0' metric='Min' labels={} value=1.111601e+06 ] 
 *message:* The apache response time exceeded the alert threshold. * Check the load of the web UI host * Consider restarting the openQA web UI service and/or apache Also see https://progress.opensuse.org/issues/73633 
 *Labels:* 
 * alertname: Apache Response Time alert 
 * rule_uid: J5M8aX04z 
 ``` 

 See the dashboard panel for the alert period: https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=84&from=1678050000000&to=1678057199000 

 ## Acceptance criteria 
 * **AC1:** Apache response time remains consistently stable over several days 
 * **AC2:** OSD has been checked for possible causes of the alert firing during the original alert reporting period 

 ## Suggestions 
 * Check https://monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&viewPanel=84&from=1678050000000&to=1678057199000 and look into the other panels and the system logs for the same time period 
 * If no significant problem is found on OSD itself, compare with monitoring data from multiple other hosts; there may have been a network problem at the time 
 * Act accordingly to make the issue less likely to reappear
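
 To line up the system logs with the dashboard, it can help to turn the `from`/`to` parameters of the panel URL into readable timestamps. The following is a minimal sketch, assuming those parameters are Unix epoch milliseconds in UTC (the usual convention for this kind of dashboard URL, but not confirmed in this ticket); `panel_window` is a hypothetical helper name used only for illustration.

 ```python
from datetime import datetime, timezone
from urllib.parse import urlparse, parse_qs

# Panel URL copied from the ticket description.
PANEL_URL = (
    "https://monitor.qa.suse.de/d/WebuiDb/webui-summary"
    "?orgId=1&viewPanel=84&from=1678050000000&to=1678057199000"
)


def panel_window(url):
    """Return (start, end) of the dashboard time range as UTC datetimes.

    Assumption: 'from' and 'to' are epoch milliseconds in UTC.
    """
    query = parse_qs(urlparse(url).query)

    def to_utc(key):
        millis = int(query[key][0])
        return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

    return to_utc("from"), to_utc("to")


start, end = panel_window(PANEL_URL)
print(start.isoformat())  # 2023-03-05T21:00:00+00:00
print(end.isoformat())    # 2023-03-05T22:59:59+00:00
 ```

 Under that assumption the window is 21:00 to 22:59:59 UTC on 2023-03-05. The email time of 2331 lies outside that range if read as UTC, so it was possibly recorded in a local timezone; the same may apply to timestamps in the system logs, which is worth checking before comparing them.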
