action #107257

Updated by okurz about 2 years ago

 
 ## Observation 
 From grafana: [Alerting] Apache Response Time alert 

 The apache response time exceeded the alert threshold.
 * Check the load of the web UI host
 * Consider restarting the openQA web UI service and/or apache

 Also see https://progress.opensuse.org/issues/73633

 | Metric name | Value |
 | --- | --- |
 | Min | 2565671.000 |

 view alert rule: http://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?tab=alert&viewPanel=84&orgId=1 

 ## Reproducible 
 Multiple alerts since at least 2022-02-22, likely also in the past days.

 ## Suggestions 
 * okurz already restarted the apache service because it had been running since before the labs were moved. However, we have had multiple further alerts since then
 * Likely the problem is not apache itself but either the network or our openQA service
 * It seems we are smoothing over a rather short time window, so we may not have enough data points because of the data outages. We should therefore look into #107437 first
 * Re-check how the response times look after #107437 is resolved
 * Optional: Reconsider how we alert on response times when we actually do not have that many responses 
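
 The last point could be sketched as follows. This is only an illustration of the idea, not the actual grafana alert rule: the function names, the threshold, the minimum sample count and all sample values except the reported 2565671 are hypothetical. It shows how a mean over a window with very few responses is dominated by a single slow request, and how requiring a minimum number of samples avoids alerting in that case.

```python
from statistics import mean

# Hypothetical values, chosen only for illustration (same unit as the metric).
THRESHOLD = 500_000
MIN_SAMPLES = 5


def naive_alert(response_times, threshold):
    """Alert whenever the mean of the window exceeds the threshold."""
    return mean(response_times) > threshold


def guarded_alert(response_times, threshold, min_samples=MIN_SAMPLES):
    """Alert only if the window holds enough samples to be meaningful."""
    if len(response_times) < min_samples:
        # Too few responses, e.g. during a data outage: do not alert.
        return False
    return mean(response_times) > threshold


# A window with a single slow response, as can happen with sparse data.
sparse_window = [2_565_671]
# The same slow response among nine ordinary ones.
dense_window = [100_000] * 9 + [2_565_671]

print(naive_alert(sparse_window, THRESHOLD))
print(guarded_alert(sparse_window, THRESHOLD))
print(guarded_alert(dense_window, THRESHOLD))
```

 With these made-up numbers the naive check fires on the sparse window, while the guarded check stays quiet on both windows. Whether a minimum sample count or a longer smoothing window is the better fit depends on how the data looks once #107437 is resolved.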

 ## Rollback steps 
 * okurz paused the alert for https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?orgId=1&editPanel=84&tab=alert , unpause if everything is good again
