action #168718

Updated by livdywan about 2 months ago

## Observation 
 The "response codes" panel takes a considerable time to load or even runs into timeouts: 
 https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1 

 ## Acceptance criteria 
 * **AC1:** https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1 loads reliably without timeouts and feels snappy 

 ## Suggestions 
 * Look into what makes the query slow: the measurements in InfluxDB may have grown too big and need trimming, or Telegraf may already be pushing too much data 
   * Adjust the interval used to push new data 
 * The query also sometimes times out completely, resulting in *no data* 
   * We checked if this could be something like multiple requests on different machines but couldn't confirm that 
   * Grafana might still be running operations that already timed out? 
   * Even a small range like 24h is likely to hit the issue 
 * Slowness also affects other panels 
 * Monitor resource usage on the VM and hypervisor host 
   * Make sure we have enough resources for Grafana/InfluxDB 
   * Trim the data stored in InfluxDB to keep its volume manageable 
 * Look into limiting data retention, e.g. keeping at most 1 year of data
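
 To get a feeling for how the push interval and the retention limit interact, here is a rough back-of-the-envelope sketch. The intervals used are hypothetical examples, not values taken from our actual Telegraf or InfluxDB configuration, and the helper name `points_per_series` is made up for illustration. It only counts raw samples per series and says nothing about how InfluxDB stores or queries them.

```python
from datetime import timedelta


def points_per_series(window: timedelta, interval: timedelta) -> int:
    """Number of samples a single series accumulates within the given window."""
    return window // interval


# Hypothetical push intervals in seconds (assumption, not our real config)
intervals = (10, 60, 300)

# Windows mentioned in the ticket: a 24h query range and a 1 year retention limit
windows = {"24h range": timedelta(hours=24), "1 year retention": timedelta(days=365)}

for label, window in windows.items():
    for seconds in intervals:
        count = points_per_series(window, timedelta(seconds=seconds))
        print(f"{label}, {seconds}s interval: {count:,} points per series")
```

 Multiplying these counts by the number of series behind the panel gives a first estimate of how much data a query has to touch, which may help decide whether adjusting the interval or the retention is the more promising lever.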
