action #168718
openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6
The "response codes" panel takes a considerable time to load or even runs into timeouts size:M
Status:
Workable
Priority:
Low
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-10-17
Due date:
% Done:
0%
Estimated time:
Description
Observation
The "response codes" panel takes a considerable time to load or even runs into timeouts:
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1
Acceptance criteria
- AC1: https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1 feels snappy
Suggestions
- Look into what makes the query slow. Possible causes are database measurements in InfluxDB that have grown (too) big and need tailoring, or telegraf already pushing too much data
- Adjust the interval used to push new data
- The query also sometimes times out completely, resulting in no data
- We checked if this could be something like multiple requests on different machines but couldn't confirm that
- Grafana might still be running operations that already timed out?
- Even a small range like 24h is likely to hit the issue
- Slowness also affects other panels
- Monitor resource usage on the VM and hypervisor host
- Make sure we have enough resources for Grafana/InfluxDB
- Trim data we have in InfluxDB somehow to make the amount of data more manageable
- Look into limiting the retention of data e.g. up to 1 year only
Updated by livdywan 6 months ago
- Subject changed from The "response codes" panel takes a considerable time to load or even runs into timeouts to The "response codes" panel takes a considerable time to load or even runs into timeouts size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 1 month ago
- Target version changed from Tools - Next to Ready
Updated by robert.richardson 28 days ago
- Status changed from Workable to In Progress
- Assignee set to robert.richardson
Updated by robert.richardson 27 days ago
- Status changed from In Progress to Workable
I noticed that the Nginx Response Time panel is taking even longer than the Response panel (about 3x).
I also copied the different queries from the Grafana web UI, replacing `$timeFilter` with `time > now() - 24h` and `$__interval` with different values. As `$__interval` is only part of the `GROUP BY` statement, the impact on timing isn't really big:
ssh openqa-monitor.qa.suse.de
influx
use telegraf
> EXPLAIN ANALYZE SELECT...
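The macro substitution described above can be sketched in Python as follows. This is only an illustration: the query text in `PANEL_QUERY`, including the measurement and field names, is hypothetical, since the real panel queries are not quoted in this ticket. The helper names `expand_macros` and `explain_analyze` are likewise made up for this sketch.

```python
# Hypothetical panel query. The real query text is not shown in this ticket;
# the measurement "nginx_log" and field "request" are placeholders.
PANEL_QUERY = (
    'SELECT count("request") FROM "nginx_log" '
    "WHERE $timeFilter GROUP BY time($__interval)"
)


def expand_macros(query: str, time_range: str = "24h", interval: str = "30s") -> str:
    """Replace the two Grafana macros with literal InfluxQL."""
    return (
        query.replace("$timeFilter", f"time > now() - {time_range}")
        .replace("$__interval", interval)
    )


def explain_analyze(query: str, time_range: str = "24h", interval: str = "30s") -> str:
    """Build the statement to paste into the influx shell."""
    return "EXPLAIN ANALYZE " + expand_macros(query, time_range, interval)


if __name__ == "__main__":
    # One statement per interval that was tried for the "response codes" panel.
    for interval in ("1s", "30s", "5m"):
        print(explain_analyze(PANEL_QUERY, interval=interval))
```

Each printed statement can then be pasted into the `influx` shell after `use telegraf`.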
Response Codes
Interval | Duration |
---|---|
1s | ~8.0s |
30s (default) | ~6.3s |
5m | ~5.8s |
Response Size
Interval | Duration |
---|---|
1s | ~4.8s |
12s (default) | ~4.3s |
5m | ~3.8s |
Nginx Response Time (mean_nginx_response)
Interval | Duration |
---|---|
1s | ~16.9s |
12s (default) | ~15.6s |
5m | ~15s |
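To put numbers on how little the interval matters, the following sketch computes the relative time saved when going from the 1s interval to the 5m interval. The durations are copied from the three tables above and are approximate, so the percentages are rough as well. The names `TIMINGS`, `relative_saving` and `summarize` are made up for this example.

```python
# Approximate durations in seconds, copied from the tables above.
# Tuple order: (1s interval, default interval, 5m interval).
TIMINGS = {
    "Response Codes": (8.0, 6.3, 5.8),
    "Response Size": (4.8, 4.3, 3.8),
    "Nginx Response Time": (16.9, 15.6, 15.0),
}


def relative_saving(slowest: float, fastest: float) -> float:
    """Fraction of the query time saved by the coarser interval."""
    return (slowest - fastest) / slowest


def summarize(timings: dict) -> dict:
    """Map each panel to the percentage saved between the 1s and 5m intervals."""
    return {
        panel: round(relative_saving(values[0], values[2]) * 100, 1)
        for panel, values in timings.items()
    }


if __name__ == "__main__":
    for panel, percent in summarize(TIMINGS).items():
        print(f"{panel}: about {percent}% less time at 5m than at 1s")
```

Even with the coarsest interval the fastest query still takes about 3.8s, so the interval alone does not explain the slowness.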