action #168718

open

openQA Project - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

The "response codes" panel takes a considerable time to load or even runs into timeouts size:M

Added by okurz about 1 month ago. Updated 26 days ago.

Status: Workable
Priority: Normal
Assignee: -
Category: Regressions/Crashes
Target version:
Start date: 2024-10-17
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

The "response codes" panel takes a considerable time to load or even runs into timeouts:
https://stats.openqa-monitor.qa.suse.de/d/WebuiDb/webui-summary?viewPanel=80&orgId=1

Acceptance criteria

Suggestions

  • Look into what makes the query slow. Possible causes: database measurements in InfluxDB that have grown (too) big and need tailoring, or Telegraf already pushing too much data
    • Adjust the interval used to push new data
  • The query also sometimes times out completely, resulting in no data
    • We checked whether this could be caused by something like multiple requests from different machines but couldn't confirm that
    • Grafana might still be running operations that already timed out?
    • Even a small range like 24h is likely to hit the issue
  • Slowness also affects other panels
  • Monitor resource usage on the VM and hypervisor host
    • Make sure we have enough resources for Grafana/InfluxDB
    • Trim the data we have in InfluxDB to make the amount more manageable
  • Look into limiting the retention of data, e.g. to at most 1 year
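
For the retention suggestion, a minimal sketch of the statements that could be involved, assuming the instance is InfluxDB 1.x and speaks InfluxQL (the ticket does not state the version). The database name "telegraf" and the policy name "autogen" are placeholders and not taken from this ticket, and `retention_statements` is a hypothetical helper that only builds the statement strings; it does not connect to any server.

```python
def retention_statements(database, policy, duration="365d"):
    """Build InfluxQL statements to inspect and shorten a retention policy.

    Assumes InfluxDB 1.x (InfluxQL). Nothing is executed here; the
    returned strings would still have to be run by an operator.
    """
    def quote(identifier):
        # InfluxQL wraps identifiers in double quotes; escape embedded ones.
        return '"' + identifier.replace('"', '\\"') + '"'

    return [
        # Check the current retention settings first.
        f"SHOW RETENTION POLICIES ON {quote(database)}",
        # Then limit how long data is kept, e.g. to roughly one year.
        f"ALTER RETENTION POLICY {quote(policy)} ON {quote(database)} DURATION {duration}",
    ]


# Placeholder names; the real database and policy need to be looked up first.
for statement in retention_statements("telegraf", "autogen"):
    print(statement)
```

Shortening a retention policy causes InfluxDB to drop data older than the new duration, so the change should only be applied after agreeing on how much history is actually needed.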