action #167722

open

coordination #161414: [epic] Improved salt based infrastructure management

Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M

Added by okurz 7 months ago. Updated 4 months ago.

Status: Workable
Priority: Normal
Assignee: -
Category: Feature requests
Start date: 2024-10-02
% Done: 0%

Description

Observation

In #167719 we ran out of space because influxdb on monitor.qe.nue2.suse.org grew to more than 300G. We should find out which measurements consume the most space and make sure we store the data space-efficiently.
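Before looking at individual measurements, a rough first step could be to total the on-disk size per database and retention policy. The following Python sketch assumes InfluxDB 1.x and its default data layout `/var/lib/influxdb/data/<database>/<retention policy>/<shard id>/`. The function names are hypothetical. The script only reports per-retention-policy totals, so per-measurement numbers would still need InfluxDB's own tooling or queries.

```python
import os
import sys
from collections import defaultdict


def usage_by_retention_policy(data_dir):
    """Sum file sizes per (database, retention policy) below an InfluxDB 1.x data dir.

    Assumes the layout <data_dir>/<database>/<retention policy>/<shard id>/...
    Files lying directly inside a database directory are attributed to a
    pseudo retention policy named "(db-level)".
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(data_dir):
        rel = os.path.relpath(root, data_dir)
        if rel == ".":
            continue  # ignore files placed directly in data_dir
        parts = rel.split(os.sep)
        database = parts[0]
        rp = parts[1] if len(parts) > 1 else "(db-level)"
        for name in files:
            try:
                totals[(database, rp)] += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished meanwhile (e.g. compaction); skip it
    # Largest consumers first
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def human(nbytes):
    """Format a byte count with binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if nbytes < 1024 or unit == "TiB":
            return f"{nbytes:.1f} {unit}"
        nbytes /= 1024


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/influxdb/data"
    for (db, rp), size in usage_by_retention_policy(path)[:20]:
        print(f"{human(size):>12}  {db}/{rp}")
```

On the host this could be run as `python3 influx_usage.py /var/lib/influxdb/data`, where the file name is hypothetical. Only read access to the data directory is needed.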

Acceptance criteria

  • AC1: influxdb on monitor.qa.suse.de uses significantly less than 300G
  • AC2: We know the biggest space usage contributors in influxdb
  • AC3: We still have a reasonable history of important data, e.g. executed openQA jobs on OSD going back multiple months if not years
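AC3 and the related #103380 point towards retention and downsampling: keep raw data only for a short period and keep aggregates for the long-term history. Below is a minimal sketch, assuming InfluxDB 1.x with InfluxQL, that builds the statements for one such setup as strings. The database name, retention policy names, durations and interval used here are hypothetical placeholders, not values agreed on in this ticket.

```python
def downsampling_statements(db, rp_name, duration, interval,
                            raw_duration=None, raw_rp="autogen"):
    """Return InfluxQL (InfluxDB 1.x) statements for a simple downsampling setup.

    All names and durations passed in are placeholders to adapt; nothing is
    sent to a server, the statements are only built as strings.
    """
    statements = [
        # Long-lived retention policy that will hold the aggregated data
        f'CREATE RETENTION POLICY "{rp_name}" ON "{db}" '
        f"DURATION {duration} REPLICATION 1",
        # mean(*) averages every numeric field (stored as mean_<field>),
        # :MEASUREMENT keeps each source measurement name, GROUP BY * keeps
        # all tags; FROM /.*/ reads the database's default retention policy.
        f'CREATE CONTINUOUS QUERY "cq_{rp_name}_{interval}" ON "{db}" BEGIN '
        f'SELECT mean(*) INTO "{db}"."{rp_name}".:MEASUREMENT '
        f"FROM /.*/ GROUP BY time({interval}), * END",
    ]
    if raw_duration is not None:
        # Shortening the raw retention policy deletes raw points older than
        # raw_duration, so only do this once the aggregates are in place.
        statements.append(
            f'ALTER RETENTION POLICY "{raw_rp}" ON "{db}" DURATION {raw_duration}'
        )
    return statements


if __name__ == "__main__":
    # Hypothetical example values: one year of hourly means, 30 days of raw data
    for stmt in downsampling_statements("telegraf", "one_year", "52w", "1h",
                                        raw_duration="30d"):
        print(stmt)
```

A continuous query only aggregates data that arrives after it has been created. Existing history would therefore need a one-off `SELECT ... INTO` over the old time range before the raw retention policy is shortened.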

Suggestions

Rollback steps

Out of scope

  • Increase disk space - this was already done in #167719

Related issues 4 (1 open, 3 closed)

Related to openQA Infrastructure (public) - action #103380: Configure retention/downsampling policy for specific monitoring data stored within InfluxDB (Blocked, okurz, 2021-12-01)

Has duplicate openQA Infrastructure (public) - action #169750: [alert] backup-vm (backup-vm: partitions usage (%) alert Generic partitions_usage_alert_backup-vm generic) (Resolved, nicksinger, 2024-11-12, 2024-11-27)

Copied from openQA Infrastructure (public) - action #167719: No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with "error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device" (Resolved, okurz, 2024-10-02)

Copied to openQA Infrastructure (public) - action #167728: grafana dashboard for monitor.qe.nue2.suse.org size:S (Resolved, gpathak, 2024-10-02)

