action #164979

closed

[alert][grafana] File systems alert for WebUI /results size:S

Added by nicksinger 5 months ago. Updated 4 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Start date:
Due date: 2024-08-21
% Done: 0%
Estimated time:
Tags:

Description

Observation

Observed at 2024-08-06 07:17:00 +0200 CEST
One of the file systems /results is too full (> 90%)
See http://stats.openqa-monitor.qa.suse.de/d/WebuiDb?orgId=1&viewPanel=74

Current usage:

Filesystem      Size  Used Avail Use% Mounted on
/dev/vdd        7.0T  6.4T  681G  91% /results

Acceptance criteria

AC1: There is enough space and headroom on the affected file system /results, i.e. considerably more than 20% free
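
A minimal sketch of how AC1 could be checked, assuming the 20% figure refers to free space and treating it as a hard lower bound (the ticket says "considerably more", so a real check may want a higher value). The function names and the `min_free` parameter are hypothetical and not part of openQA or the Grafana alert.

```python
import shutil

TIB = 1024 ** 4
GIB = 1024 ** 3


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Return the share of the file system that is still available (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def has_headroom(total_bytes: int, free_bytes: int, min_free: float = 0.20) -> bool:
    """True if strictly more than min_free of the file system is available.

    min_free=0.20 is an assumption taken from AC1, not a documented setting.
    """
    return free_fraction(total_bytes, free_bytes) > min_free


def check_mount(path: str, min_free: float = 0.20) -> bool:
    """Apply the same check to a live mount point via shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return has_headroom(usage.total, usage.free, min_free)


# Figures from the df output in the observation: 7.0T size, 681G available.
observed = free_fraction(int(7.0 * TIB), 681 * GIB)
print(f"free: {observed:.1%}, AC1 met: {has_headroom(int(7.0 * TIB), 681 * GIB)}")
```

With the observed values roughly 9.5% of /results is free, so AC1 is not met. On the host itself, `check_mount("/results")` would run the same comparison against the live file system.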

Suggestions

  • Check the job group "logs" retention settings for "not-important" / "groupless" results and consider reducing the period
  • Consider extending the silence period if fixing takes too long: https://stats.openqa-monitor.qa.suse.de/alerting/silence/9ee9b299-3d06-4234-97bf-6b84e2ad9a24/edit?alertmanager=grafana
  • Reconsider the design of scheduling openqa-investigate for unreviewed jobs and possibly plan in a separate ticket
  • Tell the security squad that their test scenario(s) are problematic and should fail less or be properly reviewed
  • Tell the security squad that their test scenario(s) are significantly bigger than other jobs and ask them to consider reducing the space usage, e.g. by saving fewer files or compressing them
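
To back up the suggestions about unusually large jobs, it can help to rank directories by size first. The following is a generic sketch that sums apparent file sizes per direct subdirectory; the helper names are hypothetical, and the layout below /results is not described in this ticket, so the base path to pass in is an assumption.

```python
import os
from pathlib import Path


def dir_size(path: Path) -> int:
    """Sum the apparent size in bytes of all files below path (symlinks not followed)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                # Files may vanish while cleanup jobs run; skip them.
                pass
    return total


def largest_subdirs(base: Path, top: int = 10) -> list[tuple[str, int]]:
    """Return the top largest direct subdirectories of base as (name, bytes) pairs."""
    sizes = [
        (entry.name, dir_size(Path(entry.path)))
        for entry in os.scandir(base)
        if entry.is_dir(follow_symlinks=False)
    ]
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:top]
```

Calling `largest_subdirs(Path("/results"))` on the affected host would list the biggest top-level directories. The numbers are apparent sizes, so they can differ from what `df` or `du` report in disk blocks.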

Rollback steps

Out of scope

  • Better accounting e.g. linking of investigation jobs to their original groups -> #164988

Related issues 2 (1 open, 1 closed)

  • Copied from openQA Infrastructure (public) - action #129244: [alert][grafana] File systems alert for WebUI /results size:M (Resolved, mkittler, 2023-05-12 to 2023-05-30)
  • Copied to openQA Project (public) - action #164988: Better accounting for openqa-investigation jobs size:S (Workable, 2024-08-06)
