action #89821

closed

alert: PROBLEM Service Alert: openqa.suse.de/fs_/srv is WARNING (flaky, partial recovery with OK messages)

Added by okurz about 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2021-03-10
Due date:
% Done: 0%
Estimated time:
Description

Observation

Multiple alert emails were received, for example:
Notification: PROBLEM
Host: openqa.suse.de
State: WARNING
Date/Time: Tue Mar 9 13:17:18 UTC 2021
Info: WARN - 80.1% used (64.06 of 79.99 GB), trend: +573.77 MB / 24 hours

Service: fs_/srv

See Online: https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fsrv
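The figures in the WARN message can be reproduced from the used and total sizes, and the trend gives a rough idea of how urgent the warning is. A minimal sketch, assuming 1 GB = 1024 MB (the alert does not state the unit convention of the check) and that the 24-hour growth trend continues linearly; the function names are illustrative only:

```python
# Sketch: reproduce the numbers in the fs_/srv WARN message.
# Assumption: "GB" and "MB" are binary units, i.e. 1 GB = 1024 MB.
MB_PER_GB = 1024


def usage_percent(used_gb: float, total_gb: float) -> float:
    """Percentage of the filesystem that is in use."""
    return 100.0 * used_gb / total_gb


def days_until_full(used_gb: float, total_gb: float, trend_mb_per_day: float) -> float:
    """Days until the filesystem is full if growth stays at the given daily trend."""
    free_mb = (total_gb - used_gb) * MB_PER_GB
    return free_mb / trend_mb_per_day


# Values taken from the alert: 64.06 of 79.99 GB used, +573.77 MB / 24 hours.
used, total, trend = 64.06, 79.99, 573.77
print(f"{usage_percent(used, total):.1f}% used")
print(f"{days_until_full(used, total, trend):.1f} days until full")
```

Under these assumptions the remaining ~15.9 GB would last roughly four weeks, so the warning is not an emergency, but the volume fills up without intervention.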

Acceptance criteria

  • AC1: /srv on osd has enough free space
  • AC2: alert is handled
  • AC3: the Icinga alert only triggers if the internal Grafana alert is not handled or not effective

Suggestions

  • Follow the Thruk link above to understand the monitoring data
  • Cross-check the Icinga alert limit of 80% with the limit configured in Grafana
  • Make sure the Grafana limit is lower, so that the Grafana alert fires first
  • Ensure there is enough space, e.g. ask EngInfra to increase the volume size, or clean up
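The second and third suggestions amount to ordering the two thresholds so that Grafana fires first, with enough lead time to act before Icinga warns. A minimal sketch, assuming the 80% Icinga WARN threshold quoted in the alert, a hypothetical Grafana threshold of 75% (the actual Grafana limit is not stated in this ticket), 1 GB = 1024 MB, and the growth trend from the alert:

```python
# Sketch: check that Grafana alerts before Icinga (AC3) and estimate the lead time.
# Assumptions: ICINGA_WARN_PCT comes from the alert text; GRAFANA_THRESHOLD_PCT is
# hypothetical; 1 GB = 1024 MB; disk usage grows linearly at the reported trend.
MB_PER_GB = 1024

ICINGA_WARN_PCT = 80.0
GRAFANA_THRESHOLD_PCT = 75.0  # hypothetical value for illustration


def lead_time_days(grafana_pct: float, icinga_pct: float,
                   total_gb: float, trend_mb_per_day: float) -> float:
    """Days between the Grafana alert and the Icinga WARN at a constant growth rate."""
    if grafana_pct >= icinga_pct:
        raise ValueError("Grafana threshold must be lower than the Icinga threshold")
    gap_mb = (icinga_pct - grafana_pct) / 100.0 * total_gb * MB_PER_GB
    return gap_mb / trend_mb_per_day


days = lead_time_days(GRAFANA_THRESHOLD_PCT, ICINGA_WARN_PCT, 79.99, 573.77)
print(f"Grafana alerts about {days:.1f} days before Icinga")
```

With these assumed values the gap is about 4 GB, i.e. roughly a week of growth at +573.77 MB / 24 hours; a larger gap buys more time to clean up or request more space.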

Related issues: 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #100859: investigate how to optimize /srv data utilization on OSD size:S (Resolved, mkittler, 2021-10-12)
