action #17548

closed

osd out of space

Added by okurz about 7 years ago.

Status: Resolved
Priority: Immediate
Assignee:
Category: Infrastructure
Target version: -
Start date: 2017-03-06
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

observation

On 2017-03-06 at around 11:34 CET, osd (https://openqa.suse.de) was unresponsive. So far no one has reported receiving a monitoring notification. According to https://nagios.nue.suse.com/pnp4nagios/graph?host=openqa.suse.de&start=1488795630&end=1488797366 there was a sudden surge in disk writes at 11:34 CET, which apparently filled up the disk and caused many daemons to crash.
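For reference, the start= and end= parameters in the graph URL are Unix timestamps. The sketch below converts them to CET to show which window the graph covers. It assumes a fixed UTC+1 offset, which holds for this date because it falls outside summer time; the helper name epoch_to_cet is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC+1 offset for CET (2017-03-06 is outside summer time).
CET = timezone(timedelta(hours=1), "CET")


def epoch_to_cet(epoch_seconds: int) -> str:
    """Format a Unix timestamp as a CET wall-clock string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=CET)
    return moment.strftime("%Y-%m-%d %H:%M:%S %Z")


# start= and end= values copied from the graph URL in the observation.
graph_start = epoch_to_cet(1488795630)
graph_end = epoch_to_cet(1488797366)
print(graph_start, "to", graph_end)
# prints: 2017-03-06 11:20:30 CET to 2017-03-06 11:49:26 CET
```

The window brackets the 11:34 CET time given above.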

open questions

  • Why was there no monitoring notification?
  • Can we move the yellow bar (presumably the warning threshold in the graph) further down?
  • On nagios, clicking the icons above the graphs in the top right corner, such as "most recent alerts…", yields a 404.
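To make the threshold question more concrete, here is a minimal sketch of a disk usage check with separate warning and critical levels. It is not the actual nagios check used for osd; the function names and the 80%/90% thresholds are assumptions chosen for illustration only.

```python
import shutil


def disk_usage_percent(path: str) -> float:
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def classify(percent_used: float, warn: float = 80.0, crit: float = 90.0) -> str:
    """Map a usage percentage to a state. Thresholds are hypothetical."""
    if percent_used >= crit:
        return "CRITICAL"
    if percent_used >= warn:
        return "WARNING"
    return "OK"


# Check the root filesystem of the machine this runs on.
percent = disk_usage_percent("/")
print(f"/ is {percent:.1f}% full: {classify(percent)}")
```

Lowering the warn value corresponds to moving the warning level down, which would give more lead time before the disk is full. Whether that would have helped with a surge as sudden as this one also depends on how often the check runs.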

actions done

  • szarate, coolo and okurz investigated the immediate cause and found disk space depletion messages from daemons in the log files, although disk space is apparently available again (after the surge, it seems). Restarted the daemons.
  • Moved /home/geekotest/SQL-DUMPS (2.7G) to /var/lib/openqa/backup/ and replaced it with a symlink; adjusted /etc/cron.daily/dump-openqa accordingly.
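The move-and-symlink step can be sketched as follows. This is an illustration, not the commands that were actually run: the helper name relocate_with_symlink is hypothetical, the demo works on a temporary directory rather than the real paths, and it assumes the directory keeps its name under the new parent.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(src: Path, dest_dir: Path) -> Path:
    """Move `src` into `dest_dir` and leave a symlink at the old location."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(src), str(target))
    # Old path now points at the new location, so existing users keep working.
    src.symlink_to(target, target_is_directory=True)
    return target


# Demo on throwaway paths that only mimic the layout from the ticket.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    dumps = root / "home" / "SQL-DUMPS"
    dumps.mkdir(parents=True)
    (dumps / "dump.sql").write_text("-- placeholder\n")

    moved_to = relocate_with_symlink(dumps, root / "backup")
    demo_is_symlink = dumps.is_symlink()
    demo_file_visible = (dumps / "dump.sql").exists()
    print(f"{dumps} -> {moved_to}")
```

Because the old path remains valid through the symlink, anything still reading from it keeps working, while the data itself lives on the other filesystem.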

The rest will be tracked in #12912.


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #12912: [tools] monitoring of o3/osd (Resolved, okurz, 2016-07-28)

Actions #1

Updated by okurz about 7 years ago
