action #17548
closed
osd out of space
Start date: 2017-03-06
Due date:
% Done: 0%
Estimated time:
Difficulty:
Description
observation
On 2017-03-06 at around 11:34 CET, osd (https://openqa.suse.de) became unresponsive. So far no one has reported receiving a monitoring notification. According to https://nagios.nue.suse.com/pnp4nagios/graph?host=openqa.suse.de&start=1488795630&end=1488797366, there was a sudden surge in disk writes at 11:34 CET that apparently filled up the disk. This caused many daemons to crash.
open questions
- why was there no monitoring notification?
- can we move the yellow bar further down?
- on nagios, clicking the icons in the top right corner above the graphs, such as "most recent alerts…", yields a 404
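To make the first two questions concrete, here is a minimal sketch of a disk usage check with a warning and a critical threshold. It assumes that the "yellow bar" is the warning threshold drawn in the graph; the function names and the 80%/90% defaults are hypothetical and not taken from the actual nagios configuration. The exit codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```python
import shutil

# Nagios plugin exit codes (standard convention).
OK, WARNING, CRITICAL = 0, 1, 2
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify_usage(used_fraction, warn=0.80, crit=0.90):
    """Map a used-space fraction (0.0 to 1.0) to a status code.

    The default thresholds are illustrative assumptions only.
    """
    if used_fraction >= crit:
        return CRITICAL
    if used_fraction >= warn:
        return WARNING
    return OK


def check_disk(path="/", warn=0.80, crit=0.90):
    """Return (status, used_fraction) for the filesystem containing path."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return classify_usage(used_fraction, warn, crit), used_fraction


def main(path="/"):
    """Print a one-line status message and return the exit code."""
    status, used_fraction = check_disk(path)
    print(f"DISK {STATUS_NAMES[status]} - {used_fraction:.0%} used on {path}")
    return status
```

A wrapper script could call `sys.exit(main("/var/lib/openqa"))` so that the monitoring system sees the status as the exit code. Lowering `warn` corresponds to "moving the yellow bar further down": the warning fires earlier and leaves more time to react to a sudden surge.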
actions done
- szarate, coolo and okurz investigated the immediate cause of the problem and found disk space depletion messages from daemons in the logfiles, although disk space now appears to be available (after a surge, it seems). Restarted the daemons
- moved /home/geekotest/SQL-DUMPS (2.7G) to /var/lib/openqa/backup/ and replaced it with a symlink, adjusted /etc/cron.daily/dump-openqa accordingly
The rest will be tracked in #12912
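The relocation of the SQL dumps noted under "actions done" can be sketched as follows. This is only an illustration of the "move and leave a symlink behind" step, not the commands that were actually run; `relocate_with_symlink` is a hypothetical helper name.

```python
import os
import shutil


def relocate_with_symlink(src, dest_parent):
    """Move directory src into dest_parent and leave a symlink at src.

    Returns the new location of the directory.
    """
    src = os.path.abspath(src)
    dest_parent = os.path.abspath(dest_parent)
    target = os.path.join(dest_parent, os.path.basename(src))

    # Refuse to run twice or to overwrite existing data.
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    if os.path.exists(target):
        raise FileExistsError(target)

    os.makedirs(dest_parent, exist_ok=True)
    # shutil.move falls back to copy-and-delete across filesystems.
    shutil.move(src, target)
    # Old path keeps working for scripts that still reference it.
    os.symlink(target, src, target_is_directory=True)
    return target
```

Called as `relocate_with_symlink("/home/geekotest/SQL-DUMPS", "/var/lib/openqa/backup")`, the old path keeps resolving, so consumers of the dumps are not broken while the data itself lives on the larger volume. Scripts that write to the directory, such as the cron job mentioned above, may still be better off pointing at the new location directly.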
Updated by okurz about 8 years ago
- Related to action #12912: [tools]monitoring of o3/osd added