action #167719
closed
coordination #161414: [epic] Improved salt based infrastructure management
No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with "error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device"
Added by okurz about 2 months ago.
Updated about 2 months ago.
Category:
Regressions/Crashes
Description
Observation
No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with "error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device"
df -h says:
/dev/vdb1 300G 291G 8.0G 98% /var/lib/influxdb
Related issues
1 (1 open — 0 closed)
- Assignee deleted (okurz)
- Priority changed from Urgent to Normal
I logged into the monitor instance and called systemctl status, which reported everything as fine. I then checked the individual services: first grafana, which was fine, and second influxdb, which showed error messages. On qamaster.qe.nue2.suse.org I shut down monitor and then ran
qemu-img resize /var/lib/libvirt/images/openqa-monitoring-data.qcow2 +200G
After booting monitor.qe.nue2.suse.org again I ran
parted -s -a opt /dev/vdb "resizepart 1 100%"
btrfs fi resize max /var/lib/influxdb
Following the system journal I could see that all recovered well.
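The outcome of such a resize can be double-checked with a few read-only commands. The following is only a minimal sketch and not what was actually run for this ticket; the function name verify_resize and the systemd unit name influxdb are assumptions.

```shell
# Hypothetical helper, not part of the original procedure: read-only checks
# to run on the monitoring host after the disk, partition and filesystem resize.
verify_resize() {
    mountpoint="${1:-/var/lib/influxdb}"
    unit="${2:-influxdb}"   # assumed systemd unit name

    # Block device and partition sizes as seen by the kernel.
    lsblk /dev/vdb
    # Size and free space of the mounted filesystem.
    df -h "$mountpoint"
    # Prints "active" and returns 0 if the service is running.
    systemctl is-active "$unit"
    # Recent service log lines, to spot remaining "no space left on device" errors.
    journalctl -u "$unit" --since "10 minutes ago" --no-pager
}
```

Calling verify_resize without arguments uses the mount point from this ticket.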
Next tasks:
- Check resource usage within influxdb: which measurements consume the most space
- Find out why we didn't see the space usage problem beforehand and why we did not receive alerts
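For the second task, a simple threshold check on the filesystem can illustrate the kind of condition an alert would need to cover. This is a minimal sketch under stated assumptions, not the alerting that is actually deployed: the function names and the 90% threshold in the usage note are hypothetical, and only the path /var/lib/influxdb comes from this ticket.

```shell
# Hypothetical helper: print the usage percentage (digits only) of the
# filesystem containing path $1. "df -P" selects the POSIX output format,
# in which column 5 of the second line is the capacity in percent.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: return 0 if usage $1 (percent) is at or above
# threshold $2 (percent).
usage_exceeds() {
    [ "$1" -ge "$2" ]
}

# Hypothetical helper: print a warning and return 1 if the filesystem holding
# path $1 is at or above threshold $2 (percent); print an OK line and return 0
# otherwise. Returns 2 if the usage could not be determined.
check_disk_usage() {
    used=$(usage_percent "$1")
    [ -n "$used" ] || return 2
    if usage_exceeds "$used" "$2"; then
        echo "WARNING: $1 is ${used}% full (threshold ${2}%)"
        return 1
    fi
    echo "OK: $1 is ${used}% full (threshold ${2}%)"
}
```

For example, check_disk_usage /var/lib/influxdb 90 would have returned 1 for the 98% usage reported by df in the observation above.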
- Copied to action #167722: Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M added
- Parent task set to #161414
- Status changed from New to Resolved
- Assignee set to okurz
- Priority changed from Normal to Urgent
I created dedicated tickets for the two identified follow-up tasks:
- #167722
- #167728