action #167719
closed coordination #161414: [epic] Improved salt based infrastructure management
No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with "error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device"
Status: Resolved
Priority: Urgent
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2024-10-02
Due date:
% Done: 0%
Estimated time:
Description
Observation
No new data in monitor.qe.nue2.suse.org due to influxdb failing to write with "error opening new segment file for wal (1): write /var/lib/influxdb/….wal: no space left on device"
df -h says:
/dev/vdb1 300G 291G 8.0G 98% /var/lib/influxdb
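The observation above could be turned into a simple threshold check. The following is a minimal sketch, not the alerting actually used on this host: the function names and the 90% threshold are hypothetical, and it relies only on the POSIX output format of df (-P), where the fifth column is the usage percentage.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a df usage value such as "98%"
# is at or above the given threshold in percent.
usage_at_or_above() {
    used=${1%\%}            # strip the trailing percent sign
    [ "$used" -ge "$2" ]
}

# Hypothetical helper: print the usage column for the filesystem holding a path.
# -P selects the POSIX single-line output format, so long device names
# do not wrap onto a second line.
mount_usage() {
    df -P "$1" | awk 'NR == 2 { print $5 }'
}

# Print a warning when the filesystem is at least as full as the threshold.
check_mount() {
    pct="$(mount_usage "$1")"
    if usage_at_or_above "$pct" "$2"; then
        echo "WARNING: $1 is $pct full"
    fi
}
```

For example, check_mount /var/lib/influxdb 90 would have printed a warning for the 98% usage shown above; the 90 is an arbitrary illustration, not a value from this ticket.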
Updated by okurz about 2 months ago
- Assignee deleted (okurz)
- Priority changed from Urgent to Normal
I logged into the monitor instance and ran systemctl status, which looked fine. I then checked the individual services: grafana was fine, but influxdb showed error messages. On qamaster.qe.nue2.suse.org I shut down monitor and then ran:
qemu-img resize /var/lib/libvirt/images/openqa-monitoring-data.qcow2 +200G
After booting monitor.qe.nue2.suse.org again, I ran:
parted -s -a opt /dev/vdb "resizepart 1 100%"
btrfs fi resize max /var/lib/influxdb
Following the system journal, I could see that everything recovered well.
Next tasks:
- Check resource usage within influxdb: which measurements consume the most space
- Find out why we didn't notice the space usage problem beforehand and didn't receive alerts
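As a first step for the resource usage check, disk consumption can be broken down per directory. This is a rough sketch under assumptions: the function name is hypothetical, and it assumes the default InfluxDB 1.x layout in which /var/lib/influxdb/data contains one directory per database. It therefore shows per-database sizes only; a per-measurement breakdown would need InfluxDB's own tooling.

```shell
#!/bin/sh
# Hypothetical helper: list the entries directly below a directory,
# largest first, with sizes in KiB. The optional second argument limits
# the number of lines printed (default: 10).
largest_subdirs() {
    du -sk -- "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Running largest_subdirs /var/lib/influxdb/data would then list the biggest databases first, assuming that path exists on the host.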
Updated by okurz about 2 months ago
- Copied to action #167722: Efficient use of monitoring data within influxdb on monitor.qe.nue2.suse.org size:M added