action #174679
[alert][FIRING:1] baremetal-support (baremetal-support: Disk I/O time alert Generic disk_io_time_alert_baremetal-support generic) size:S
Status:
In Progress
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2024-12-23
Due date:
2025-04-10 (Due in 6 days)
% Done:
0%
Estimated time:
Tags:
Description
Observation
The panel at https://monitor.qa.suse.de/d/GDbaremetal-support/dashboard-for-baremetal-support?orgId=1&from=2024-12-23T04:59:10.961Z&to=2024-12-23T05:04:37.311Z&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-56720
and other instances show significantly slow responses to I/O requests, in the range of 10 s.
Acceptance criteria
- AC1: It is known why I/O increased
- AC2: I/O does not continue to increase steadily
- AC3: There is no alert anymore about disk I/O on baremetal-support
Rollback actions
- Remove the silence with matcher alertname=baremetal-support: Disk I/O time alert from https://monitor.qa.suse.de/alerting/silences?alertmanager=grafana
Suggestions
- Look into the concerning increase of disk I/O time in https://monitor.qa.suse.de/d/GDbaremetal-support/dashboard-for-baremetal-support?orgId=1&from=2024-11-14T12:20:35.343Z&to=2025-01-14T13:25:40.428Z&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-56720
- Check drive metrics for other VMs on qamaster
- Check the disk(s) for problems (on the hypervisor host) and potentially fix
- Consider if moving to a new machine makes sense
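To cross-check the dashboard numbers directly on a VM or on the hypervisor host, the following is a minimal sketch that samples the kernel's per-device "time spent doing I/Os" counter from /proc/diskstats twice and reports how busy each device was in between. It assumes a Linux host and that the alert's disk I/O time metric is derived from this same counter, which is not confirmed in this ticket. The function names are hypothetical and not part of any existing tooling.

```python
import time

# Token index of "milliseconds spent doing I/Os" in a /proc/diskstats line
# (major, minor, name, then the kernel's stat field 10).
IO_TIME_FIELD = 12


def parse_io_time_ms(diskstats_text):
    """Return {device_name: cumulative ms spent doing I/O}."""
    result = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) <= IO_TIME_FIELD:
            continue  # skip empty or truncated lines
        result[parts[2]] = int(parts[IO_TIME_FIELD])
    return result


def busy_percent(before, after, interval_s):
    """Percentage of the interval each device had I/O in flight."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / (interval_s * 1000.0)
        for dev in after
        if dev in before
    }


def sample(interval_s=5.0, path="/proc/diskstats"):
    """Take two snapshots interval_s apart and compare them."""
    with open(path) as f:
        before = parse_io_time_ms(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_io_time_ms(f.read())
    return busy_percent(before, after, interval_s)
```

Running `sample()` on the hypervisor host and inside an affected VM at the same time could help tell whether the slowdown originates in the underlying disk(s) or only in one guest.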