action #174679
Updated by robert.richardson 5 months ago
[alert][FIRING:1] baremetal-support (baremetal-support: Disk I/O time alert Generic disk_io_time_alert_baremetal-support generic) size: S

## Observation
https://monitor.qa.suse.de/d/GDbaremetal-support/dashboard-for-baremetal-support?orgId=1&from=2024-12-23T04:59:10.961Z&to=2024-12-23T05:04:37.311Z&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-56720 and other instances show significantly slow responses to I/O requests, in the range of 10s.

## Acceptance criteria
* **AC1**: It is known why I/O increased
* **AC2**: I/O does not continue to increase steadily
* **AC3**: There is no longer an alert about disk I/O on baremetal-support

## Rollback actions
* Remove silence from https://monitor.qa.suse.de/alerting/silences?alertmanager=grafana `alertname=baremetal-support: Disk I/O time alert`

## Suggestions
* Look into the concerning increase of disk I/O time in https://monitor.qa.suse.de/d/GDbaremetal-support/dashboard-for-baremetal-support?orgId=1&from=2024-11-14T12:20:35.343Z&to=2025-01-14T13:25:40.428Z&timezone=browser&var-datasource=000000001&refresh=1m&viewPanel=panel-56720
* Check drive metrics for other VMs on qamaster
* Check the disk(s) for problems (on the hypervisor host) and potentially fix them
* Consider whether moving to a new machine makes sense
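
For the suggestions about checking drive metrics on other VMs and on the hypervisor host, a minimal sketch of how disk I/O time could be sampled directly on a Linux machine, independent of the dashboard. It assumes that the alerting metric corresponds to the "time spent doing I/Os (ms)" counter in `/proc/diskstats`, which this ticket does not confirm. The function names `parse_io_ticks`, `busy_fraction` and `read_diskstats` are hypothetical helpers, not part of any existing tooling.

```python
def parse_io_ticks(diskstats_text):
    """Return {device: ms spent doing I/O} parsed from /proc/diskstats content.

    Assumption: the 13th column (index 12) is the cumulative
    "time spent doing I/Os (ms)" counter, as in the kernel's iostats layout.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            # Skip blank or malformed lines instead of failing.
            continue
        ticks[fields[2]] = int(fields[12])
    return ticks


def busy_fraction(before, after, interval_ms):
    """Fraction of the interval each device spent doing I/O (0.0 to about 1.0)."""
    result = {}
    for device, end_ticks in after.items():
        if device in before:
            result[device] = (end_ticks - before[device]) / interval_ms
    return result


def read_diskstats(path="/proc/diskstats"):
    """Read the raw counters; only works on Linux."""
    with open(path) as handle:
        return handle.read()
```

Taking two `read_diskstats()` snapshots a known number of milliseconds apart and passing the parsed results to `busy_fraction` gives a per-device busy ratio. Comparing that ratio across the VMs on qamaster and the hypervisor host could help narrow down whether the slowdown is specific to baremetal-support or comes from the underlying disk(s).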