action #112196
closed: [alert][sporadic] QA-Power8-4-kvm: Disk I/O time alert size:M
Description
Observation
[Alerting] QA-Power8-4-kvm: Disk I/O time alert
Metric name sdj
Value 26600.000
Problem
We recently worked on Disk I/O time alerts in #110269, including for QA-Power8-4-kvm. Either we need to relax the threshold values even more, there is a real hardware problem, or we need to find a different solution, e.g. a longer pending time.
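To illustrate the "longer pending time" option, here is a minimal sketch, assuming an alert rule that only fires once the metric has stayed above its threshold for the whole pending period. The function name, the threshold and all sample values except the reported 26600 are hypothetical and are not taken from the actual alert configuration.

```python
from typing import Iterable, Tuple


def alert_fires(samples: Iterable[Tuple[float, float]],
                threshold: float,
                pending_seconds: float) -> bool:
    """Return True if the value stays above threshold for at least pending_seconds.

    samples: (timestamp_in_seconds, value) pairs in ascending time order.
    """
    breach_start = None  # timestamp at which the current breach began
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp
            if timestamp - breach_start >= pending_seconds:
                return True
        else:
            breach_start = None  # value recovered, reset the pending timer
    return False


# Hypothetical one-minute samples with a single spike like the reported 26600.
spike = [(0, 500.0), (60, 26600.0), (120, 700.0), (180, 600.0)]
# Hypothetical sustained overload for comparison.
sustained = [(0, 26600.0), (60, 27000.0), (120, 26800.0), (180, 26900.0)]
```

With a pending time of 0 seconds the single spike would fire an alert; with a pending time of 120 seconds it would not, while the sustained breach still would. A sporadic alert like this one is the case where a longer pending time helps without hiding a real, lasting problem.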
Acceptance criteria
- AC1: No more alerts
Suggestions
- Check that there are no actual hardware issues, e.g. using smartctl, and do what https://progress.opensuse.org/issues/110269#note-12 says
- Bump the values again
Why do we monitor the disk sdj? The machine seems to have only two real physical devices, sda and sdb.
journalctl | grep sdj
reports:
May 22 03:33:08 QA-Power8-4-kvm kernel: sd 7:0:0:3: [sdj] Attached SCSI removable disk
May 29 03:33:08 QA-Power8-4-kvm kernel: sd 7:0:0:3: [sdj] Attached SCSI removable disk
Jun 05 03:33:07 QA-Power8-4-kvm kernel: sd 8:0:0:3: [sdj] Attached SCSI removable disk
We should make sure that we either ignore such devices in monitoring or do not have them at all. The devices always show up during boot.
- Take a look at what was done in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/682 to ignore certain devices; also see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/diskio for options
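The check for removable devices suggested above could be sketched as follows. This is a minimal example, assuming the kernel always logs the "Attached SCSI removable disk" message for such devices; the function names and the device list are hypothetical. The resulting allow-list is the kind of value one might pass to the `devices` option of the telegraf diskio plugin, but this is not necessarily what the linked merge request does.

```python
import re
from typing import Iterable, List, Set

# Matches kernel messages such as
# "kernel: sd 7:0:0:3: [sdj] Attached SCSI removable disk"
REMOVABLE_RE = re.compile(r"\[(sd[a-z]+)\] Attached SCSI removable disk")


def removable_devices(journal_lines: Iterable[str]) -> Set[str]:
    """Collect device names that the kernel reported as removable disks."""
    found: Set[str] = set()
    for line in journal_lines:
        # findall also copes with several messages joined on one line
        found.update(REMOVABLE_RE.findall(line))
    return found


def devices_to_monitor(all_devices: Iterable[str],
                       journal_lines: Iterable[str]) -> List[str]:
    """Return the devices that were never reported as removable, in input order."""
    removable = removable_devices(journal_lines)
    return [dev for dev in all_devices if dev not in removable]


# Journal lines as quoted in this ticket.
journal = [
    "May 22 03:33:08 QA-Power8-4-kvm kernel: sd 7:0:0:3: [sdj] Attached SCSI removable disk",
    "May 29 03:33:08 QA-Power8-4-kvm kernel: sd 7:0:0:3: [sdj] Attached SCSI removable disk",
    "Jun 05 03:33:07 QA-Power8-4-kvm kernel: sd 8:0:0:3: [sdj] Attached SCSI removable disk",
]

# Device list assumed from the ticket: two physical disks plus sdj.
print(devices_to_monitor(["sda", "sdb", "sdj"], journal))
```

For the lines quoted in this ticket, only sda and sdb remain in the list, which matches the observation that the machine has just two real physical devices.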
Updated by okurz over 2 years ago
- Related to action #110269: [alert] QA-Power8-4-kvm + QA-Power8-5-kvm: Disk I/O time alert size:M added
Updated by livdywan over 2 years ago
- Subject changed from [alert][sporadic] QA-Power8-4-kvm: Disk I/O time alert to [alert][sporadic] QA-Power8-4-kvm: Disk I/O time alert size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz over 2 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
Updated by okurz over 2 years ago
- Due date set to 2022-06-22
- Status changed from In Progress to Feedback
Updated by okurz over 2 years ago
- Due date deleted (2022-06-22)
- Status changed from Feedback to Resolved
Merged. https://monitor.qa.suse.de/d/WDQA-Power8-4-kvm/worker-dashboard-qa-power8-4-kvm?orgId=1&from=1654712615168&to=1654712650312&viewPanel=56720 shows that the removable devices have vanished. https://monitor.qa.suse.de/alerting/list?state=not_ok shows no related alert.