action #135833
closed
false-positive inode and disk usage alert on windows image
Description
Observation
https://stats.openqa-monitor.qa.suse.de/alerting/grafana/d74e764d-6097-4d14-b77c-76c8d1da6ff0/view?orgId=1
shows an alert about inode usage on the path /var/lib/openqa/pool/43/Win11_22H2_English_x64. I assume we look up far too many paths that are not relevant for the inode check. On worker30 I can see that a Windows ISO is mounted on a loop device, likely because test code in os-autoinst-distri-opensuse does that.
okurz@worker30:~> lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0         7:0    0   5.2G  0 loop  /var/lib/openqa/pool/43/Win11_22H2_English_x64
nvme0n1     259:1    0   5.8T  0 disk
|-nvme0n1p1 259:2    0   512M  0 part  /boot/efi
|-nvme0n1p2 259:3    0   5.8T  0 part  /var
|                                      /usr/local
|                                      /tmp
|                                      /srv
|                                      /root
|                                      /opt
|                                      /home
|                                      /boot/grub2/x86_64-efi
|                                      /boot/grub2/i386-pc
|                                      /.snapshots
|                                      /
`-nvme0n1p3 259:4    0     1G  0 part  [SWAP]
nvme2n1     259:5    0 476.9G  0 disk
`-md127       9:127  0 953.6G  0 raid0 /var/lib/openqa
nvme1n1     259:6    0 476.9G  0 disk
`-md127       9:127  0 953.6G  0 raid0 /var/lib/openqa
A similar disk usage alert for loop0 (udf): https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_worker30/view?returnTo=%2Fd%2FWDworker30%2Fworker-dashboard-worker30%3ForgId%3D1%26viewPanel%3D65090%26editPanel%3D65090%26tab%3Dalert
Acceptance criteria
- AC1: No alert about inode usage of temporary openQA assets
- AC2: No alert about disk usage of temporary openQA assets
Suggestions
- Check the inode alert definition
- Ensure that inode usage is only collected for "reasonable" filesystems, maybe by excluding only "loop" devices
- Same for disk usage
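One way to follow these suggestions is to exclude the filesystem type of the mounted image in Telegraf's disk input, since the reported point carries fstype=udf. The fragment below is only a sketch: `ignore_fs` is an existing option of `inputs.disk`, but the list of values is an assumption and not taken from the merge requests linked in this ticket.

```toml
# Hypothetical excerpt of telegraf.conf; the values are assumptions,
# not the content of the merge requests linked in this ticket.
[[inputs.disk]]
  # Skip pseudo filesystems and read-only image filesystems.
  # "udf" and "iso9660" cover loop-mounted ISO images such as the Windows one.
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "udf", "overlay", "aufs", "squashfs"]
```

Because the same input provides both the inode and the disk usage fields, one exclusion would cover both alerts. The effect can be checked with `telegraf --test --config telegraf.conf | grep disk | grep udf`, which should no longer print a point.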
Rollback actions
- Remove silence(s) about inode utilization and disk usage
Updated by okurz about 1 year ago
- Description updated (diff)
- Priority changed from Normal to High
Updated by okurz about 1 year ago
- Subject changed from false-positive inode alert on windows image to false-positive inode and disk usage alert on windows image
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to okurz
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/989
Tested on worker30:
worker30:/etc/telegraf # telegraf --test --config telegraf.conf | grep disk | grep udf
2023-09-15T18:56:58Z I! Starting Telegraf unknown
2023-09-15T18:56:58Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-09-15T18:56:58Z I! Loaded inputs: chrony cpu disk diskio exec (2x) kernel mem net processes swap system
2023-09-15T18:56:58Z I! Loaded aggregators:
2023-09-15T18:56:58Z I! Loaded processors:
2023-09-15T18:56:58Z I! Loaded secretstores:
2023-09-15T18:56:58Z W! Outputs are not used in testing mode!
2023-09-15T18:56:58Z I! Tags enabled: host=worker30
2023-09-15T18:56:58Z W! [inputs.diskio] Error gathering disk info: no such file or directory
> disk,device=loop0,fstype=udf,host=worker30,mode=ro,path=/var/lib/openqa/pool/43/Win11_22H2_English_x64 free=0i,inodes_free=0i,inodes_total=1031i,inodes_used=1031i,total=5556809728i,used=5556809728i,used_percent=100 1694804218000000000
worker30:/etc/telegraf # vim telegraf.conf
worker30:/etc/telegraf # telegraf --test --config telegraf.conf | grep disk | grep udf
2023-09-15T18:57:11Z I! Starting Telegraf unknown
2023-09-15T18:57:11Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-09-15T18:57:11Z I! Loaded inputs: chrony cpu disk diskio exec (2x) kernel mem net processes swap system
2023-09-15T18:57:11Z I! Loaded aggregators:
2023-09-15T18:57:11Z I! Loaded processors:
2023-09-15T18:57:11Z I! Loaded secretstores:
2023-09-15T18:57:11Z W! Outputs are not used in testing mode!
2023-09-15T18:57:11Z I! Tags enabled: host=worker30
2023-09-15T18:57:11Z W! [inputs.diskio] Error gathering disk info: no such file or directory
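The point from the first run shows why both alerts fire: a read-only image has no free inodes or free space by design, so 1031 of 1031 inodes and all bytes count as used. The following Python sketch evaluates that point by hand. The names `parse_disk_point`, `inode_used_percent`, `is_monitored` and the set `IGNORED_FSTYPES` are hypothetical and only serve as illustration, and the parser does not handle line-protocol escaping.

```python
# Sample point copied from the first "telegraf --test" run on worker30.
SAMPLE = (
    "> disk,device=loop0,fstype=udf,host=worker30,mode=ro,"
    "path=/var/lib/openqa/pool/43/Win11_22H2_English_x64 "
    "free=0i,inodes_free=0i,inodes_total=1031i,inodes_used=1031i,"
    "total=5556809728i,used=5556809728i,used_percent=100 1694804218000000000"
)

# Assumption: filesystem types to leave out of the alerts (illustrative only).
IGNORED_FSTYPES = {"udf", "iso9660", "squashfs"}


def parse_disk_point(line):
    """Split one line-protocol point into measurement, tags and fields.

    Simplified: escaped spaces or commas are not handled.
    """
    head, fields, _timestamp = line.lstrip("> ").strip().split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    values = {}
    for pair in fields.split(","):
        key, raw = pair.split("=", 1)
        # Integer fields carry an "i" suffix, everything else here is a float.
        values[key] = int(raw[:-1]) if raw.endswith("i") else float(raw)
    return measurement, tags, values


def inode_used_percent(values):
    """Return the share of used inodes in percent."""
    return 100.0 * values["inodes_used"] / values["inodes_total"]


def is_monitored(tags):
    """Return True if the filesystem type should be covered by the alerts."""
    return tags["fstype"] not in IGNORED_FSTYPES


measurement, tags, values = parse_disk_point(SAMPLE)
print(measurement, tags["device"], tags["fstype"])
print("inode usage:", inode_used_percent(values), "%")
print("disk usage:", values["used_percent"], "%")
print("monitored:", is_monitored(tags))
```

Filtering on the `fstype` tag rather than on the path keeps the check independent of the pool directory and the image name.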
EDIT: Apparently that was not enough, follow-up: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/991
Updated by okurz about 1 year ago
- Status changed from In Progress to Resolved
All merge requests are merged and the silences are removed.