Project

General

action #135833

closed

false-positive inode and disk usage alert on windows image

Added by okurz 8 months ago. Updated 8 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2023-09-15
Due date:
% Done: 0%
Estimated time:

Description

Observation

https://stats.openqa-monitor.qa.suse.de/alerting/grafana/d74e764d-6097-4d14-b77c-76c8d1da6ff0/view?orgId=1
shows an alert about inode usage on path /var/lib/openqa/pool/43/Win11_22H2_English_x64. I assume we look up far too many paths, including ones that should not be relevant for the inode check. On worker30 I can see that a Windows ISO is mounted on a loop device, likely by someone doing that in os-autoinst-distri-opensuse.

okurz@worker30:~> lsblk 
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
loop0         7:0    0   5.2G  0 loop  /var/lib/openqa/pool/43/Win11_22H2_English_x64
nvme0n1     259:1    0   5.8T  0 disk  
|-nvme0n1p1 259:2    0   512M  0 part  /boot/efi
|-nvme0n1p2 259:3    0   5.8T  0 part  /var
|                                      /usr/local
|                                      /tmp
|                                      /srv
|                                      /root
|                                      /opt
|                                      /home
|                                      /boot/grub2/x86_64-efi
|                                      /boot/grub2/i386-pc
|                                      /.snapshots
|                                      /
`-nvme0n1p3 259:4    0     1G  0 part  [SWAP]
nvme2n1     259:5    0 476.9G  0 disk  
`-md127       9:127  0 953.6G  0 raid0 /var/lib/openqa
nvme1n1     259:6    0 476.9G  0 disk  
`-md127       9:127  0 953.6G  0 raid0 /var/lib/openqa

Similarly, https://stats.openqa-monitor.qa.suse.de/alerting/grafana/partitions_usage_alert_worker30/view?returnTo=%2Fd%2FWDworker30%2Fworker-dashboard-worker30%3ForgId%3D1%26viewPanel%3D65090%26editPanel%3D65090%26tab%3Dalert shows a disk usage alert for loop0 (udf).

Acceptance criteria

  • AC1: No alert about inode usage of temporary openQA assets
  • AC2: No alert about disk usage of temporary openQA assets

Suggestions

  • Check the inode alert definition
  • Ensure that inode usage is only collected for "reasonable" filesystems, maybe just by excluding "loop" devices
  • Do the same for disk usage
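The suggested filtering boils down to a small predicate deciding which filesystems are worth monitoring. This is a minimal Python sketch, not the change actually made in salt-states-openqa: the set of excluded filesystem types and the "loop" device prefix are assumptions for illustration. In telegraf itself the corresponding setting would be the `ignore_fs` list of the `inputs.disk` plugin.

```python
# Hypothetical filter sketch: decide whether disk/inode usage of a mount
# should be collected and alerted on. The excluded types and the "loop"
# prefix are illustrative assumptions, not the real salt-states config.
EXCLUDED_FSTYPES = {"udf", "iso9660", "squashfs", "tmpfs", "devtmpfs", "overlay"}
EXCLUDED_DEVICE_PREFIXES = ("loop",)


def should_collect(device: str, fstype: str) -> bool:
    """Return True if usage of this filesystem is worth monitoring."""
    if fstype in EXCLUDED_FSTYPES:
        # Read-only image filesystems report ~100% usage by design.
        return False
    if device.startswith(EXCLUDED_DEVICE_PREFIXES):
        # Loop devices typically back temporary images such as mounted ISOs.
        return False
    return True


if __name__ == "__main__":
    # loop0/udf comes from the telegraf output on worker30;
    # the other two rows are made-up examples.
    samples = [("loop0", "udf"), ("sda1", "ext4"), ("loop1", "squashfs")]
    for device, fstype in samples:
        print(device, fstype, should_collect(device, fstype))
```

Checking both the filesystem type and the device name covers an image mounted via a loop device with an unexpected filesystem type as well as a read-only image filesystem that is not on a loop device.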

Rollback actions

  • Remove silence(s) about inode utilization and disk usage
Actions #1

Updated by okurz 8 months ago

  • Description updated (diff)
  • Priority changed from Normal to High
Actions #2

Updated by okurz 8 months ago

  • Subject changed from false-positive inode alert on windows image to false-positive inode and disk usage alert on windows image
  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to okurz

https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/989

Tested on worker30:

worker30:/etc/telegraf # telegraf --test --config telegraf.conf  | grep disk | grep udf
2023-09-15T18:56:58Z I! Starting Telegraf unknown
2023-09-15T18:56:58Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-09-15T18:56:58Z I! Loaded inputs: chrony cpu disk diskio exec (2x) kernel mem net processes swap system
2023-09-15T18:56:58Z I! Loaded aggregators: 
2023-09-15T18:56:58Z I! Loaded processors: 
2023-09-15T18:56:58Z I! Loaded secretstores: 
2023-09-15T18:56:58Z W! Outputs are not used in testing mode!
2023-09-15T18:56:58Z I! Tags enabled: host=worker30
2023-09-15T18:56:58Z W! [inputs.diskio] Error gathering disk info: no such file or directory
> disk,device=loop0,fstype=udf,host=worker30,mode=ro,path=/var/lib/openqa/pool/43/Win11_22H2_English_x64 free=0i,inodes_free=0i,inodes_total=1031i,inodes_used=1031i,total=5556809728i,used=5556809728i,used_percent=100 1694804218000000000
worker30:/etc/telegraf # vim telegraf.conf 
worker30:/etc/telegraf # telegraf --test --config telegraf.conf  | grep disk | grep udf
2023-09-15T18:57:11Z I! Starting Telegraf unknown
2023-09-15T18:57:11Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-09-15T18:57:11Z I! Loaded inputs: chrony cpu disk diskio exec (2x) kernel mem net processes swap system
2023-09-15T18:57:11Z I! Loaded aggregators: 
2023-09-15T18:57:11Z I! Loaded processors: 
2023-09-15T18:57:11Z I! Loaded secretstores: 
2023-09-15T18:57:11Z W! Outputs are not used in testing mode!
2023-09-15T18:57:11Z I! Tags enabled: host=worker30
2023-09-15T18:57:11Z W! [inputs.diskio] Error gathering disk info: no such file or directory
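The first test run emits a `disk` line for loop0 with `fstype=udf` and `mode=ro`. After editing telegraf.conf, the second run no longer does. The simplified parser sketched here extracts that line's fields and shows why the alert is a false positive: the mounted image is full by construction, so both disk and inode usage sit at 100% no matter what happens on the worker. It assumes no escaped spaces or commas in tags and is not telegraf's own parser.

```python
# Minimal sketch: parse the telegraf "disk" line captured above (InfluxDB
# line protocol) and compute usage percentages. Only handles the simple case
# seen here: no escaped spaces/commas, integer ("i") or float fields.
LINE = (
    "disk,device=loop0,fstype=udf,host=worker30,mode=ro,"
    "path=/var/lib/openqa/pool/43/Win11_22H2_English_x64 "
    "free=0i,inodes_free=0i,inodes_total=1031i,inodes_used=1031i,"
    "total=5556809728i,used=5556809728i,used_percent=100 1694804218000000000"
)


def parse_line(line: str):
    """Split a line into measurement, tags, fields and timestamp."""
    measurement_and_tags, fields_str, timestamp = line.split(" ")
    measurement, *tag_pairs = measurement_and_tags.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_str.split(","):
        key, value = pair.split("=", 1)
        # Integer fields carry an "i" suffix; treat everything else as float.
        fields[key] = int(value[:-1]) if value.endswith("i") else float(value)
    return measurement, tags, fields, int(timestamp)


measurement, tags, fields, ts = parse_line(LINE)
inode_pct = 100 * fields["inodes_used"] / fields["inodes_total"]
disk_pct = 100 * fields["used"] / fields["total"]
print(tags["device"], tags["fstype"], tags["mode"], inode_pct, disk_pct)
```

Running it prints `loop0 udf ro 100.0 100.0`, so any percentage-based threshold on this mount fires.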

EDIT: Apparently that was not enough: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/991

Actions #3

Updated by okurz 8 months ago

  • Status changed from In Progress to Resolved

All merged, silences removed.
