Project

General

Profile

action #102942

Updated by okurz over 2 years ago

## Observation 

 on 11:20 today we received an alert that a service is failing. Checking https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1637742604202&to=1637749443783 I see that is is because snapper-cleanup on QA-Power8-4-kvm fails. Checking the journal reveals: 

 ``` 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running number cleanup for 'root'. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: Deleting snapshot failed. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: number cleanup for 'root' failed. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running timeline cleanup for 'root'. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running empty-pre-post cleanup for 'root'. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Unit entered failed state. 
 Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'. 
 ``` 

 `/var/log/snapper.log` has some more details: 

 ``` 
 2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(Comparison):57 - num1:1693 num2:1694 
 2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(load):163 - num1:1693 num2:1694 
 2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(create):138 - num1:1693 num2:1694 
 2021-11-24 10:53:14 MIL libsnapper(2912) Btrfs.cc(cmpDirs):1442 - special btrfs cmpDirs 
 2021-11-24 10:53:14 MIL libsnapper(2912) Btrfs.cc(process):1413 - dir1:'//.snapshots/1693/snapshot' dir2:'//.snapshots/1694/snapshot' 
 2021-11-24 10:53:16 MIL libsnapper(2912) Btrfs.cc(cmpDirs):1454 - stopwatch 2.136912s for comparing directories 
 2021-11-24 10:53:16 MIL libsnapper(2912) Comparison.cc(create):156 - found 4054 lines 
 2021-11-24 10:53:16 MIL libsnapper(2912) Comparison.cc(save):221 - num1:1693 num2:1694 
 2021-11-24 10:53:47 MIL libsnapper(2912) Snapper.cc(~Snapper):136 - Snapper destructor 
 2021-11-24 10:54:16 MIL libsnapper(2912) snapperd.cc(main):288 - Exiting 
 2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):276 - Requesting DBus name 
 2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):280 - Loading snapper configs 
 2021-11-24 11:00:00 MIL libsnapper(11138) Snapper.cc(getConfigs):272 - Snapper get-configs 
 2021-11-24 11:00:00 MIL libsnapper(11138) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/ 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value: 
 2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value: 
 2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):284 - Listening for method calls and signals 
 2021-11-24 11:01:00 MIL libsnapper(11138) snapperd.cc(main):288 - Exiting 
 2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):276 - Requesting DBus name 
 2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):280 - Loading snapper configs 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(getConfigs):272 - Snapper get-configs 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/ 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value: 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value: 
 2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):284 - Listening for method calls and signals 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):91 - Snapper constructor 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):92 - libsnapper version 0.8.9 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):93 - config_name:root disable_filters:false 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/ 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:FSTYPE value:btrfs 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:QGROUP value:1/0 
 2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SYNC_ACL value:no 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):125 - subvolume:/ filesystem:btrfs 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(loadIgnorePatterns):177 - number of ignore patterns:8 
 2021-11-24 11:03:00 MIL libsnapper(11221) Snapshot.cc(read):272 - found 22 snapshots 
 2021-11-24 11:03:00 ERR libsnapper(11221) Btrfs.cc(deleteSnapshot):441 - delete snapshot failed, ioctl(BTRFS_IOC_SNAP_DESTROY) failed, errno:39 (Directory not empty) 
 2021-11-24 11:03:00 WAR libsnapper(11221) Btrfs.cc(deleteSnapshot):442 - THROW: delete snapshot failed 
 2021-11-24 11:03:00 WAR libsnapper(11221) Client.cc(dispatch):1800 - CAUGHT: delete snapshot failed 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1651 num2:1652 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1651 num2:1652 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 1886 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1669 num2:1670 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1669 num2:1670 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4465 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1677 num2:1678 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1677 num2:1678 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1681 num2:1682 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1681 num2:1682 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 6172 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1683 num2:1684 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1683 num2:1684 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 47 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1685 num2:1686 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1685 num2:1686 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 18 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1687 num2:1688 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1687 num2:1688 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1689 num2:1690 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1689 num2:1690 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 15 lines 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1693 num2:1694 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1693 num2:1694 
 2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines 
 2021-11-24 11:03:30 MIL libsnapper(11221) Snapper.cc(~Snapper):136 - Snapper destructor 
 2021-11-24 11:04:00 MIL libsnapper(11221) snapperd.cc(main):288 - Exiting 
 2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):276 - Requesting DBus name 
 2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):280 - Loading snapper configs 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(getConfigs):272 - Snapper get-configs 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/ 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value: 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value: 
 2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):284 - Listening for method calls and signals 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):91 - Snapper constructor 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):92 - libsnapper version 0.8.9 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):93 - config_name:root disable_filters:false 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/ 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:FSTYPE value:btrfs 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:QGROUP value:1/0 
 2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SYNC_ACL value:no 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):125 - subvolume:/ filesystem:btrfs 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(loadIgnorePatterns):177 - number of ignore patterns:8 
 2021-11-24 11:10:23 MIL libsnapper(11521) Snapshot.cc(read):272 - found 22 snapshots 
 2021-11-24 11:10:28 WAR libsnapper(11521) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed 
 2021-11-24 11:10:28 WAR libsnapper(11521) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed 
 2021-11-24 11:10:58 MIL libsnapper(11521) Snapper.cc(~Snapper):136 - Snapper destructor 
 2021-11-24 11:11:28 MIL libsnapper(11521) snapperd.cc(main):288 - Exiting 
 2021-11-24 11:25:29 MIL libsnapper(13721) snapperd.cc(main):276 - Requesting DBus name 
 2021-11-24 11:25:29 MIL libsnapper(13721) snapperd.cc(main):280 - Loading sna 
 ``` 

 I think the relevant lines are these: 

 ``` 
 2021-11-24 11:03:00 WAR libsnapper(11221) Btrfs.cc(deleteSnapshot):442 - THROW: delete snapshot failed 
 2021-11-24 11:03:00 WAR libsnapper(11221) Client.cc(dispatch):1800 - CAUGHT: delete snapshot failed 
 2021-11-24 11:10:28 WAR libsnapper(11521) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed 
 2021-11-24 11:10:28 WAR libsnapper(11521) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed 
 2021-11-24 11:25:29 WAR libsnapper(13721) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed 
 2021-11-24 11:25:29 WAR libsnapper(13721) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed 
 ``` 

 Checking `dmesg` for btrfs errors shows: 

 ``` 
 [Nov24 11:10] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled 
 [Nov24 11:25] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled 
 ``` 

 But IIRC qgroup's should be default. I wonder how and why this change happened. I assume that turning on quota support again could fix the issue at hand but it should be investigated why it was disabled. 

 ## Acceptance criteria 
 * **AC1:** No failed service snapper-cleanup on qa-power8-4-kvm anymore 

 ## Rollback steps 
 * Unpause https://monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services

Back