Project

General

Profile

Actions

action #102942

closed

Failed systemd services alert: snapper-cleanup on QA-Power8-4-kvm fails size:M

Added by nicksinger over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
-
Target version:
Start date:
2021-11-24
Due date:
% Done:

0%

Estimated time:

Description

Observation

on 11:20 today we received an alert that a service is failing. Checking https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services?orgId=1&from=1637742604202&to=1637749443783 I see that is is because snapper-cleanup on QA-Power8-4-kvm fails. Checking the journal reveals:

Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running number cleanup for 'root'.
Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: Deleting snapshot failed.
Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: number cleanup for 'root' failed.
Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running timeline cleanup for 'root'.
Nov 24 11:03:00 QA-Power8-4-kvm systemd-helper[11210]: running empty-pre-post cleanup for 'root'.
Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Main process exited, code=exited, status=1/FAILURE
Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Unit entered failed state.
Nov 24 11:03:00 QA-Power8-4-kvm systemd[1]: snapper-cleanup.service: Failed with result 'exit-code'.

/var/log/snapper.log has some more details:

2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(Comparison):57 - num1:1693 num2:1694
2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(load):163 - num1:1693 num2:1694
2021-11-24 10:53:14 MIL libsnapper(2912) Comparison.cc(create):138 - num1:1693 num2:1694
2021-11-24 10:53:14 MIL libsnapper(2912) Btrfs.cc(cmpDirs):1442 - special btrfs cmpDirs
2021-11-24 10:53:14 MIL libsnapper(2912) Btrfs.cc(process):1413 - dir1:'//.snapshots/1693/snapshot' dir2:'//.snapshots/1694/snapshot'
2021-11-24 10:53:16 MIL libsnapper(2912) Btrfs.cc(cmpDirs):1454 - stopwatch 2.136912s for comparing directories
2021-11-24 10:53:16 MIL libsnapper(2912) Comparison.cc(create):156 - found 4054 lines
2021-11-24 10:53:16 MIL libsnapper(2912) Comparison.cc(save):221 - num1:1693 num2:1694
2021-11-24 10:53:47 MIL libsnapper(2912) Snapper.cc(~Snapper):136 - Snapper destructor
2021-11-24 10:54:16 MIL libsnapper(2912) snapperd.cc(main):288 - Exiting
2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):276 - Requesting DBus name
2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):280 - Loading snapper configs
2021-11-24 11:00:00 MIL libsnapper(11138) Snapper.cc(getConfigs):272 - Snapper get-configs
2021-11-24 11:00:00 MIL libsnapper(11138) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value:
2021-11-24 11:00:00 MIL libsnapper(11138) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value:
2021-11-24 11:00:00 MIL libsnapper(11138) snapperd.cc(main):284 - Listening for method calls and signals
2021-11-24 11:01:00 MIL libsnapper(11138) snapperd.cc(main):288 - Exiting
2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):276 - Requesting DBus name
2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):280 - Loading snapper configs
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(getConfigs):272 - Snapper get-configs
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value:
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value:
2021-11-24 11:03:00 MIL libsnapper(11221) snapperd.cc(main):284 - Listening for method calls and signals
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):91 - Snapper constructor
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):92 - libsnapper version 0.8.9
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):93 - config_name:root disable_filters:false
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:FSTYPE value:btrfs
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:QGROUP value:1/0
2021-11-24 11:03:00 MIL libsnapper(11221) AsciiFile.cc(getValue):243 - key:SYNC_ACL value:no
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(Snapper):125 - subvolume:/ filesystem:btrfs
2021-11-24 11:03:00 MIL libsnapper(11221) Snapper.cc(loadIgnorePatterns):177 - number of ignore patterns:8
2021-11-24 11:03:00 MIL libsnapper(11221) Snapshot.cc(read):272 - found 22 snapshots
2021-11-24 11:03:00 ERR libsnapper(11221) Btrfs.cc(deleteSnapshot):441 - delete snapshot failed, ioctl(BTRFS_IOC_SNAP_DESTROY) failed, errno:39 (Directory not empty)
2021-11-24 11:03:00 WAR libsnapper(11221) Btrfs.cc(deleteSnapshot):442 - THROW: delete snapshot failed
2021-11-24 11:03:00 WAR libsnapper(11221) Client.cc(dispatch):1800 - CAUGHT: delete snapshot failed
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1651 num2:1652
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1651 num2:1652
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 1886 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1669 num2:1670
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1669 num2:1670
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4465 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1677 num2:1678
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1677 num2:1678
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1681 num2:1682
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1681 num2:1682
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 6172 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1683 num2:1684
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1683 num2:1684
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 47 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1685 num2:1686
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1685 num2:1686
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 18 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1687 num2:1688
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1687 num2:1688
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1689 num2:1690
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1689 num2:1690
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 15 lines
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(Comparison):57 - num1:1693 num2:1694
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):163 - num1:1693 num2:1694
2021-11-24 11:03:00 MIL libsnapper(11221) Comparison.cc(load):212 - read 4054 lines
2021-11-24 11:03:30 MIL libsnapper(11221) Snapper.cc(~Snapper):136 - Snapper destructor
2021-11-24 11:04:00 MIL libsnapper(11221) snapperd.cc(main):288 - Exiting
2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):276 - Requesting DBus name
2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):280 - Loading snapper configs
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(getConfigs):272 - Snapper get-configs
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(getConfigs):273 - libsnapper version 0.8.9
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/sysconfig/snapper
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SNAPPER_CONFIGS value:root
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:ALLOW_USERS value:
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:ALLOW_GROUPS value:
2021-11-24 11:10:23 MIL libsnapper(11521) snapperd.cc(main):284 - Listening for method calls and signals
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):91 - Snapper constructor
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):92 - libsnapper version 0.8.9
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):93 - config_name:root disable_filters:false
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:FSTYPE value:btrfs
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:QGROUP value:1/0
2021-11-24 11:10:23 MIL libsnapper(11521) AsciiFile.cc(getValue):243 - key:SYNC_ACL value:no
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(Snapper):125 - subvolume:/ filesystem:btrfs
2021-11-24 11:10:23 MIL libsnapper(11521) Snapper.cc(loadIgnorePatterns):177 - number of ignore patterns:8
2021-11-24 11:10:23 MIL libsnapper(11521) Snapshot.cc(read):272 - found 22 snapshots
2021-11-24 11:10:28 WAR libsnapper(11521) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed
2021-11-24 11:10:28 WAR libsnapper(11521) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed
2021-11-24 11:10:58 MIL libsnapper(11521) Snapper.cc(~Snapper):136 - Snapper destructor
2021-11-24 11:11:28 MIL libsnapper(11521) snapperd.cc(main):288 - Exiting
2021-11-24 11:25:29 MIL libsnapper(13721) snapperd.cc(main):276 - Requesting DBus name
2021-11-24 11:25:29 MIL libsnapper(13721) snapperd.cc(main):280 - Loading sna

I think the relevant lines are these:

2021-11-24 11:03:00 WAR libsnapper(11221) Btrfs.cc(deleteSnapshot):442 - THROW: delete snapshot failed
2021-11-24 11:03:00 WAR libsnapper(11221) Client.cc(dispatch):1800 - CAUGHT: delete snapshot failed
2021-11-24 11:10:28 WAR libsnapper(11521) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed
2021-11-24 11:10:28 WAR libsnapper(11521) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed
2021-11-24 11:25:29 WAR libsnapper(13721) Snapper.cc(calculateUsedSpace):766 - THROW: quota rescan or sync failed
2021-11-24 11:25:29 WAR libsnapper(13721) Client.cc(dispatch):1860 - CAUGHT: quota rescan or sync failed

Checking dmesg for btrfs errors shows:

[Nov24 11:10] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled
[Nov24 11:25] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled

But IIRC qgroup's should be default. I wonder how and why this change happened. I assume that turning on quota support again could fix the issue at hand but it should be investigated why it was disabled.

Acceptance criteria

  • AC1: No failed service snapper-cleanup on qa-power8-4-kvm anymore

Rollback steps


Related issues 1 (0 open1 closed)

Related to openQA Infrastructure - action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:MResolvedokurz2023-06-22

Actions
Actions #1

Updated by okurz over 2 years ago

  • Target version set to Ready

maybe it actually never was properly enabled on that machine which we migrated from some older version. Or do you think it was just recently disabled?

Actions #2

Updated by mkittler over 2 years ago

I've been rolling the machine back using snapper rollback … and this failure happened after the reboot. However, I also don't know why exactly the error happened.

Actions #3

Updated by livdywan over 2 years ago

  • Subject changed from Failed systemd services alert: snapper-cleanup on QA-Power8-4-kvm fails - quota disabled (why?) to Failed systemd services alert: snapper-cleanup on QA-Power8-4-kvm fails size:M
  • Description updated (diff)
Actions #4

Updated by livdywan over 2 years ago

  • Status changed from New to Workable
Actions #5

Updated by nicksinger over 2 years ago

  • Description updated (diff)
  • Target version deleted (Ready)

okurz wrote:

maybe it actually never was properly disabled on that machine which we migrated from some older version. Or do you think it was just recently disabled?

I'd assume that the error was introduced with @mkittler's rollback and not earlier. Or snapper changed its behavior just recently to require quotas to be enabled. Maybe it is not related after all but it also was the only relevant error I found in logs.
@okurz you mentioned "never was properly disabled": Is this something we did on our workers?

Actions #6

Updated by okurz over 2 years ago

  • Target version set to Ready

nicksinger wrote:

@okurz you mentioned "never was properly disabled": Is this something we did on our workers?

I meant "never properly enabled", edited my comment. I meant, maybe btrfs qgroups have not been enabled when we installed the machines originally, maybe SLE12-SP1 or Leap 42.1 or something and then we upgraded and upgraded and simply never enabled qgroups. AFAIK qgroups were never strictly necessary in general, only for "space-aware cleanup" or displaying exclusive space used by snapshots.

Actions #7

Updated by okurz over 2 years ago

  • Priority changed from Normal to High
Actions #8

Updated by mkittler over 2 years ago

  • Assignee set to mkittler
Actions #9

Updated by mkittler over 2 years ago

We've seen "quota not working" errors before but in that time the service didn't fail:

-- Logs begin at Fri 2020-11-20 15:34:44 CET, end at Fri 2021-11-26 14:56:52 CET. --
Nov 10 03:44:01 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 10 03:44:01 QA-Power8-4-kvm systemd-helper[34313]: running number cleanup for 'root'.
Nov 10 03:44:01 QA-Power8-4-kvm systemd-helper[34313]: quota not working (preparing quota failed)
Nov 10 03:44:02 QA-Power8-4-kvm systemd-helper[34313]: running timeline cleanup for 'root'.
Nov 10 03:44:02 QA-Power8-4-kvm systemd-helper[34313]: running empty-pre-post cleanup for 'root'.
Nov 11 03:44:42 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 11 03:44:42 QA-Power8-4-kvm systemd-helper[30133]: running number cleanup for 'root'.
Nov 11 03:44:42 QA-Power8-4-kvm systemd-helper[30133]: quota not working (preparing quota failed)
Nov 11 03:44:46 QA-Power8-4-kvm systemd-helper[30133]: running timeline cleanup for 'root'.
Nov 11 03:44:46 QA-Power8-4-kvm systemd-helper[30133]: running empty-pre-post cleanup for 'root'.
Nov 12 03:44:59 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 12 03:44:59 QA-Power8-4-kvm systemd-helper[42489]: running number cleanup for 'root'.
Nov 12 03:44:59 QA-Power8-4-kvm systemd-helper[42489]: quota not working (preparing quota failed)
Nov 12 03:45:00 QA-Power8-4-kvm systemd-helper[42489]: running timeline cleanup for 'root'.
Nov 12 03:45:00 QA-Power8-4-kvm systemd-helper[42489]: running empty-pre-post cleanup for 'root'.
Nov 13 03:44:59 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 13 03:44:59 QA-Power8-4-kvm systemd-helper[44071]: running number cleanup for 'root'.
Nov 13 03:45:00 QA-Power8-4-kvm systemd-helper[44071]: quota not working (preparing quota failed)
Nov 13 03:45:03 QA-Power8-4-kvm systemd-helper[44071]: running timeline cleanup for 'root'.
Nov 13 03:45:03 QA-Power8-4-kvm systemd-helper[44071]: running empty-pre-post cleanup for 'root'.

So Deleting snapshot failed. … number cleanup for 'root' failed is likely the real problem here. The problem occurred first when I tried to roll back to snapshot 1684. Maybe booting into the current snapshot fixes it again - although it is still questionable that we run into the error.

This is currently the only PowerPC worker which is running jobs so I'm refraining from rebooting experiments right now.

Actions #10

Updated by okurz over 2 years ago

  • Description updated (diff)

Updated description. Right now no openQA jobs are running on QA-Power8-4-kvm.qa so this would be a good opportunity for experiments :)

By the way:

# salt \* cmd.run "dmesg | grep 'qgroup is not enabled'"
storage.qa.suse.de:
openqaworker2.suse.de:
openqaworker8.suse.de:
openqaworker3.suse.de:
openqaworker5.suse.de:
openqaworker6.suse.de:
openqaworker9.suse.de:
backup.qa.suse.de:
malbec.arch.suse.de:
grenache-1.qa.suse.de:
openqa-monitor.qa.suse.de:
openqaworker10.suse.de:
openqaworker13.suse.de:
openqa.suse.de:
openqaworker-arm-1.suse.de:
openqaworker-arm-3.suse.de:
openqaworker-arm-2.suse.de:
openqaworker-arm-4.qa.suse.de:
openqaworker-arm-5.qa.suse.de:
$ for i in qa-power8-4-kvm.qa qa-power8-5-kvm.qa powerqaworker-qam-1.qa; do echo "# $i" && ssh $i "dmesg | grep 'qgroup is not enabled'"; done
# qa-power8-4-kvm.qa
[187914.914511] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled
[187969.987081] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled
[188052.592210] BTRFS warning (device sdb2): qgroup rescan init failed, qgroup is not enabled
# qa-power8-5-kvm.qa
# powerqaworker-qam-1.qa

so all other machines within OSD don't show the error "qgroup is not enabled" in their logs.

I called btrfs quota enable / on qa-power8-4-kvm.qa now. Shortly after dmesg showed qgroup scan completed (inconsistency flag cleared) so that seems to be clean. /usr/lib/snapper/systemd-helper --cleanup still fails though. Triggered a reboot although the machine automatically rebooted two days ago with no change in behaviour. After reboot I triggered snapper-cleanup again but this ended up with the same error. @mkittler why did you trigger a rollback? Maybe we can just delete all remaining snapshots and go from there?

Actions #11

Updated by mkittler over 2 years ago

I triggered a rollback to check whether recent updates are responsible for the slow networking (but the problem persisted after the rollback).

Maybe we can just delete all remaining snapshots and go from there?

Makes sense. After the latest automatic update the worker should be up-to-date now and we should be able to delete older snapshots.

Actions #12

Updated by mkittler over 2 years ago

I've been rolling back the first roll back. This doesn't change anything.

When running the cleanup manually it becomes clear that snapshot 1 is the problem:

martchus@QA-Power8-4-kvm:~> sudo snapper delete 1
Fehler beim Löschen des Schnappschusses.

This snapshot only became a cleanup candidate after the first rollback. Hence the problem was triggered by that.

Deleting the snapshot manually shows the same problem:

martchus@QA-Power8-4-kvm:~> sudo snapper delete 1
Fehler beim Löschen des Schnappschusses.

/var/log/snapper.log:

2021-11-30 13:29:05 MIL libsnapper(17135) Snapper.cc(Snapper):91 - Snapper constructor
2021-11-30 13:29:05 MIL libsnapper(17135) Snapper.cc(Snapper):92 - libsnapper version 0.8.9
2021-11-30 13:29:05 MIL libsnapper(17135) Snapper.cc(Snapper):93 - config_name:root disable_filters:false
2021-11-30 13:29:05 MIL libsnapper(17135) AsciiFile.cc(reload):114 - loading file /etc/snapper/configs/root
2021-11-30 13:29:05 MIL libsnapper(17135) AsciiFile.cc(getValue):243 - key:SUBVOLUME value:/
2021-11-30 13:29:05 MIL libsnapper(17135) AsciiFile.cc(getValue):243 - key:FSTYPE value:btrfs
2021-11-30 13:29:05 MIL libsnapper(17135) AsciiFile.cc(getValue):243 - key:QGROUP value:1/0
2021-11-30 13:29:05 MIL libsnapper(17135) AsciiFile.cc(getValue):243 - key:SYNC_ACL value:no
2021-11-30 13:29:05 MIL libsnapper(17135) Snapper.cc(Snapper):125 - subvolume:/ filesystem:btrfs
2021-11-30 13:29:05 MIL libsnapper(17135) Snapper.cc(loadIgnorePatterns):177 - number of ignore patterns:8
2021-11-30 13:29:05 MIL libsnapper(17135) Snapshot.cc(read):272 - found 34 snapshots
2021-11-30 13:29:05 ERR libsnapper(17135) Btrfs.cc(deleteSnapshot):441 - delete snapshot failed, ioctl(BTRFS_IOC_SNAP_DESTROY) failed, errno:39 (Directory not empty)
2021-11-30 13:29:05 WAR libsnapper(17135) Btrfs.cc(deleteSnapshot):442 - THROW: delete snapshot failed
2021-11-30 13:29:05 WAR libsnapper(17135) Client.cc(dispatch):1800 - CAUGHT: delete snapshot failed
2021-11-30 13:29:35 MIL libsnapper(17135) Snapper.cc(~Snapper):136 - Snapper destructor

We're not the only ones running into this problem: https://www.opensuse-forum.de/thread/64330-snapper-cleanup-nach-rollback-nicht-mehr-m%C3%B6glich/

Actions #13

Updated by mkittler over 2 years ago

  • Status changed from Workable to Feedback

Here we have our problem:

martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume list /
ID 257 gen 1439747 top level 5 path @
ID 258 gen 2747214 top level 257 path @/.snapshots
ID 260 gen 2747115 top level 258 path @/.snapshots/1/snapshot
ID 261 gen 2747233 top level 257 path @/home
ID 262 gen 2323223 top level 257 path @/opt
ID 263 gen 2743251 top level 257 path @/srv
ID 264 gen 2747252 top level 257 path @/tmp
ID 265 gen 2746590 top level 257 path @/usr/local
ID 266 gen 2747214 top level 257 path @/var/cache
ID 267 gen 2323223 top level 257 path @/var/crash
ID 268 gen 2743251 top level 257 path @/var/lib/libvirt/images
ID 269 gen 1439747 top level 257 path @/var/lib/machines
ID 270 gen 1439747 top level 257 path @/var/lib/mailman
ID 271 gen 1439747 top level 257 path @/var/lib/mariadb
ID 272 gen 1439747 top level 257 path @/var/lib/mysql
ID 273 gen 1439747 top level 257 path @/var/lib/named
ID 274 gen 1439747 top level 257 path @/var/lib/pgsql
ID 275 gen 2747253 top level 257 path @/var/log
ID 276 gen 2323223 top level 257 path @/var/opt
ID 277 gen 2747252 top level 257 path @/var/spool
ID 278 gen 2747214 top level 257 path @/var/tmp
ID 6385 gen 2323223 top level 260 path @/.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/74a1b83ca1371db656a8da867eed17fdc8209b3722d4525ce14ae9141fdd3b64
ID 6386 gen 2323223 top level 260 path @/.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/d731aaba630113706541c3dd24fdcbf4abe18dc9a0fa22528ef5cdcea99ec226
ID 6393 gen 2323223 top level 260 path @/.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/df9aa251422d7311b6f1ddf8400b816552080e340b2a4a7d178178e8ef19eb09
ID 6398 gen 1439747 top level 260 path @/.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/5f584c2738f156657522cb1085cc7d19bf91405e555f4aef577fafc83777a707
ID 9646 gen 2743251 top level 258 path @/.snapshots/1651/snapshot
ID 9647 gen 2743251 top level 258 path @/.snapshots/1652/snapshot

Snapshot 1 contains nested btrfs subvolumes and therefore snapper refuses to delete it. I have now deleted them manually and the cleanup works again:

martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume delete @/.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/74a1b83ca1371db656a8da867eed17fdc8209b3722d4525ce14ae9141fdd3b64
ERROR: Could not statfs: No such file or directory
martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume delete /.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/74a1b83ca1371db656a8da867eed17fdc8209b3722d4525ce14ae9141fdd3b64
Delete subvolume (no-commit): '/.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/74a1b83ca1371db656a8da867eed17fdc8209b3722d4525ce14ae9141fdd3b64'
martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume delete /.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/d731aaba630113706541c3dd24fdcbf4abe18dc9a0fa22528ef5cdcea99ec226
Delete subvolume (no-commit): '/.snapshots/1/snapshot/var/lib/containers/storage/btrfs/subvolumes/d731aaba630113706541c3dd24fdcbf4abe18dc9a0fa22528ef5cdcea99ec226'
martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume delete /.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/df9aa251422d7311b6f1ddf8400b816552080e340b2a4a7d178178e8ef19eb09
Delete subvolume (no-commit): '/.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/df9aa251422d7311b6f1ddf8400b816552080e340b2a4a7d178178e8ef19eb09'
martchus@QA-Power8-4-kvm:~> sudo btrfs subvolume delete /.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/5f584c2738f156657522cb1085cc7d19bf91405e555f4aef577fafc83777a707
Delete subvolume (no-commit): '/.snapshots/1/snapshot/var/lib/docker/btrfs/subvolumes/5f584c2738f156657522cb1085cc7d19bf91405e555f4aef577fafc83777a707'
martchus@QA-Power8-4-kvm:~> sudo snapper delete 1
martchus@QA-Power8-4-kvm:~> sudo systemctl restart snapper-cleanup.service 
martchus@QA-Power8-4-kvm:~> sudo systemctl status snapper-cleanup.service
martchus@QA-Power8-4-kvm:~> sudo systemctl status snapper-cleanup.service
● snapper-cleanup.service - Daily Cleanup of Snapper Snapshots
   Loaded: loaded (/usr/lib/systemd/system/snapper-cleanup.service; static; vendor preset: disabled)
   Active: inactive (dead) since Tue 2021-11-30 14:41:00 CET; 45s ago
     Docs: man:snapper(8)
           man:snapper-configs(5)
  Process: 19541 ExecStart=/usr/lib/snapper/systemd-helper --cleanup (code=exited, status=0/SUCCESS)
 Main PID: 19541 (code=exited, status=0/SUCCESS)

Nov 30 14:40:35 QA-Power8-4-kvm systemd[1]: Started Daily Cleanup of Snapper Snapshots.
Nov 30 14:40:35 QA-Power8-4-kvm systemd-helper[19541]: running number cleanup for 'root'.
Nov 30 14:40:40 QA-Power8-4-kvm systemd-helper[19541]: quota not working (preparing quota failed)
Nov 30 14:41:00 QA-Power8-4-kvm systemd-helper[19541]: running timeline cleanup for 'root'.
Nov 30 14:41:00 QA-Power8-4-kvm systemd-helper[19541]: running empty-pre-post cleanup for 'root'.

The subvolumes I've deleted are nowhere mentioned in /etc/fstab. They're likely related to docker which is installed but not enabled on the machine. So I suppose deleting the subvolumes should be ok.

I've enabled the systemd services alert again.

Actions #14

Updated by okurz over 2 years ago

cool. I followed that forums post and https://github.com/openSUSE/snapper/issues/159 is a still open upstream bug report. I commented on https://github.com/openSUSE/snapper/issues/159#issuecomment-982729393 so I think we are done here. Wanna resolve?

Actions #15

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

Yes. I must say that it is actually good that snapper doesn't delete arbitrary subvolumes (maybe we'd actually needed them). The error message visible from the logs (delete snapshot failed, ioctl(BTRFS_IOC_SNAP_DESTROY) failed, errno:39 (Directory not empty)) is not the best but I doubt snapper can improved here (as it likely just prints the error message from the ioctl call). So yes, I'd just resolve the ticket.

Actions #16

Updated by okurz 10 months ago

  • Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Actions

Also available in: Atom PDF