action #155848

closed

Firewalld is logging many errors and sometimes restarting on worker29, possibly related to MM failures size:M

Added by mkittler 2 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2024-02-22
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

martchus@worker29:~> sudo journalctl -u firewalld.service
Feb 18 03:31:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 03:31:59 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 03:31:59 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 8c90f12e00d94891941a5b00e8d1124a --
Feb 18 03:34:44 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 03:34:45 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:51 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 05:40:05 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:08 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:10 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 06:38:12 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 06:38:12 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 06:38:12 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 06:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 06:38:21 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:22 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:24 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 06:40:05 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:07 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:08 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 07:38:11 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 07:38:13 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 07:38:13 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:39:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 13:39:58 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 13:39:58 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 1ca309edcd134e5195355a0904a6a196 --
Feb 21 13:43:20 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 13:43:20 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:43:25 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:27 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 21 13:45:12 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:13 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:14 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 14:46:45 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 14:46:47 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 14:46:47 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 21 14:46:47 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 14:46:48 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
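
To get a quick overview of an excerpt like the one above, the error lines can be counted per firewalld process and the service stops listed. The following is only a minimal sketch: the function name summarize and the regular expression are illustrative assumptions based on the line format shown in this ticket, not part of any existing tooling.

```python
import re
from collections import Counter

# Assumed line shape, taken from the excerpt in this ticket, e.g.
# "Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def summarize(journal_text):
    """Return (INVALID_ZONE error count per firewalld PID, 'Stopping' timestamps)."""
    errors = Counter()
    stops = []
    for line in journal_text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # boot markers, ellipsis lines, blank lines
        if m["unit"] == "firewalld" and "INVALID_ZONE" in m["msg"]:
            errors[int(m["pid"])] += 1
        elif m["unit"] == "systemd" and m["msg"].startswith("Stopping firewalld"):
            stops.append(m["ts"])
    return errors, stops
```

The function takes the journal text as a string, for example the saved output of sudo journalctl -u firewalld.service. A new PID in the error counts together with a matching stop timestamp indicates a restart of the service.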

We have also seen multi-machine (MM) test failures around the time of the most recent Stopping/Starting lines in the log above, see #155716#note-8. See also #155716#note-9 for my initial investigation.

Acceptance criteria

  • REJECTED: AC1: We know why firewalld is repeatedly restarted and whether restarting is related to the error messages.
    • I'm still not sure why it happens.
    • I'm still not sure whether it has an impact. I stopped firewalld temporarily on one worker and settings like IP forwarding were still in place. However, nft-level settings like masquerading were gone: sudo nft list ruleset returned an empty response instead of rules like chain nat_POST_trusted_allow { oifname != "lo" masquerade }. So I guess firewalld being restarted might interfere with running MM tests.
    • We decided not to investigate further because the impact is likely low.
  • DONE: AC2: We know the meaning and impact of the error message.
  • DONE: AC3: We know whether it also happens on other workers. (Does it also happen on other OSD workers? Does it also happen on o3 workers?)
    • The mentioned firewalld error also occurs on other OSD and o3 workers (on o3 only on openqaworker23 and openqaworker-arm21). I have also seen the firewalld service restarting at some point during normal operation on other OSD workers.
  • DONE: AC4: The error is prevented or worked around.
    • We cannot just uninstall the broken package (see #155848#note-9) so I guess it is best we ignore this error for now.
  • DONE: AC5: We know why the MM failures were happening (possibly due to these problems with firewalld but it could also be a red herring).
    • As discussed, those failures are unlikely to be caused by the firewalld issues. Maybe enabling RSTP helps, see #155929.
  • DONE: AC6: The MM failures are prevented if caused by a concrete issue.
    • We created #155929 as follow-up for the next best thing to try to improve the MM setup.
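
The check described under AC1 (whether masquerading rules are still present after firewalld stops) could be scripted roughly as follows. This is a hedged sketch: the helper names has_masquerade_rule and current_ruleset are made up for illustration, and current_ruleset assumes the script runs with sufficient privileges to call nft list ruleset.

```python
import re
import subprocess


def has_masquerade_rule(ruleset_text):
    """Return True if any line of `nft list ruleset` output contains a masquerade statement."""
    return any(
        re.search(r"\bmasquerade\b", line) for line in ruleset_text.splitlines()
    )


def current_ruleset():
    """Fetch the live ruleset; assumes nft is installed and we run as root."""
    result = subprocess.run(
        ["nft", "list", "ruleset"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling has_masquerade_rule(current_ruleset()) before and after stopping firewalld would show whether the masquerading rules disappear, as observed manually in this ticket.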

Suggestions

  • Ensure that the error is in an upstream report, e.g. bugzilla and/or further upstream
  • Do what we can do to prevent the error

Out of scope

  • Multi-machine config rework or anything about STP (see new ticket about that)
Actions #1

Updated by mkittler 2 months ago

  • Description updated (diff)
Actions #2

Updated by mkittler 2 months ago

  • Assignee set to mkittler
Actions #3

Updated by mkittler 2 months ago

  • Description updated (diff)
Actions #4

Updated by mkittler 2 months ago

This comes from:

/usr/lib/firewalld/policies/libvirt-routed-in.xml:  <short>libvirt-routed-in</short>
/usr/lib/firewalld/policies/libvirt-routed-in.xml:  <egress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-routed-out.xml:  <short>libvirt-routed-out</short>
/usr/lib/firewalld/policies/libvirt-routed-out.xml:  <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-to-host.xml:  <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/zones/libvirt-routed.xml:  <short>libvirt-routed</short>

Which in turn comes from libvirt-daemon-driver-network. The last two changes are months in the past so this isn't something new:

martchus@worker40:~> rpm --query --changelog libvirt-daemon-driver-network
* Tue Jul 25 2023 jfehlig@suse.com
- spec: Build library with support for modular daemons
  bsc#1213352

* Thu Jul 20 2023 jfehlig@suse.com
- CVE-2023-3750: storage: Fix returning of locked objects from
  'virStoragePoolObjListSearch'
  bsc#1213447
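
To cross-check which zones the policy files listed above refer to, the ingress-zone and egress-zone elements can be extracted from a policy XML document and compared against a set of known zone names. This is only a sketch under assumptions: the function names are hypothetical, and the set of known zones has to be supplied by the caller. Note that the listing above does include a zone file for libvirt-routed, so such a cross-check alone does not explain the INVALID_ZONE error.

```python
import xml.etree.ElementTree as ET


def referenced_zones(policy_xml):
    """Return the zone names used in ingress-zone/egress-zone elements of a policy."""
    root = ET.fromstring(policy_xml)
    return {
        elem.get("name")
        for tag in ("ingress-zone", "egress-zone")
        for elem in root.iter(tag)
        if elem.get("name")
    }


def missing_zones(policy_xml, known_zones):
    """Return referenced zones that are not in the caller-supplied set of known zones."""
    return referenced_zones(policy_xml) - set(known_zones)
```

The policy text could be read from a file such as /usr/lib/firewalld/policies/libvirt-routed-out.xml and passed in as a string.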
Actions #5

Updated by okurz 2 months ago

But 2023-07 is also not that long ago. So could it be that some of our problems with multi-machine tests started around that time due to the change in libvirt-daemon-driver-network?

Actions #7

Updated by okurz 2 months ago

  • Subject changed from Firewalld is logging many errors and sometimes restarting on worker29, possibly related to MM failures to Firewalld is logging many errors and sometimes restarting on worker29, possibly related to MM failures size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #8

Updated by mkittler 2 months ago

  • Description updated (diff)
Actions #9

Updated by mkittler 2 months ago

  • Description updated (diff)
  • Status changed from Workable to Feedback

I found a bug report about this after all (see updated description).

We could just uninstall libvirt-daemon-driver-network to get rid of the error on our worker hosts. That would also uninstall libvirt, libvirt-daemon-config-network, libvirt-daemon-driver-network and vagrant-libvirt, but I guess it would be ok considering we are not actually using libvirtd on worker hosts (only on jump hosts via svirt).
However, we actually need vagrant-libvirt, considering the salt-states commit 038b7a7ed0861472795da543c27b2d53c478a29e. So I'm not sure how to prevent this error. Maybe, now that we are aware of its source and impact, we can also just ignore it.

Actions #10

Updated by mkittler 2 months ago

  • Description updated (diff)
Actions #11

Updated by mkittler 2 months ago

  • Description updated (diff)
  • Status changed from Feedback to Resolved

Resolving, see updated description.
