action #155848

closed

Firewalld is logging many errors and sometimes restarting on worker29, possibly related to MM failures size:M

Added by mkittler 10 months ago. Updated 10 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2024-02-22
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

martchus@worker29:~> sudo journalctl -u firewalld.service
Feb 18 03:31:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 03:31:59 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 03:31:59 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 8c90f12e00d94891941a5b00e8d1124a --
Feb 18 03:34:44 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 03:34:45 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:51 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 05:40:05 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:08 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:10 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 06:38:12 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 06:38:12 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 06:38:12 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 06:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 06:38:21 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:22 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:24 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 06:40:05 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:07 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:08 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 07:38:11 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 07:38:13 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 07:38:13 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:39:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 13:39:58 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 13:39:58 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 1ca309edcd134e5195355a0904a6a196 --
Feb 21 13:43:20 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 13:43:20 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:43:25 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:27 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 21 13:45:12 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:13 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:14 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 14:46:45 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 14:46:47 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 14:46:47 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 21 14:46:47 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 14:46:48 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.

We have also seen MM (multi-machine) failures around the time of the most recent Stopping/Starting lines in the log above; see #155716#note-8. See also #155716#note-9 for my initial investigation.
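To compare restart times against MM failure times, the journal output above can be tallied mechanically. The following is a minimal sketch, assuming the default journalctl line format shown in the log; the names LINE_RE and summarize are made up for illustration and are not part of any existing tooling.

```python
import re
from collections import Counter

# Matches the syslog-style prefix seen in the log above, e.g.
# "Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ "
    r"(?P<unit>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def summarize(journal_text):
    """Return (start timestamps, INVALID_ZONE error count per firewalld PID)."""
    starts = []
    errors = Counter()
    for line in journal_text.splitlines():
        m = LINE_RE.match(line)
        if m is None:
            continue  # boot separators, elided lines, blank lines
        if m["unit"] == "systemd" and m["msg"].startswith("Started firewalld"):
            starts.append(m["ts"])
        elif m["unit"] == "firewalld" and "INVALID_ZONE" in m["msg"]:
            errors[int(m["pid"])] += 1
    return starts, errors
```

Feeding it the text produced by sudo journalctl -u firewalld.service gives one timestamp per service start and one counter entry per firewalld process, which makes it easier to see whether every start is followed by an error burst.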

Acceptance criteria

  • REJECTED: AC1: We know why firewalld is repeatedly restarted and whether restarting is related to the error messages.
    • I'm still not sure why it happens.
    • I'm still not sure whether it has an impact. I stopped firewalld temporarily on one worker and settings like IP forwarding were still in place. However, nft-level settings like masquerading were gone: sudo nft list ruleset printed nothing instead of rules like chain nat_POST_trusted_allow { oifname != "lo" masquerade }. So firewalld being restarted might interfere with running MM tests.
    • We decided to not further investigate because the impact is likely low.
  • DONE: AC2: We know the meaning and impact of the error message.
  • DONE: AC3: We know whether it also happens on other workers. (Does it also happen on other OSD workers? Does it also happen on o3 workers?)
    • The firewalld error also occurs on other OSD and o3 workers (on o3 only on openqaworker23 and openqaworker-arm21). I have also seen the firewalld service restarting at some point during uptime on other OSD workers.
  • DONE: AC4: The error is prevented or worked around.
    • We cannot just uninstall the broken package (see #155848#note-9), so it is probably best to ignore this error for now.
  • DONE: AC5: We know why the MM failures were happening (possibly due to these problems with firewalld but it could also be a red herring).
    • As discussed, those failures are unlikely to be caused by the firewalld issues. Maybe enabling RSTP helps; see #155929.
  • DONE: AC6: The MM failures are prevented if caused by a concrete issue.
    • We created #155929 as follow-up for the next best thing to try to improve the MM setup.
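
The nft-level check mentioned under AC1 (masquerade rules present while firewalld runs, empty ruleset once it is stopped) could be scripted roughly as follows. This is only a sketch: masquerade_chains is a hypothetical helper, and it assumes the text it receives is the output of sudo nft list ruleset.

```python
import re


def masquerade_chains(ruleset_text):
    """Return the names of chains that contain a masquerade statement.

    Rules only appear inside chains and chains are not nested, so the most
    recent "chain NAME {" header is taken as the owner of each rule.
    """
    chains = []
    current = None
    for line in ruleset_text.splitlines():
        header = re.match(r"\s*chain\s+(\S+)\s*\{", line)
        if header:
            current = header.group(1)
        if current and re.search(r"\bmasquerade\b", line):
            if current not in chains:
                chains.append(current)
    return chains
```

An empty result while MM tests are running would match the situation observed after stopping firewalld; a non-empty result lists the chains (such as nat_POST_trusted_allow) that still masquerade.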

Suggestions

  • Ensure that the error is in an upstream report, e.g. bugzilla and/or further upstream
  • Do what we can do to prevent the error

Out of scope

  • Multi-machine config rework or anything about STP (see new ticket about that)
Actions #4

Updated by mkittler 10 months ago

This comes from:

/usr/lib/firewalld/policies/libvirt-routed-in.xml:  <short>libvirt-routed-in</short>
/usr/lib/firewalld/policies/libvirt-routed-in.xml:  <egress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-routed-out.xml:  <short>libvirt-routed-out</short>
/usr/lib/firewalld/policies/libvirt-routed-out.xml:  <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-to-host.xml:  <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/zones/libvirt-routed.xml:  <short>libvirt-routed</short>

These files in turn come from the libvirt-daemon-driver-network package. Its last two changelog entries are months old, so this isn't something new:

martchus@worker40:~> rpm --query --changelog libvirt-daemon-driver-network
* Tue Jul 25 2023 jfehlig@suse.com
- spec: Build library with support for modular daemons
  bsc#1213352

* Thu Jul 20 2023 jfehlig@suse.com
- CVE-2023-3750: storage: Fix returning of locked objects from
  'virStoragePoolObjListSearch'
  bsc#1213447
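
To narrow down which policy refers to a zone firewalld does not know, the ingress-zone and egress-zone references in the policy files listed above can be compared with a set of known zone names. A rough sketch follows; the function names are invented for this note, and treating HOST and ANY as always valid is based on firewalld's symbolic zones. Since the grep output also shows a zones/libvirt-routed.xml file, a purely file-based comparison may well report nothing missing, so the known zones should preferably come from the running daemon (for example firewall-cmd --get-zones).

```python
import xml.etree.ElementTree as ET

# firewalld's symbolic zones, which have no zone file of their own
SYMBOLIC_ZONES = {"HOST", "ANY"}


def referenced_zones(policy_xml):
    """Return the zone names used in <ingress-zone>/<egress-zone> elements."""
    root = ET.fromstring(policy_xml)
    return {
        element.get("name")
        for tag in ("ingress-zone", "egress-zone")
        for element in root.iter(tag)
    }


def missing_zones(policies, known_zones):
    """Map policy name -> sorted zone names that are not in known_zones.

    policies maps a policy name to the XML text of its policy file.
    """
    result = {}
    for name, xml_text in policies.items():
        unknown = referenced_zones(xml_text) - set(known_zones) - SYMBOLIC_ZONES
        if unknown:
            result[name] = sorted(unknown)
    return result
```

If the daemon's zone list lacks libvirt-routed, the result would name the libvirt-routed-in, libvirt-routed-out and libvirt-to-host policies, which is consistent with the INVALID_ZONE message in the log.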
Actions #5

Updated by okurz 10 months ago

But 2023-07 is also not that long ago. So could it be that some of our problems with multi-machine tests started around that time due to the change in libvirt-daemon-driver-network?

Actions #9

Updated by mkittler 10 months ago

  • Description updated (diff)
  • Status changed from Workable to Feedback

I found a bug report about this after all (see updated description).

We could just uninstall libvirt-daemon-driver-network to get rid of the error on our worker hosts. That would remove libvirt, libvirt-daemon-config-network, libvirt-daemon-driver-network and vagrant-libvirt, but I guess that would be OK considering we are not actually using libvirtd on worker hosts (only on jump hosts via svirt).
However, we actually need vagrant-libvirt, considering the salt-states commit 038b7a7ed0861472795da543c27b2d53c478a29e, so I'm not sure how to prevent this error. Maybe, now that we are aware of its source and impact, we can also just ignore it.

Actions #11

Updated by mkittler 10 months ago

  • Description updated (diff)
  • Status changed from Feedback to Resolved

Resolving, see updated description.
