action #155848
closed
Firewalld is logging many errors and sometimes restarting on worker29, possibly related to MM failures size:M
0%
Description
Observation
martchus@worker29:~> sudo journalctl -u firewalld.service
Feb 18 03:31:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 03:31:59 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 03:31:59 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 8c90f12e00d94891941a5b00e8d1124a --
Feb 18 03:34:44 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 03:34:45 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:51 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 05:40:05 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 05:40:08 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:10 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 06:38:12 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 06:38:12 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 06:38:12 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 06:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 18 06:38:21 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:22 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:38:24 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 18 06:40:05 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:07 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 06:40:08 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 18 07:38:11 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 18 07:38:13 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 18 07:38:13 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 18 07:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:39:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 13:39:58 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 13:39:58 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
-- Boot 1ca309edcd134e5195355a0904a6a196 --
Feb 21 13:43:20 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 13:43:20 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
Feb 21 13:43:25 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:43:27 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
…
Feb 21 13:45:12 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:13 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 13:45:14 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones
Feb 21 14:46:45 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon...
Feb 21 14:46:47 worker29 systemd[1]: firewalld.service: Deactivated successfully.
Feb 21 14:46:47 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon.
Feb 21 14:46:47 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon...
Feb 21 14:46:48 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
We have also seen MM failures in the timeframe of the most recent Stopping/Starting lines in the log above, see #155716#note-8. See also #155716#note-9 for my initial investigation.
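To get an overview of a journal like the one above without scrolling through the repeated error lines, something like the following could be used. This is only a sketch: the function name `summarize_firewalld_log` is made up for illustration, and it assumes the default short journalctl output format shown above.

```shell
# Hypothetical helper: summarize `journalctl -u firewalld.service` output read from stdin.
# Assumes the default short format: "Mon DD HH:MM:SS host unit[pid]: message".
summarize_firewalld_log() {
    awk '
        # Count INVALID_ZONE errors per firewalld process (field 5 is "firewalld[PID]:").
        /INVALID_ZONE/ { pid = $5; sub(/:$/, "", pid); errors[pid]++ }
        # Print the time of every stop and start so restarts are easy to spot.
        /systemd\[1\]: (Stopping|Starting) firewalld/ { print $1, $2, $3, $6 }
        # Finally print the error count of each process.
        END { for (pid in errors) print errors[pid], "errors from", pid }
    '
}

# Example usage on a worker:
#   sudo journalctl -u firewalld.service | summarize_firewalld_log
```

A restart shows up as a Stopping line directly followed by a Starting line with nearly the same timestamp, and the error counts make it visible that each new firewalld process logs the error again.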
Acceptance criteria
- REJECTED: AC1: We know why firewalld is repeatedly restarted and whether restarting is related to the error messages.
  - I'm still not sure why it happens.
  - I'm still not sure whether it has an impact. I stopped it temporarily on one worker and settings like IP forwarding were still in place. However, settings on nft level like masquerading were gone. At least `sudo nft list ruleset` showed an empty response instead of rules like `chain nat_POST_trusted_allow { oifname != "lo" masquerade }`. So I guess firewalld being restarted might interfere with running MM tests.
  - We decided to not further investigate because the impact is likely low.
- DONE: AC2: We know the meaning and impact of the error message.
  - The error is caused by this bug: https://bugzilla.opensuse.org/show_bug.cgi?id=1214160
  - I don't think this error is problematic for us as we don't use libvirt that way.
- DONE: AC3: We know whether it also happens on other workers. (Does it also happen on other OSD workers? Does it also happen on o3 workers?)
  - The mentioned firewalld error is occurring on other OSD and o3 workers as well (on o3 only on workers openqaworker23 and openqaworker-arm21). I have also seen the firewalld service restarting at some point in the middle on other OSD workers.
- DONE: AC4: The error is prevented or worked around.
  - We cannot just uninstall the broken package (see #155848#note-9) so I guess it is best we ignore this error for now.
- DONE: AC5: We know why the MM failures were happening (possibly due to these problems with firewalld but it could also be a red herring).
  - As discussed, those failures are unlikely to be caused by the firewalld issues. Maybe enabling RSTP helps, see #155929.
- DONE: AC6: The MM failures are prevented if caused by a concrete issue.
  - We created #155929 as a follow-up for the next best thing to try to improve the MM setup.
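The check described under AC1 (whether masquerading rules are still present on nft level after firewalld was stopped or restarted) could be scripted roughly as follows. This is a minimal sketch, not something that was run for this ticket: the function name `has_masquerade_rule` is hypothetical, and it only looks for the word `masquerade` anywhere in the ruleset rather than for a specific chain.

```shell
# Hypothetical helper: reads `nft list ruleset` output from stdin and reports
# whether any masquerade rule is present (exit status 0 if so, 1 otherwise).
has_masquerade_rule() {
    if grep -qw 'masquerade'; then
        echo "masquerade rule present"
    else
        echo "no masquerade rule found"
        return 1
    fi
}

# Example usage on a worker:
#   sudo nft list ruleset | has_masquerade_rule
```

Running it once before and once after stopping firewalld would show the difference observed above without having to read the full ruleset.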
Suggestions
- Ensure that the error is in an upstream report, e.g. bugzilla and/or further upstream
- Do what we can do to prevent the error
Out of scope
- Multi-machine config rework or anything about STP (see new ticket about that)
Updated by mkittler 10 months ago
This comes from:
/usr/lib/firewalld/policies/libvirt-routed-in.xml: <short>libvirt-routed-in</short>
/usr/lib/firewalld/policies/libvirt-routed-in.xml: <egress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-routed-out.xml: <short>libvirt-routed-out</short>
/usr/lib/firewalld/policies/libvirt-routed-out.xml: <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/policies/libvirt-to-host.xml: <ingress-zone name="libvirt-routed" />
/usr/lib/firewalld/zones/libvirt-routed.xml: <short>libvirt-routed</short>
Which in turn comes from `libvirt-daemon-driver-network`. The last two changes are months in the past so this isn't something new:
martchus@worker40:~> rpm --query --changelog libvirt-daemon-driver-network
* Tue Jul 25 2023 jfehlig@suse.com
- spec: Build library with support for modular daemons
bsc#1213352
* Thu Jul 20 2023 jfehlig@suse.com
- CVE-2023-3750: storage: Fix returning of locked objects from
'virStoragePoolObjListSearch'
bsc#1213447
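The file listing at the top of this comment looks like the result of a recursive search for the zone name. A sketch of how such a search might be repeated on another worker is shown below. The function name `find_zone_references` and the default directory are assumptions made for illustration, not the exact command that was used.

```shell
# Hypothetical helper: list files below a firewalld configuration directory
# that mention a given zone name.
# $1: zone name, $2: directory to search (assumed default: /usr/lib/firewalld)
find_zone_references() {
    grep -rlF -- "$1" "${2:-/usr/lib/firewalld}" | sort
}

# Example usage on a worker; `rpm --query --file` then names the owning package:
#   find_zone_references libvirt-routed | xargs rpm --query --file | sort -u
```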
Updated by mkittler 10 months ago
- Description updated (diff)
- Status changed from Workable to Feedback
I found a bug report about this after all (see updated description).
We could just uninstall `libvirt-daemon-driver-network` to get rid of the error on our worker hosts. That would also uninstall `libvirt libvirt-daemon-config-network libvirt-daemon-driver-network vagrant-libvirt` but I guess it would be ok considering we are not actually using libvirtd on worker hosts (only on jump hosts via svirt).
We actually need `vagrant-libvirt` considering the salt-states commit 038b7a7ed0861472795da543c27b2d53c478a29e. So I'm not sure how to prevent this error. Maybe, now that we are aware of its source/impact, we can also just ignore it.