action #155848

Updated by mkittler 10 months ago

## Observation 

 ``` 
 martchus@worker29:~> sudo journalctl -u firewalld.service 
 Feb 18 03:31:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon... 
 Feb 18 03:31:59 worker29 systemd[1]: firewalld.service: Deactivated successfully. 
 Feb 18 03:31:59 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon. 
 -- Boot 8c90f12e00d94891941a5b00e8d1124a -- 
 Feb 18 03:34:44 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon... 
 Feb 18 03:34:45 worker29 systemd[1]: Started firewalld - dynamic firewall daemon. 
 Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 03:34:51 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 03:34:52 worker29 firewalld[2185]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 … 
 Feb 18 05:40:05 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 05:40:06 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 05:40:07 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 05:40:08 worker29 firewalld[96768]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:38:10 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon... 
 Feb 18 06:38:12 worker29 systemd[1]: firewalld.service: Deactivated successfully. 
 Feb 18 06:38:12 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon. 
 Feb 18 06:38:12 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon... 
 Feb 18 06:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon. 
 Feb 18 06:38:21 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:38:22 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:38:23 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:38:24 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 … 
 Feb 18 06:40:05 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:40:06 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:40:07 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 06:40:08 worker29 firewalld[108896]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 18 07:38:11 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon... 
 Feb 18 07:38:13 worker29 systemd[1]: firewalld.service: Deactivated successfully. 
 Feb 18 07:38:13 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon. 
 Feb 18 07:38:13 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon... 
 Feb 18 07:38:13 worker29 systemd[1]: Started firewalld - dynamic firewall daemon. 
 Feb 21 13:39:57 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon... 
 Feb 21 13:39:58 worker29 systemd[1]: firewalld.service: Deactivated successfully. 
 Feb 21 13:39:58 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon. 
 -- Boot 1ca309edcd134e5195355a0904a6a196 -- 
 Feb 21 13:43:20 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon... 
 Feb 21 13:43:20 worker29 systemd[1]: Started firewalld - dynamic firewall daemon. 
 Feb 21 13:43:25 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 13:43:26 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 13:43:27 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 … 
 Feb 21 13:45:12 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 13:45:13 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 13:45:14 worker29 firewalld[2198]: ERROR: Calling pre func <bound method Firewall.full_check_config of <class 'firewall.core.fw.Firewall'>(True, True, True, 'RUNNING', False, 'trusted', {}, [], True, True, True, False, 'off')>(()) failed: INVALID_ZONE: 'libvirt-routed' not among existing zones 
 Feb 21 14:46:45 worker29 systemd[1]: Stopping firewalld - dynamic firewall daemon... 
 Feb 21 14:46:47 worker29 systemd[1]: firewalld.service: Deactivated successfully. 
 Feb 21 14:46:47 worker29 systemd[1]: Stopped firewalld - dynamic firewall daemon. 
 Feb 21 14:46:47 worker29 systemd[1]: Starting firewalld - dynamic firewall daemon... 
 Feb 21 14:46:48 worker29 systemd[1]: Started firewalld - dynamic firewall daemon. 
 ``` 

 We also saw multi-machine (MM) test failures around the time of the most recent `Stopping`/`Starting` lines in the log above, see #155716#note-8. My initial investigation is in #155716#note-9. 
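
 To help with AC1, the log can be grouped by service start to see which starts were followed by `INVALID_ZONE` errors. This is only a minimal sketch, assuming the journal line format shown above; the function name `errors_per_start` is hypothetical and not part of any existing tooling. Note that the excerpt above is shortened with `…`, so counts taken from it would only be lower bounds.

```python
import re

# "Started firewalld" lines written by systemd, e.g.
#   Feb 18 03:34:45 worker29 systemd[1]: Started firewalld - dynamic firewall daemon.
START_RE = re.compile(r"^(\w{3} +\d+ [\d:]{8}) \S+ systemd\[1\]: Started firewalld")

# Error lines written by firewalld itself, e.g.
#   Feb 18 03:34:50 worker29 firewalld[2185]: ERROR: ... INVALID_ZONE: ...
ERROR_RE = re.compile(r"firewalld\[(\d+)\]: ERROR: .*INVALID_ZONE")


def errors_per_start(journal_text):
    """Group INVALID_ZONE errors by the preceding "Started firewalld" line.

    Returns a list of [start_timestamp, error_count] pairs in log order.
    Errors logged before the first start line are ignored.
    """
    runs = []
    for line in journal_text.splitlines():
        line = line.strip()
        start = START_RE.search(line)
        if start:
            # A new service start opens a new run with zero errors so far.
            runs.append([start.group(1), 0])
        elif runs and ERROR_RE.search(line):
            # Attribute the error to the most recent start.
            runs[-1][1] += 1
    return runs
```

 The function takes the complete, unshortened output of `sudo journalctl -u firewalld.service` as a string. A start with a count of zero would suggest that the errors do not follow every restart.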

 ## Acceptance criteria 
 * **AC1**: We know why firewalld is repeatedly restarted and whether restarting is related to the error messages. 
 * DONE: **AC2**: We know the meaning and impact of the error message. 
     * The error is caused by this bug: https://bugzilla.opensuse.org/show_bug.cgi?id=1214160 
     * I don't think this error is problematic for us as we don't use libvirt that way. 
 * DONE: **AC3**: We know whether it also happens on other workers. (Does it also happen on other OSD workers? Does it also happen on o3 workers?) 
     * The firewalld error also occurs on other OSD and o3 workers (on o3 only on openqaworker23 and openqaworker-arm21). On other OSD workers I have also seen the firewalld service being restarted at some point during uptime. 
 * **AC4**: The error is prevented or worked around. 
 * DONE: **AC5**: We know why the MM failures were happening (*possibly* due to these problems with firewalld but it could also be a red herring). 
     * As discussed, those failures are unlikely to be caused by the firewalld issues. Enabling RSTP might help, see #155929. 
 * DONE: **AC6**: The MM failures are prevented if caused by a concrete issue. 
     * We created #155929 as a follow-up for the next most promising step to improve the MM setup. 


 ## Suggestions 
 * Ensure that the error is covered by an upstream report, e.g. in Bugzilla and/or further upstream 
 * Do what we can to prevent the error 

 ## Out of scope 
 * Multi-machine config rework or anything about STP (see new ticket about that)
