action #88299
closed [virtualization] Worker openqaw5-xen-1.qa.suse.de is not reachable (xen-hvm/xen-pv failing)
Description
The following OSD job failed because the worker openqaw5-xen-1.qa.suse.de is not reachable and cannot be booted.
sle-15-SP3-Online-x86_64-Build133.1-default_install_svirt@svirt-hyperv2012r2-uefi (https://openqa.nue.suse.com/tests/5357316#)
Host: openqaw5-xen.qa.suse.de
Guest VM: openqaw5-xen-1.qa.suse.de
Updated by okurz almost 4 years ago
- Subject changed from Worker openqaw5-xen-1.qa.suse.de is not reachable to [virtualization] Worker openqaw5-xen-1.qa.suse.de is not reachable
- Target version set to future
Updated by szarate almost 4 years ago
- Related to action #88217: [qe-core] test fails in bootloader_svirt - libxenlight failed to create new domain: leftover qemu process added
Updated by tjyrinki_suse almost 4 years ago
- Subject changed from [virtualization] Worker openqaw5-xen-1.qa.suse.de is not reachable to [virtualization] Worker openqaw5-xen-1.qa.suse.de is not reachable (xen-hvm/xen-pv failing)
Updated by mloviska almost 4 years ago
- Status changed from New to In Progress
openqaw5-xen.qa.suse.de has been successfully migrated to sle15sp2.
# cat /etc/os-release
NAME="SLES"
VERSION="15-SP2"
VERSION_ID="15.2"
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP2"
ID="sles"
ID_LIKE="suse"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:suse:sles:15:sp2"
Unfortunately, there still seems to be a problem with libvirtd:
openqaw5-xen:~ # systemctl status libvirtd
● libvirtd.service - Virtualization daemon
Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-01-28 19:08:13 CET; 15h ago
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 2715 (libvirtd)
Tasks: 28 (limit: 32768)
CGroup: /system.slice/libvirtd.service
├─2715 /usr/sbin/libvirtd --timeout 120
└─6205 /usr/bin/qemu-system-x86_64 -xen-domid 2 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-2,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev >
Jan 28 19:55:47 openqaw5-xen root[28322]: /etc/xen/scripts/vif-bridge: ip link set vif9.0 nomaster failed
Jan 28 19:55:47 openqaw5-xen root[28326]: /etc/xen/scripts/vif-bridge: ip link set vif9.0 down failed
Jan 28 19:55:47 openqaw5-xen root[28327]: /etc/xen/scripts/vif-bridge: Successful vif-bridge offline for vif9.0, bridge br0.
Jan 28 19:55:49 openqaw5-xen libvirtd[2715]: 2731: error : virDomainSnapshotNum:344 : this function is not supported by the connection driver: virDomainSnapshotNum
Jan 28 23:02:08 openqaw5-xen libvirtd[2715]: 2730: warning : libxlDomainObjBeginJob:146 : Cannot start job (modify) for domain openQA-SUT-1; current job is (modify) owned by (2732)
Jan 28 23:02:08 openqaw5-xen libvirtd[2715]: 2730: error : libxlDomainObjBeginJob:150 : Timed out during operation: cannot acquire state change lock
Jan 28 23:02:08 openqaw5-xen libvirtd[2715]: 2715: error : virNetSocketReadWire:1832 : End of file while reading data: Input/output error
Jan 28 23:02:09 openqaw5-xen libvirtd[2715]: 2729: warning : libxlDomainObjBeginJob:146 : Cannot start job (modify) for domain openQA-SUT-3; current job is (modify) owned by (2733)
Jan 28 23:02:09 openqaw5-xen libvirtd[2715]: 2729: error : libxlDomainObjBeginJob:150 : Timed out during operation: cannot acquire state change lock
Jan 28 23:02:09 openqaw5-xen libvirtd[2715]: 2715: error : virNetSocketReadWire:1817 : Cannot recv data: Connection reset by peer
openqaw5-xen:~ # systemctl restart libvirtd
openqaw5-xen:~ # systemctl status libvirtd
● libvirtd.service - Virtualization daemon
Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2021-01-29 10:22:30 CET; 8s ago
Docs: man:libvirtd(8)
https://libvirt.org
Main PID: 25246 (libvirtd)
Tasks: 29 (limit: 32768)
CGroup: /system.slice/libvirtd.service
├─ 6205 /usr/bin/qemu-system-x86_64 -xen-domid 2 -chardev socket,id=libxl-cmd,path=/var/run/xen/qmp-libxl-2,server,nowait -no-shutdown -mon chardev=libxl-cmd,mode=control -chardev>
└─25246 /usr/sbin/libvirtd --timeout 120
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 9
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 10
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 11
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 12
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 13
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 14
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 15
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 17
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 18
Jan 29 10:22:30 openqaw5-xen libvirtd[25246]: 2021-01-29 09:22:30.630+0000: 25266: debug : virFileClose:110 : Closed fd 20
As of now, I am not sure what the root cause is; it is still under investigation. It seems to affect mostly xen jobs; hyperv (the RDP-VNC wrapper VM seems to work) and vmware should not be affected.
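To see at a glance which domains are stuck behind the state change lock, the libxlDomainObjBeginJob warnings in the journal can be parsed. This is only a minimal sketch: the helper name find_blocked_domains is hypothetical, the regular expression is derived solely from the log lines quoted above, and the number in "owned by (...)" is assumed to be a libvirtd-internal owner ID.

```python
import re

# Pattern derived from the journal lines quoted in this ticket, e.g.:
#   "Cannot start job (modify) for domain openQA-SUT-1; current job is (modify) owned by (2732)"
LOCK_RE = re.compile(
    r"Cannot start job \((?P<wanted>\w+)\) for domain (?P<domain>[^;\s]+); "
    r"current job is \((?P<current>\w+)\) owned by \((?P<owner>\d+)\)"
)


def find_blocked_domains(journal_text):
    """Return {domain: owner_id} for every lock-contention warning found.

    owner_id is the number printed in "owned by (...)"; treating it as a
    libvirtd-internal ID is an assumption of this sketch.
    """
    blocked = {}
    for line in journal_text.splitlines():
        match = LOCK_RE.search(line)
        if match:
            blocked[match.group("domain")] = int(match.group("owner"))
    return blocked


# Lines copied from the journal output above.
SAMPLE = """\
Jan 28 23:02:08 openqaw5-xen libvirtd[2715]: 2730: warning : libxlDomainObjBeginJob:146 : Cannot start job (modify) for domain openQA-SUT-1; current job is (modify) owned by (2732)
Jan 28 23:02:08 openqaw5-xen libvirtd[2715]: 2730: error : libxlDomainObjBeginJob:150 : Timed out during operation: cannot acquire state change lock
Jan 28 23:02:09 openqaw5-xen libvirtd[2715]: 2729: warning : libxlDomainObjBeginJob:146 : Cannot start job (modify) for domain openQA-SUT-3; current job is (modify) owned by (2733)
"""

if __name__ == "__main__":
    for domain, owner in find_blocked_domains(SAMPLE).items():
        print(f"{domain}: job held by {owner}")
```

In practice the input could come from something like `journalctl -u libvirtd`; the sample here only reuses the lines already pasted in this comment.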
Updated by mloviska almost 4 years ago
I had to install openvswitch and configure xen to use it instead of bridge-utils, because bridge-utils has been deprecated and moved to the legacy module.
openqaw5-xen:~ # xl list
Name ID Mem VCPUs State Time(s)
Domain-0 0 2268 32 r----- 872.1
Xenstore 1 31 1 -b---- 0.9
openQA_hyperv_intermediary 2 4088 2 -b---- 85.6
openQA-SUT-1 4 1016 1 -b---- 108.6
openQA-SUT-2 5 1016 1 -b---- 60.5
openqaw5-xen:~ # virsh list
Id Name State
--------------------------------------------
0 Domain-0 running
2 openQA_hyperv_intermediary running
4 openQA-SUT-1 running
5 openQA-SUT-2 running
openqaw5-xen:~ #
For now I have almost certainly started more xen-related services than necessary, but it is not clear to me which ones are actually required. To be clarified later.
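The two listings above can also be compared mechanically, for example to spot domains that xl reports but virsh does not. The following is a rough sketch that assumes the whitespace-separated output format shown in this comment and domain names without spaces; parse_xl_list and parse_virsh_list are hypothetical helper names.

```python
def parse_xl_list(text):
    """Return {name: domid} from `xl list` output (first line is the header)."""
    domains = {}
    for line in text.strip().splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 2 and fields[1].isdigit():
            domains[fields[0]] = int(fields[1])
    return domains


def parse_virsh_list(text):
    """Return {name: domid} from `virsh list` output."""
    domains = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Data rows start with a numeric ID; header and separator rows do not.
        if len(fields) >= 3 and fields[0].isdigit():
            domains[fields[1]] = int(fields[0])
    return domains


# Output copied from the comment above.
XL_OUTPUT = """\
Name ID Mem VCPUs State Time(s)
Domain-0 0 2268 32 r----- 872.1
Xenstore 1 31 1 -b---- 0.9
openQA_hyperv_intermediary 2 4088 2 -b---- 85.6
openQA-SUT-1 4 1016 1 -b---- 108.6
openQA-SUT-2 5 1016 1 -b---- 60.5
"""

VIRSH_OUTPUT = """\
Id Name State
--------------------------------------------
0 Domain-0 running
2 openQA_hyperv_intermediary running
4 openQA-SUT-1 running
5 openQA-SUT-2 running
"""

if __name__ == "__main__":
    xl_domains = parse_xl_list(XL_OUTPUT)
    virsh_domains = parse_virsh_list(VIRSH_OUTPUT)
    print("only in xl:", sorted(set(xl_domains) - set(virsh_domains)))
    print("only in virsh:", sorted(set(virsh_domains) - set(xl_domains)))
```

With the listings from this comment, the only difference is the Xenstore domain, which xl shows and virsh does not; all openQA domains appear in both with matching IDs.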
Updated by nanzhang almost 4 years ago
Thanks mloviska. I've re-run the job, and the issue is gone.
https://openqa.nue.suse.com/tests/5386318
Updated by mloviska almost 4 years ago
- Related to action #88373: [xen-post-upgrade][qac-infra][investigation] post configuration leftovers added
Updated by mloviska almost 4 years ago
https://openqa.suse.de/tests/5399205/file/serial0.txt
After triggering a kernel crash, it seems that the bridge settings in xen aren't restored.
Locally I can see the error message: error: Disconnected from xen:///system due to end of file
Checking the libxl logs and IP settings, I can see the following:
2021-02-03 14:36:44.208+0000: libxl: libxl_event.c:676:libxl__ev_xswatch_deregister: watch w=0x7f61e0014e20 wpath=/local/domain/0/backend/vif/623/0/state token=2/6: deregister slotnum=2
2021-02-03 14:36:44.208+0000: libxl: libxl_device.c:1086:device_backend_callback: Domain 623:calling device_backend_cleanup
2021-02-03 14:36:44.208+0000: libxl: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x7f61e0014e20: deregister unregistered
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1187:device_hotplug: Domain 623:calling hotplug script: /etc/xen/scripts/vif-bridge online
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1188:device_hotplug: Domain 623:extra args:
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1194:device_hotplug: Domain 623: type_if=vif
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1196:device_hotplug: Domain 623:env:
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: script: /etc/xen/scripts/vif-bridge
2021-02-03 14:36:44.210+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: XENBUS_TYPE: vif
2021-02-03 14:36:44.211+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: XENBUS_PATH: backend/vif/623/0
2021-02-03 14:36:44.211+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: XENBUS_BASE_PATH: backend
2021-02-03 14:36:44.211+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: netdev:
2021-02-03 14:36:44.211+0000: libxl: libxl_device.c:1203:device_hotplug: Domain 623: vif: vif623.0
2021-02-03 14:36:44.211+0000: libxl: libxl_internal.c:75:libxl__suse_domain_get_hotplug_timeout: Domain 623:Got from '' = 0 from /libxl/623/suse/nics-LIBXL_HOTPLUG_TIMEOUT for /local/domain/0/backend/vif/623/0: No such file or directory
2021-02-03 14:36:44.211+0000: libxl: libxl_aoutils.c:599:libxl__async_exec_start: forking to execute: /etc/xen/scripts/vif-bridge online for /local/domain/0/backend/vif/623/0
2021-02-03 14:36:44.491+0000: libxl: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x7f61e0014f30: deregister unregistered
2021-02-03 14:36:44.492+0000: libxl: libxl_device.c:1172:device_hotplug: Domain 623:No hotplug script to execute
2021-02-03 14:36:44.492+0000: libxl: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x7f61e0014f30: deregister unregistered
2021-02-03 14:36:44.492+0000: libxl: libxl_event.c:2228:libxl__ao_progress_report: ao 0x7f61e000e2c0: progress report: callback queued aop=0x7f61e005a1d0
2021-02-03 14:36:44.494+0000: libxl: libxl_event.c:1897:libxl__ao_complete: ao 0x7f61e000e2c0: complete, rc=0
2021-02-03 14:36:44.494+0000: libxl: libxl_event.c:1432:egc_run_callbacks: ao 0x7f61e000e2c0: progress report: callback aop=0x7f61e005a1d0
2021-02-03 14:36:44.494+0000: libxl: libxl_event.c:1866:libxl__ao__destroy: ao 0x7f61e000e2c0: destroy
2021-02-03 14:36:44.501+0000: libxl: libxl_event.c:689:libxl__ev_xswatch_deregister: watch w=0x7f61e000ebc8: deregister unregistered
2021-02-03 14:36:44.501+0000: xc: SUSEINFO: domid 623: xc_domain_unpause returned 0
2021-02-03 14:36:44.501+0000: libxl: libxl_event.c:1897:libxl__ao_complete: ao 0x7f61e000ef90: complete, rc=0
2021-02-03 14:36:44.501+0000: libxl: libxl_event.c:1866:libxl__ao__destroy: ao 0x7f61e000ef90: destroy
2021-02-03T15:38:33.050859+01:00 openqaw5-xen libvirtd[32011]: 2021-02-03 14:38:33.029+0000: 32061: debug : virFileClose:110 : Closed fd 45
2021-02-03T15:38:33.051368+01:00 openqaw5-xen libvirtd[32011]: 2021-02-03 14:38:33.029+0000: 32061: debug : virFileClose:110 : Closed fd 47
2021-02-03T15:39:23.752533+01:00 openqaw5-xen kernel: [438217.127392] vif vif-623-0 vif623.0: Guest Rx stalled
2021-02-03T15:39:23.752568+01:00 openqaw5-xen kernel: [438217.127757] br0: port 5(vif623.0) entered disabled state
624: vif623.0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br0 state DOWN group default qlen 32
link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
inet6 fe80::fcff:ffff:feff:ffff/64 scope link
valid_lft forever preferred_lft forever
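The interface state shown above (NO-CARRIER while still enslaved to br0) can be detected automatically. A minimal sketch, assuming input in the `ip addr` / `ip link` text format quoted above; the helper name vifs_without_carrier is hypothetical.

```python
import re

# Matches the first line of an interface entry, e.g.
#   "624: vif623.0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br0 state DOWN ..."
LINK_RE = re.compile(
    r"^\d+: (?P<name>[^:@\s]+)(?:@\S+)?: <(?P<flags>[^>]*)>(?P<rest>.*)$"
)


def vifs_without_carrier(ip_output):
    """Return [(vif_name, bridge_or_None)] for vif* links flagged NO-CARRIER."""
    found = []
    for line in ip_output.splitlines():
        match = LINK_RE.match(line.strip())
        if not match or not match.group("name").startswith("vif"):
            continue
        flags = match.group("flags").split(",")
        if "NO-CARRIER" not in flags:
            continue
        master = re.search(r"\bmaster (\S+)", match.group("rest"))
        found.append((match.group("name"), master.group(1) if master else None))
    return found


# Output copied from the comment above.
SAMPLE = """\
624: vif623.0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq master br0 state DOWN group default qlen 32
    link/ether fe:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fcff:ffff:feff:ffff/64 scope link
       valid_lft forever preferred_lft forever
"""

if __name__ == "__main__":
    for name, bridge in vifs_without_carrier(SAMPLE):
        print(f"{name} has no carrier (bridge: {bridge})")
```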
Updated by nanzhang almost 4 years ago
Job failed on build 150.1 (Snapshot10); it looks like the worker is not reachable again.
sle-15-SP3-Online-x86_64-Build150.1-default_install_svirt@svirt-hyperv2012r2-uefi (https://openqa.nue.suse.com/tests/5476478)
Error connecting to VNC server openqaw5-xen-1.qa.suse.de:5905: IO::Socket::INET: connect: Connection timed out
The host can't be reached either.
# ssh root@openqaw5-xen.qa.suse.de
ssh: connect to host openqaw5-xen.qa.suse.de port 22: Connection timed out
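Both checks above (VNC on port 5905 of the guest, SSH on port 22 of the host) come down to a plain TCP connect, so they can be scripted. This is just a sketch using the Python standard library; tcp_reachable is a hypothetical helper and the 5 second timeout is an arbitrary choice.

```python
import socket


def tcp_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and name resolution failures.
        return False


# Example usage with the endpoints mentioned in this ticket:
#   tcp_reachable("openqaw5-xen-1.qa.suse.de", 5905)  # VNC on the guest VM
#   tcp_reachable("openqaw5-xen.qa.suse.de", 22)      # SSH on the xen host
```

A False result only says that the port did not accept a connection in time; it does not distinguish a powered-off host from a network or firewall problem.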
Updated by xlai almost 4 years ago
mloviska wrote:
https://openqa.suse.de/tests/5399205/file/serial0.txt
After triggering a kernel crash, it seems that the bridge settings in xen aren't restored.
Locally I can see the error message: error: Disconnected from xen:///system due to end of file
@mloviska @nanzhang
This seems to have the same root cause as https://bugzilla.suse.com/show_bug.cgi?id=1181989 (openQA job causes libvirtd to dump core when running kdump inside domain), which is P1 now; a fix is in progress.
Updated by mloviska almost 4 years ago
To reboot a xen domain, we have to remove /etc/udev/rules.d/70-persistent-net.rules. Frankly, I am quite surprised that this file appears on the xen domU after the xen host upgrade. Could it be a side effect of replacing the Linux bridge with openvswitch?
The network configuration at the libvirt level also has to state that the domU uses openvswitch. I will push the code change tomorrow morning (it has to be done in bootloader_svirt):
<interface type='bridge'>
<mac address='00:16:3e:09:6f:df'/>
<source bridge='br0'/>
<virtualport type='openvswitch'>
<parameters interfaceid='bf0a6496-a421-41f1-926e-a593f96ce1bb'/>
</virtualport>
<target dev='vif19.0'/>
<model type='netfront'/>
</interface>
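For illustration, an interface definition like the one above can be generated programmatically. This sketch is not the bootloader_svirt change itself; it uses Python's xml.etree.ElementTree, and ovs_bridge_interface is a hypothetical helper. The `<target dev=.../>` element is left out on the assumption that the vif name is assigned when the domain starts, and the interfaceid parameter is only written when one is supplied.

```python
import xml.etree.ElementTree as ET


def ovs_bridge_interface(mac, bridge, interface_id=None):
    """Build a libvirt <interface> element attached to an Open vSwitch bridge."""
    iface = ET.Element("interface", {"type": "bridge"})
    ET.SubElement(iface, "mac", {"address": mac})
    ET.SubElement(iface, "source", {"bridge": bridge})
    # This element marks the bridge as an Open vSwitch bridge.
    vport = ET.SubElement(iface, "virtualport", {"type": "openvswitch"})
    if interface_id is not None:
        ET.SubElement(vport, "parameters", {"interfaceid": interface_id})
    ET.SubElement(iface, "model", {"type": "netfront"})
    return iface


if __name__ == "__main__":
    # Values copied from the XML fragment above.
    element = ovs_bridge_interface(
        mac="00:16:3e:09:6f:df",
        bridge="br0",
        interface_id="bf0a6496-a421-41f1-926e-a593f96ce1bb",
    )
    print(ET.tostring(element, encoding="unicode"))
```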
Updated by mloviska over 3 years ago
- Tags set to qac
- Status changed from In Progress to Resolved
Except for https://bugzilla.suse.com/show_bug.cgi?id=1181989 there should be no more leftovers. Feel free to reopen if anything shows up. Thanks for your patience!