action #99153
Updated by livdywan about 3 years ago
## Observation

There are many incomplete jobs on OSD, please see: https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1632278812298&to=1632451612298&viewPanel=17

```
7211384 | offline_sles15sp1_ltss_media_basesys-srv-desk-dev-contm-lgm-py2-wsm_all_full | aarch64 | 2021-09-24 00:07:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7211514 | install_ltp+sle+Server-DVD-Incidents-Kernel-KOTD | s390x-kvm-sle12 | 2021-09-24 00:14:42 | incomplete | backend died: Lost SSH connection to SUT: Failure while draining incoming flow at /usr/lib/os-autoinst/consoles/ssh_screen.pm line 89.
7211282 | online_sles15sp2_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 00:30:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7211232 | online_sles15sp1_ltss_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 00:31:00 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7212147 | offline_sles12sp5_pscc_sdk-tcm-wsm_all_full:investigate:retry | aarch64 | 2021-09-24 00:32:15 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7212048 | offline_sles15sp2_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 00:48:34 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7207535 | qam-yast_self_update+15 | uefi | 2021-09-24 01:12:50 | incomplete | cache failure: Cache service queue already full (10)
7208023 | mru-install-multipath-remote_supportserver | 64bit | 2021-09-24 01:12:51 | incomplete | cache failure: Cache service queue already full (10)
7208045 | qam-textmode+sle15 | 64bit | 2021-09-24 01:12:51 | incomplete | cache failure: Cache service queue already full (10)
7207737 | create_hdd_minimal_base+sdk+python2 | 64bit | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
7208073 | lvm_thin_provisioning | 64bit | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
7208237 | sle-15-SP3_image_on_sle-12-SP5_host_docker | 64bit | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
7207741 | mru-install-desktop-with-addons | 64bit | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
7208022 | mru-install-minimal-with-addons-multipath | 64bit | 2021-09-24 01:12:54 | incomplete | cache failure: Cache service queue already full (10)
7208289 | yast_no_self_update | 64bit | 2021-09-24 01:13:00 | incomplete | cache failure: Cache service queue already full (10)
7208232 | sle-15-SP3_image_on_sle-15-SP3_host_docker | 64bit | 2021-09-24 01:13:00 | incomplete | cache failure: Cache service queue already full (10)
7207758 | qam-gnome | 64bit | 2021-09-24 01:13:01 | incomplete | c ...
...
7213920 | online_sles15sp1_ltss_pscc_base_all_minimal_zypp | 64bit_cirrus | 2021-09-24 01:41:16 | incomplete | cache failure: Cache service queue already full (10)
7208973 | qam_ha_qdevice_node2 | 64bit | 2021-09-24 01:41:23 | incomplete | backend died: QEMU exited unexpectedly, see log for details
7209390 | qam_3nodes_node01 | 64bit | 2021-09-24 01:43:42 | incomplete | backend died: QEMU exited unexpectedly, see log for details
7209465 | mau-webserver | 64bit | 2021-09-24 01:43:45 | incomplete | cache failure: Cache service queue already full (10)
7213653 | qam-gnome | s390x-kvm-sle12 | 2021-09-24 01:44:31 | incomplete | backend died: Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection timed out
7209000 | qam_ha_priority_fencing_node01 | 64bit | 2021-09-24 01:46:55 | incomplete | backend died: QEMU exited unexpectedly, see log for details
7209381 | qam_ha_priority_fencing_node02 | 64bit | 2021-09-24 01:48:18 | incomplete | cache failure: Cache service queue already full (10)
7211405 | offline_sles15sp1_ltss_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 01:49:28 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7213816 | qam-gnome | s390x-kvm-sle12 | 2021-09-24 01:51:14 | incomplete | backend died: Error connecting to VNC server <10.161.145.80:5901>: IO::Socket::INET: connect: Connection timed out
7213652 | qam-minimal+base | s390x-kvm-sle12 | 2021-09-24 01:58:28 | incomplete | backend died: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection timed out
7212213 | offline_sles15sp2_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 02:02:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7213165 | qam-minimal+base | s390x-kvm-sle12 | 2021-09-24 02:07:04 | incomplete | backend died: Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection timed out
7213815 | qam-minimal+base | s390x-kvm-sle12 | 2021-09-24 02:07:06 | incomplete | backend died: Error connecting to VNC server <10.161.145.96:5901>: IO::Socket::INET: connect: Connection timed out
7213167 | mru-install-minimal-with-addons | s390x-kvm-sle12 | 2021-09-24 02:13:49 | incomplete | backend died: Error connecting to VNC server <10.161.145.91:5901>: IO::Socket::INET: connect: Connection timed out
7212153 | online_sles15sp3_pscc_lp-basesys-srv-desk-dev-contm-lgm-tsm-wsm_all_full:investigate:retry | aarch64 | 2021-09-24 02:14:56 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7213137 | qam-gnome | s390x-kvm-sle15 | 2021-09-24 02:48:58 | incomplete | backend died: Error connecting to VNC server <10.161.145.90:5901>: IO::Socket::INET: connect: Connection timed out
7212197 | online_sles15sp2_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full | aarch64 | 2021-09-24 02:54:19 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
7213903 | ext4_staging_s390x | s390x-kvm-sle12 | 2021-09-24 02:58:03 | incomplete | backend died: Error connecting to VNC server <10.161.145.96:5901>: IO::Socket::INET: connect: Connection timed out
```

Checking some of the jobs that failed with `backend died: QEMU exited unexpectedly, see log for details`, their autoinst-log.txt shows:

```
[2021-09-24T03:41:22.180 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
QEMU terminated before QMP connection could be established.
Check for errors below
[2021-09-24T03:41:22.180 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2021-09-24T03:41:22.181 CEST] [debug] Passing remaining frames to the video encoder
[2021-09-24T03:41:22.248 CEST] [debug] Waiting for video encoder to finalize the video
[2021-09-24T03:41:22.248 CEST] [debug] The built-in video encoder (pid 59450) terminated
[2021-09-24T03:41:22.250 CEST] [debug] QEMU: QEMU emulator version 4.2.1 (openSUSE Leap 15.2)
[2021-09-24T03:41:22.250 CEST] [debug] QEMU: Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers
[2021-09-24T03:41:22.250 CEST] [warn] !!! : qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on: Could not open backing file: Image is not in qcow2 format
```

## Suggestions

- Not related to #98901
- qemu says `Could not open backing file: Image is not in qcow2 format`
- Check what recent changes wrt qemu use could have caused this
- Verify if we broke qemu 4.2.1 by supporting 6.0
- Consider the relation to #98727
- Add automatic restarting for known non-critical issues, assuming this issue is flaky
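
To follow up on the `Image is not in qcow2 format` message, one quick check is whether the backing file on the worker really starts with the qcow header. The sketch below is only an illustration and not part of openQA or os-autoinst: the helper names `qcow_version` and `looks_like_qcow2` are made up here. It relies on the qcow file format starting with the four magic bytes `QFI\xfb` followed by a big-endian 32-bit version number (2 or 3 for qcow2).

```python
import struct

# Magic bytes at the start of every qcow/qcow2 image: 'Q', 'F', 'I', 0xfb
QCOW_MAGIC = b"QFI\xfb"


def qcow_version(path):
    """Return the version field from the qcow header, or None if the
    file is too short or does not start with the qcow magic bytes."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != QCOW_MAGIC:
        return None
    # Bytes 4..7 hold the format version as a big-endian unsigned int
    (version,) = struct.unpack(">I", header[4:8])
    return version


def looks_like_qcow2(path):
    """True if the header carries the qcow magic and a version of 2 or higher."""
    version = qcow_version(path)
    return version is not None and version >= 2
```

Running `looks_like_qcow2()` against the backing file named in a failed job would show whether the file itself is damaged or truncated (for example after an interrupted download) or whether qemu rejects an otherwise valid header. `qemu-img info <file>` reports the detected format as well and is the more thorough check where qemu tools are available.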