action #99153

Updated by okurz about 3 years ago

## Observation 
 There are many incomplete jobs on OSD, please see: https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&from=1632278812298&to=1632451612298&viewPanel=17 

 ``` 
  7211384 | offline_sles15sp1_ltss_media_basesys-srv-desk-dev-contm-lgm-py2-wsm_all_full                 | aarch64           | 2021-09-24 00:07:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
  7211514 | install_ltp+sle+Server-DVD-Incidents-Kernel-KOTD                                             | s390x-kvm-sle12 | 2021-09-24 00:14:42 | incomplete | backend died: Lost SSH connection to SUT: Failure while draining incoming flow at /usr/lib/os-autoinst/consoles/ssh_screen.pm line 89.
  7211282 | online_sles15sp2_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full                    | aarch64           | 2021-09-24 00:30:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
  7211232 | online_sles15sp1_ltss_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full               | aarch64           | 2021-09-24 00:31:00 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
  7212147 | offline_sles12sp5_pscc_sdk-tcm-wsm_all_full:investigate:retry                                | aarch64           | 2021-09-24 00:32:15 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
  7212048 | offline_sles15sp2_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full                | aarch64           | 2021-09-24 00:48:34 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266.
  7207535 | qam-yast_self_update+15                                                                      | uefi              | 2021-09-24 01:12:50 | incomplete | cache failure: Cache service queue already full (10)
  7208023 | mru-install-multipath-remote_supportserver                                                   | 64bit             | 2021-09-24 01:12:51 | incomplete | cache failure: Cache service queue already full (10)
  7208045 | qam-textmode+sle15                                                                           | 64bit             | 2021-09-24 01:12:51 | incomplete | cache failure: Cache service queue already full (10)
  7207737 | create_hdd_minimal_base+sdk+python2                                                          | 64bit             | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
  7208073 | lvm_thin_provisioning                                                                        | 64bit             | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
  7208237 | sle-15-SP3_image_on_sle-12-SP5_host_docker                                                   | 64bit             | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
  7207741 | mru-install-desktop-with-addons                                                              | 64bit             | 2021-09-24 01:12:52 | incomplete | cache failure: Cache service queue already full (10)
  7208022 | mru-install-minimal-with-addons-multipath                                                    | 64bit             | 2021-09-24 01:12:54 | incomplete | cache failure: Cache service queue already full (10)
  7208289 | yast_no_self_update                                                                          | 64bit             | 2021-09-24 01:13:00 | incomplete | cache failure: Cache service queue already full (10)
  7208232 | sle-15-SP3_image_on_sle-15-SP3_host_docker                                                   | 64bit             | 2021-09-24 01:13:00 | incomplete | cache failure: Cache service queue already full (10)
  7207758 | qam-gnome                                                                                    | 64bit             | 2021-09-24 01:13:01 | incomplete | c 
 ... ... 
  7213920 | online_sles15sp1_ltss_pscc_base_all_minimal_zypp                                             | 64bit_cirrus      | 2021-09-24 01:41:16 | incomplete | cache failure: Cache service queue already full (10) 
  7208973 | qam_ha_qdevice_node2                                                                         | 64bit             | 2021-09-24 01:41:23 | incomplete | backend died: QEMU exited unexpectedly, see log for details 
  7209390 | qam_3nodes_node01                                                                            | 64bit             | 2021-09-24 01:43:42 | incomplete | backend died: QEMU exited unexpectedly, see log for details 
  7209465 | mau-webserver                                                                                | 64bit             | 2021-09-24 01:43:45 | incomplete | cache failure: Cache service queue already full (10) 
  7213653 | qam-gnome                                                                                    | s390x-kvm-sle12 | 2021-09-24 01:44:31 | incomplete | backend died: Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection timed out 
  7209000 | qam_ha_priority_fencing_node01                                                               | 64bit             | 2021-09-24 01:46:55 | incomplete | backend died: QEMU exited unexpectedly, see log for details 
  7209381 | qam_ha_priority_fencing_node02                                                               | 64bit             | 2021-09-24 01:48:18 | incomplete | cache failure: Cache service queue already full (10) 
  7211405 | offline_sles15sp1_ltss_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full              | aarch64           | 2021-09-24 01:49:28 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266. 
  7213816 | qam-gnome                                                                                    | s390x-kvm-sle12 | 2021-09-24 01:51:14 | incomplete | backend died: Error connecting to VNC server <10.161.145.80:5901>: IO::Socket::INET: connect: Connection timed out 
  7213652 | qam-minimal+base                                                                             | s390x-kvm-sle12 | 2021-09-24 01:58:28 | incomplete | backend died: Error connecting to VNC server <10.161.145.92:5901>: IO::Socket::INET: connect: Connection timed out 
  7212213 | offline_sles15sp2_pscc_lp-basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full                | aarch64           | 2021-09-24 02:02:46 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266. 
  7213165 | qam-minimal+base                                                                             | s390x-kvm-sle12 | 2021-09-24 02:07:04 | incomplete | backend died: Error connecting to VNC server <10.161.145.95:5901>: IO::Socket::INET: connect: Connection timed out 
  7213815 | qam-minimal+base                                                                             | s390x-kvm-sle12 | 2021-09-24 02:07:06 | incomplete | backend died: Error connecting to VNC server <10.161.145.96:5901>: IO::Socket::INET: connect: Connection timed out 
  7213167 | mru-install-minimal-with-addons                                                              | s390x-kvm-sle12 | 2021-09-24 02:13:49 | incomplete | backend died: Error connecting to VNC server <10.161.145.91:5901>: IO::Socket::INET: connect: Connection timed out 
  7212153 | online_sles15sp3_pscc_lp-basesys-srv-desk-dev-contm-lgm-tsm-wsm_all_full:investigate:retry | aarch64           | 2021-09-24 02:14:56 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266. 
  7213137 | qam-gnome                                                                                    | s390x-kvm-sle15 | 2021-09-24 02:48:58 | incomplete | backend died: Error connecting to VNC server <10.161.145.90:5901>: IO::Socket::INET: connect: Connection timed out 
  7212197 | online_sles15sp2_pscc_basesys-srv-desk-dev-contm-lgm-py2-tsm-wsm_all_full                    | aarch64           | 2021-09-24 02:54:19 | incomplete | backend died: Migrate to file failed, it has been running for more than 240 seconds at /usr/lib/os-autoinst/backend/qemu.pm line 266. 
  7213903 | ext4_staging_s390x                                                                           | s390x-kvm-sle12 | 2021-09-24 02:58:03 | incomplete | backend died: Error connecting to VNC server <10.161.145.96:5901>: IO::Socket::INET: connect: Connection timed out 
 ``` 
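To see which failure reasons dominate, the rows above can be grouped by reason. The following is only a minimal sketch: it assumes the rows are `|`-separated with six columns as shown, and the helper names (`parse_row`, `normalize_reason`, `count_reasons`) are hypothetical, not part of openQA.

```python
import re
from collections import Counter


def parse_row(line):
    """Split one '|'-separated row into its six fields, or return None."""
    fields = [field.strip() for field in line.split("|", 5)]
    if len(fields) != 6 or not fields[0].isdigit():
        return None  # separator lines such as "... ..." are skipped
    return fields


def normalize_reason(reason):
    """Drop host-specific details so that similar failures group together."""
    # Replace "<10.161.145.95:5901>" and similar with a placeholder.
    reason = re.sub(r"<[^>]*>", "<host>", reason)
    # Remove a trailing Perl location such as " at /path/file.pm line 266."
    return re.sub(r" at \S+ line \d+\.?$", "", reason)


def count_reasons(lines):
    """Count how often each normalized reason occurs in the given rows."""
    counts = Counter()
    for line in lines:
        row = parse_row(line)
        if row is not None:
            counts[normalize_reason(row[5])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "7209465 | mau-webserver | 64bit | 2021-09-24 01:43:45 | incomplete "
        "| cache failure: Cache service queue already full (10)",
        "7213653 | qam-gnome | s390x-kvm-sle12 | 2021-09-24 01:44:31 | incomplete "
        "| backend died: Error connecting to VNC server <10.161.145.95:5901>: "
        "IO::Socket::INET: connect: Connection timed out",
    ]
    for reason, count in count_reasons(sample).most_common():
        print(count, reason)
```

Normalizing before counting keeps the VNC timeouts against different workers' addresses in one bucket instead of one bucket per IP.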

 I checked some of the jobs that failed with `backend died: QEMU exited unexpectedly, see log for details`. Their autoinst-log.txt shows:
 ``` 
 [2021-09-24T03:41:22.180 CEST] [info] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines: 
   QEMU terminated before QMP connection could be established. Check for errors below 
 [2021-09-24T03:41:22.180 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json 
 [2021-09-24T03:41:22.181 CEST] [debug] Passing remaining frames to the video encoder 
 [2021-09-24T03:41:22.248 CEST] [debug] Waiting for video encoder to finalize the video 
 [2021-09-24T03:41:22.248 CEST] [debug] The built-in video encoder (pid 59450) terminated 
 [2021-09-24T03:41:22.250 CEST] [debug] QEMU: QEMU emulator version 4.2.1 (openSUSE Leap 15.2) 
 [2021-09-24T03:41:22.250 CEST] [debug] QEMU: Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers 
 [2021-09-24T03:41:22.250 CEST] [warn] !!! : qemu-system-x86_64: -blockdev driver=qcow2,node-name=hd0-overlay0,file=hd0-overlay0-file,cache.no-flush=on: Could not open backing file: Image is not in qcow2 format 
 ``` 
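The last log line says the backing file is not in qcow2 format. One quick way to check a suspect image on a worker is to look at its first four bytes, since qcow2 files start with the magic `QFI\xfb`. This is a sketch only: the function name is hypothetical, and a matching magic does not prove the rest of the image is intact (`qemu-img info` gives a fuller answer).

```python
QCOW2_MAGIC = b"QFI\xfb"  # first four bytes of every qcow2 image


def looks_like_qcow2(path):
    """Return True if the file at `path` starts with the qcow2 magic bytes."""
    with open(path, "rb") as image:
        return image.read(4) == QCOW2_MAGIC


if __name__ == "__main__":
    import sys

    # Usage: python check_qcow2.py IMAGE [IMAGE ...]
    for image_path in sys.argv[1:]:
        verdict = "qcow2 magic found" if looks_like_qcow2(image_path) else "NOT qcow2"
        print(f"{image_path}: {verdict}")
```

An empty or truncated file fails this check as well, so it cannot tell a wrong format apart from an incomplete download.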

 ## Suggestions 
 - Not related to #98901 
 - qemu says `Could not open backing file: Image is not in qcow2 format` 
 - Check what recent changes wrt qemu use could have caused this 
 - Verify whether adding support for qemu 6.0 broke qemu 4.2.1 
 - Consider the relation to #98727 
 - Add automatic restarting for known non-critical issues, assuming this issue is flaky 
 - Ensure that the relevant scenarios, i.e. "most" of them, are covered and that incomplete jobs (including past ones) are handled, e.g. retriggered and fixed 
 - Unpause alerts 
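
The automatic-restart suggestion could look roughly like the sketch below. Everything in it is an assumption for illustration: the pattern list, the function name and the attempt limit are hypothetical, and which reasons actually count as non-critical still has to be decided.

```python
import re

# Hypothetical list of reasons treated as flaky; not an agreed-upon policy.
RESTARTABLE_PATTERNS = [
    re.compile(r"cache failure: Cache service queue already full"),
    re.compile(r"backend died: Lost SSH connection to SUT"),
]


def should_restart(reason, attempts_so_far, max_attempts=2,
                   patterns=RESTARTABLE_PATTERNS):
    """Decide whether an incomplete job should be retriggered automatically.

    A job is restarted only if its reason matches a known pattern and it
    has not already been restarted `max_attempts` times.
    """
    if attempts_so_far >= max_attempts:
        return False
    return any(pattern.search(reason) for pattern in patterns)


if __name__ == "__main__":
    reason = "cache failure: Cache service queue already full (10)"
    print(should_restart(reason, attempts_so_far=0))
```

The attempt limit is there so that a persistent problem, such as the qcow2 error above, does not cause an endless restart loop.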

 ## Rollback steps 
 * Unpause alert and verify that it passes