Project

General

Profile

Actions

action #166025

closed

qemu process hanging indefinitely blocking the corresponding VNC port size:S

Added by jbaier_cz 3 months ago. Updated 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
Start date:
2024-08-30
Due date:
% Done:

0%

Estimated time:

Description

Observation

A user cancelled job https://openqa.suse.de/tests/15291067 (probably) led to a qemu-system-x86_64 process hanging indefinitely blocking the corresponding VNC port. The worker for that slot is happily accepting further jobs leading to incompletes with qemu-system-x86_64: -vnc :111,share=force-shared: Failed to find an available port: Address already in use

The autoinst.log of the cancelled job shows the following which might indicate the problem.

[2024-08-29T09:15:20.806766+02:00] [info] [pid:123148] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
  myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.
[2024-08-29T09:15:20.809153+02:00] [debug] [pid:123148] QEMU: QEMU emulator version 7.1.0 (SUSE Linux Enterprise 15)
[2024-08-29T09:15:20.809288+02:00] [debug] [pid:123148] QEMU: Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
[2024-08-29T09:15:20.810632+02:00] [debug] [pid:123148] sending magic and exit
[2024-08-29T09:15:20.811010+02:00] [warn] [pid:123148] !!! backend::baseclass::run_capture_loop: capture loop failed myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.

[2024-08-29T09:15:20.811133+02:00] [debug] [pid:123148] sending magic and exit
[2024-08-29T09:15:20.811277+02:00] [info] [pid:123148] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
[2024-08-29T09:15:20.811380+02:00] [debug] [pid:123148] sending magic and exit
Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/IO/Handle.pm line 475.
        IO::Handle::write(IO::Pipe::End=GLOB(0x5561a3580470), undef, undef) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 338
        Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70), CODE(0x5561a344a6f8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
        Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
        backend::driver::start(backend::driver=HASH(0x5561a4af0658)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
        backend::driver::new("backend::driver", "qemu") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
        OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 109
        OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 251
        OpenQA::Isotovideo::Runner::init(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/bin/isotovideo line 182
        eval {...} called at /usr/bin/isotovideo line 177
Use of uninitialized value $_[1] in substr at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/IO/Handle.pm line 475.
        IO::Handle::write(IO::Pipe::End=GLOB(0x5561a3580470), undef, undef) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 338
        Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70), CODE(0x5561a344a6f8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
        Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
        backend::driver::start(backend::driver=HASH(0x5561a4af0658)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
        backend::driver::new("backend::driver", "qemu") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
        OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 109
        OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 251
        OpenQA::Isotovideo::Runner::init(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/bin/isotovideo line 182
        eval {...} called at /usr/bin/isotovideo line 177

The issue was identified (https://suse.slack.com/archives/C02CANHLANP/p1724984987182769) and after killing the offending process

rfan@sapworker3:~> ps -ef|grep vnc |grep 111
_openqa+ 123419  96634 99 Aug29 ?        19:06:08 /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -cpu qemu64 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot once=d -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :111,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console_user,path=virtio_console_user,logfile=virtio_console_user.log,logappend=on -device virtconsole,chardev=virtio_console_user,name=org.openqa.console.virtio_console_user -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -device qemu-xhci,id=xhci0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/21/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=usbstick-overlay0-file,filename=/var/lib/openqa/pool/21/raid/usbstick-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=usbstick-overlay0,file=usbstick-overlay0-file,cache.no-flush=on,discard=unmap -device usb-storage,id=usbstick-device,drive=usbstick-overlay0,bootindex=0

the worker looks ok again.

Acceptance criteria

  • AC1: The pool directory cleanup does not remove the QEMU pid file if QEMU is still running (so our detection for QEMU still running actually works).

Suggestions

  • Apparently the qemu pid was deleted along with the pool directory
    • Avoid deleting the pid file unless the process is stopped to our existing check for running QEMU processes actually works (and the worker appears broken as long as QEMU is still running)
  • Maybe qemu got stuck because the pool was deleted while it was running?
Actions

Also available in: Atom PDF