Actions
action #166025
closedqemu process hanging indefinitely blocking the corresponding VNC port size:S
Start date:
2024-08-30
Due date:
% Done:
0%
Estimated time:
Tags:
Description
Observation¶
A user cancelled job https://openqa.suse.de/tests/15291067 (probably) led to a qemu-system-x86_64 process hanging indefinitely blocking the corresponding VNC port. The worker for that slot is happily accepting further jobs leading to incompletes with qemu-system-x86_64: -vnc :111,share=force-shared: Failed to find an available port: Address already in use
The autoinst.log of the cancelled job shows the following which might indicate the problem.
[2024-08-29T09:15:20.806766+02:00] [info] [pid:123148] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.
[2024-08-29T09:15:20.809153+02:00] [debug] [pid:123148] QEMU: QEMU emulator version 7.1.0 (SUSE Linux Enterprise 15)
[2024-08-29T09:15:20.809288+02:00] [debug] [pid:123148] QEMU: Copyright (c) 2003-2022 Fabrice Bellard and the QEMU Project developers
[2024-08-29T09:15:20.810632+02:00] [debug] [pid:123148] sending magic and exit
[2024-08-29T09:15:20.811010+02:00] [warn] [pid:123148] !!! backend::baseclass::run_capture_loop: capture loop failed myjsonrpc: remote end terminated connection, stopping at /usr/lib/os-autoinst/myjsonrpc.pm line 43.
[2024-08-29T09:15:20.811133+02:00] [debug] [pid:123148] sending magic and exit
[2024-08-29T09:15:20.811277+02:00] [info] [pid:123148] ::: backend::baseclass::die_handler: Backend process died, backend errors are reported below in the following lines:
[2024-08-29T09:15:20.811380+02:00] [debug] [pid:123148] sending magic and exit
Use of uninitialized value $_[2] in substr at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/IO/Handle.pm line 475.
IO::Handle::write(IO::Pipe::End=GLOB(0x5561a3580470), undef, undef) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 338
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70), CODE(0x5561a344a6f8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
backend::driver::start(backend::driver=HASH(0x5561a4af0658)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
backend::driver::new("backend::driver", "qemu") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 109
OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 251
OpenQA::Isotovideo::Runner::init(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/bin/isotovideo line 182
eval {...} called at /usr/bin/isotovideo line 177
Use of uninitialized value $_[1] in substr at /usr/lib/perl5/5.26.1/x86_64-linux-thread-multi/IO/Handle.pm line 475.
IO::Handle::write(IO::Pipe::End=GLOB(0x5561a3580470), undef, undef) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 338
Mojo::IOLoop::ReadWriteProcess::_fork(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70), CODE(0x5561a344a6f8)) called at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 492
Mojo::IOLoop::ReadWriteProcess::start(Mojo::IOLoop::ReadWriteProcess=HASH(0x5561a3578f70)) called at /usr/lib/os-autoinst/backend/driver.pm line 72
backend::driver::start(backend::driver=HASH(0x5561a4af0658)) called at /usr/lib/os-autoinst/backend/driver.pm line 37
backend::driver::new("backend::driver", "qemu") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Backend.pm line 14
OpenQA::Isotovideo::Backend::new("OpenQA::Isotovideo::Backend") called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 109
OpenQA::Isotovideo::Runner::create_backend(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/lib/os-autoinst/OpenQA/Isotovideo/Runner.pm line 251
OpenQA::Isotovideo::Runner::init(OpenQA::Isotovideo::Runner=HASH(0x55619c4c7bf8)) called at /usr/bin/isotovideo line 182
eval {...} called at /usr/bin/isotovideo line 177
The issue was identified (https://suse.slack.com/archives/C02CANHLANP/p1724984987182769) and after killing the offending process
rfan@sapworker3:~> ps -ef|grep vnc |grep 111
_openqa+ 123419 96634 99 Aug29 ? 19:06:08 /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -audiodev none,id=snd0 -device intel-hda -device hda-output,audiodev=snd0 -global isa-fdc.fdtypeA=none -m 2048 -cpu qemu64 -netdev user,id=qanet0 -device virtio-net,netdev=qanet0,mac=52:54:00:12:34:56 -object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0 -boot once=d -device qemu-xhci -device usb-tablet -smp 1 -enable-kvm -no-shutdown -vnc :111,share=force-shared -device virtio-serial -chardev pipe,id=virtio_console,path=virtio_console,logfile=virtio_console.log,logappend=on -device virtconsole,chardev=virtio_console,name=org.openqa.console.virtio_console -chardev pipe,id=virtio_console_user,path=virtio_console_user,logfile=virtio_console_user.log,logappend=on -device virtconsole,chardev=virtio_console_user,name=org.openqa.console.virtio_console_user -chardev socket,path=qmp_socket,server=on,wait=off,id=qmp_socket,logfile=qmp_socket.log,logappend=on -qmp chardev:qmp_socket -S -device virtio-scsi-pci,id=scsi0 -device qemu-xhci,id=xhci0 -blockdev driver=file,node-name=hd0-file,filename=/var/lib/openqa/pool/21/raid/hd0,cache.no-flush=on -blockdev driver=qcow2,node-name=hd0,file=hd0-file,cache.no-flush=on,discard=unmap -device virtio-blk,id=hd0-device,drive=hd0,serial=hd0 -blockdev driver=file,node-name=usbstick-overlay0-file,filename=/var/lib/openqa/pool/21/raid/usbstick-overlay0,cache.no-flush=on -blockdev driver=qcow2,node-name=usbstick-overlay0,file=usbstick-overlay0-file,cache.no-flush=on,discard=unmap -device usb-storage,id=usbstick-device,drive=usbstick-overlay0,bootindex=0
the worker looks ok again.
Acceptance criteria¶
- AC1: The pool directory cleanup does not remove the QEMU pid file if QEMU is still running (so our detection for QEMU still running actually works).
Suggestions¶
- Apparently the qemu pid was deleted along with the pool directory
- Avoid deleting the pid file unless the process is stopped to our existing check for running QEMU processes actually works (and the worker appears broken as long as QEMU is still running)
- Maybe qemu got stuck because the pool was deleted while it was running?
Actions