
action #56993

closed

left-over qemu processes cause incomplete jobs with "Failed to find an available port: Address already in use", previous jobs failed but did not tear down qemu

Added by okurz over 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2019-09-17
Due date:
% Done:
0%

Estimated time:

Description

Observation

https://openqa.suse.de/tests/3370943/file/autoinst-log.txt on openqaworker-arm-2:10 shows

[2019-09-17T17:29:22.251 UTC] [debug] QEMU: Copyright (c) 2003-2018 Fabrice Bellard and the QEMU Project developers
[2019-09-17T17:29:22.251 UTC] [debug] QEMU: qemu-system-aarch64: -vnc :100,share=force-shared: Failed to find an available port: Address already in use
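The failing option is `-vnc :100`. By the usual VNC (RFB) convention, display `:N` listens on TCP port 5900 + N, so display 100 needs port 6000. Here, worker instance 10 is using display 100. The following Python sketch computes that port and tests whether it can still be bound, which is one way to confirm on the worker that a stale process holds it. The function names are hypothetical and are not part of openQA or os-autoinst.

```python
import errno
import socket

VNC_BASE_PORT = 5900  # RFB convention: display :N listens on TCP 5900 + N


def vnc_port(display: int) -> int:
    """Return the TCP port a VNC server uses for display :<display>."""
    return VNC_BASE_PORT + display


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Try to bind the port; return False if another process (e.g. a
    left-over qemu) already holds it. Other bind errors are re-raised."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True
```

On the affected worker, `port_is_free(vnc_port(100))` returning `False` while no job is running on instance 10 would point to the stale qemu.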

Checking on the worker, I found a qemu process still running for that worker instance:

_openqa+ 36620  0.0  0.0 122100 114932 ?       Ss   Sep12   4:14 /usr/bin/perl /usr/share/openqa/script/worker --instance 10
_openqa+ 46505  1.5  1.0 2045052 1332612 ?     SLl  12:45   7:00  \_ /usr/bin/qemu-system-aarch64 -device virtio-gpu-pci -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial chardev:serial0 -sound

In the worker journal I found https://openqa.suse.de/tests/3368418/, which appears to be the job that ended up incomplete and left this qemu instance behind.

This has been observed since around 2019-08 (possibly earlier) by at least okurz and riafarov. It was mitigated by manually terminating the left-over qemu instance. So far, okurz has only seen this on openqaworker-arm-2.

Problem

The openQA worker looks for a qemu pid file in the pool folder, but the file is missing even though qemu never stopped. Two things need investigating: why the pid file is missing, and why the qemu process is not stopped when the job ends.
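Here is a minimal Python sketch of that check. It assumes the instance's pool folder (for example `/var/lib/openqa/pool/10`) and a pid file named `qemu.pid`. Both the file name and the helper functions are hypothetical and are not openQA's actual code. If the pid file is missing, the sketch falls back to scanning `/proc` (Linux only) for `qemu-system*` processes whose working directory is the pool folder. The relative `logfile=serial0` in the qemu command line above suggests that qemu runs with the pool folder as its working directory, but that is also an assumption.

```python
import os
from pathlib import Path
from typing import List, Optional

PID_FILE_NAME = "qemu.pid"  # hypothetical: name of the pid file in the pool folder


def pid_alive(pid: int) -> bool:
    """Signal 0 checks that the process exists without sending a signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def pid_from_file(pool_dir: Path) -> Optional[int]:
    """Read the pid file; None if it is missing or unparsable."""
    try:
        return int((pool_dir / PID_FILE_NAME).read_text().strip())
    except (FileNotFoundError, ValueError):
        return None


def pids_with_cwd(pool_dir: Path, name_prefix: str = "qemu-system") -> List[int]:
    """Fallback: find processes whose cwd is pool_dir (Linux /proc only).
    Note: /proc/<pid>/comm is truncated to 15 characters."""
    target = pool_dir.resolve()
    found = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            comm = (entry / "comm").read_text().strip()
            cwd = Path(os.readlink(entry / "cwd"))
        except OSError:
            continue  # process vanished or belongs to another user
        if comm.startswith(name_prefix) and cwd == target:
            found.append(int(entry.name))
    return found
```

For example, if `pid_from_file(Path("/var/lib/openqa/pool/10"))` returns `None` while `pids_with_cwd(Path("/var/lib/openqa/pool/10"))` returns a live pid, that would reproduce the situation reported here.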

Workaround

SSH into the worker and terminate the left-over qemu processes manually.


Related issues: 1 (0 open, 1 closed)

Has duplicate: openQA Project (public) - action #57188: Can't syswrite(IO::Socket::UNIX=GLOB(0xaaaae3b7e870), <BUFFER>): Broken pipe at /usr/lib/os-autoinst/backend/qemu.pm line 964 (Rejected, okurz, 2019-09-23)
