action #138464
closed [qe-tools] openqa-worker stopped working and keeps waiting: A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 114.58 s
Description
I have an incomplete test run: http://10.168.192.143/tests/197
It seems that openqa-worker is stuck forever. This means that no later job can be executed, even after 10 hours have passed. The worker keeps rechecking for the web UI.
● openqa-worker-plain@1.service - openQA Worker #1
Loaded: loaded (/usr/lib/systemd/system/openqa-worker-plain@.service; enabled; preset: disabled)
Active: active (running) since Tue 2023-10-24 20:46:54 CEST; 10h ago
Process: 3091 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
Main PID: 3098 (worker)
Tasks: 10 (limit: 4915)
CPU: 27min 54.521s
CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-plain@1.service
├─3098 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
└─3911 /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial
Okt 25 07:12:45 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 114.58 s
Okt 25 07:14:40 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 142.03 s
Okt 25 07:17:02 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 206.08 s
Okt 25 07:20:28 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 228.03 s
Okt 25 07:24:17 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 285.92 s
Okt 25 07:29:03 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 152.51 s
Okt 25 07:31:35 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 124.26 s
Okt 25 07:33:40 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 200.26 s
Okt 25 07:37:00 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 296.79 s
Okt 25 07:41:57 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 217.92 s
This issue has impact on https://progress.opensuse.org/issues/138200
Updated by okurz about 1 year ago
- Category set to Support
- Status changed from New to In Progress
- Assignee set to okurz
- Target version set to Ready
This issue has impact on https://progress.opensuse.org/issues/138200
Please link other tickets on progress.opensuse.org with #<id>, e.g. #138200
If there is another qemu process blocking the worker, then that process with PID 3911 was started either outside the control of the openQA worker or by another, conflicting openQA worker instance. Please provide the output of ps auxf | grep -C 3 3911, assuming that the conflicting qemu process with PID 3911 is still running. Also please provide the output of rpm -q openQA-worker. Alternatively, give me access to the system and I will check myself.
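As a complement to the ps output requested above, the origin of a leftover process can also be read from /proc. The following is only a minimal sketch, assuming a Linux host: the helper names ppid_of and cgroup_of are hypothetical and not part of openQA.

```shell
# Hypothetical helpers (not part of openQA); Linux-only, they read /proc.

# Print the parent PID of the process given as $1.
ppid_of() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}

# Print the control group(s) of the process given as $1. On a systemd
# host the path names the unit the process belongs to.
cgroup_of() {
    cat "/proc/$1/cgroup"
}
```

With the PIDs from this ticket one would call ppid_of 3911 and cgroup_of 3911. A cgroup path ending in openqa-worker-plain@1.service would suggest that the qemu process belongs to that worker unit, which would match the CGroup listing in the systemctl output above. A parent PID of 1 only indicates that the original parent process has already exited.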
Updated by zluo about 1 year ago
- Status changed from In Progress to Resolved
Thanks @okurz.
My server froze somehow and I had to reboot it. I cannot reproduce this issue at the moment. Anyway, I ran zypper update and openqa-worker currently seems to work stably.