action #138464

closed

[qe-tools] openqa-worker stopped working or waits forever: A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 114.58 s

Added by zluo about 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Support
Target version:
Start date: 2023-10-25
Due date:
% Done: 0%
Estimated time:

Description

I have an incomplete test run: http://10.168.192.143/tests/197

It seems that openqa-worker is stuck forever. As a result, no further jobs can be executed, even after 10 hours have passed. The worker keeps checking with the web UI.


● openqa-worker-plain@1.service - openQA Worker #1
     Loaded: loaded (/usr/lib/systemd/system/openqa-worker-plain@.service; enabled; preset: disabled)
     Active: active (running) since Tue 2023-10-24 20:46:54 CEST; 10h ago
    Process: 3091 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/1 (code=exited, status=0/SUCCESS)
   Main PID: 3098 (worker)
      Tasks: 10 (limit: 4915)
        CPU: 27min 54.521s
     CGroup: /openqa.slice/openqa-worker.slice/openqa-worker-plain@1.service
             ├─3098 /usr/bin/perl /usr/share/openqa/script/worker --instance 1
             └─3911 /usr/bin/qemu-system-x86_64 -device VGA,edid=on,xres=1024,yres=768 -only-migratable -chardev ringbuf,id=serial0,logfile=serial0,logappend=on -serial 
Okt 25 07:12:45 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 114.58 s
Okt 25 07:14:40 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 142.03 s
Okt 25 07:17:02 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 206.08 s
Okt 25 07:20:28 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 228.03 s
Okt 25 07:24:17 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 285.92 s
Okt 25 07:29:03 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 152.51 s
Okt 25 07:31:35 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 124.26 s
Okt 25 07:33:40 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 200.26 s
Okt 25 07:37:00 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 296.79 s
Okt 25 07:41:57 quake1 worker[3098]: [warn] A QEMU instance using the current pool directory is still running (PID: 3911) - checking again for web UI 'localhost' in 217.92 s
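To confirm that every warning in an excerpt like this refers to the same blocking process, the journal lines can be scanned mechanically. The Python sketch below assumes the exact warning wording shown above; the regular expression and the parse_retries and summarize helpers are hypothetical illustrations, not part of openQA.

```python
import re
from typing import Dict, List, Tuple

# Assumed wording of the worker warning, copied from the journal excerpt;
# captures the blocking PID, the web UI host and the retry delay in seconds.
WARN_RE = re.compile(
    r"A QEMU instance using the current pool directory is still running "
    r"\(PID: (?P<pid>\d+)\) - checking again for web UI '(?P<host>[^']+)' "
    r"in (?P<delay>\d+(?:\.\d+)?) s"
)


def parse_retries(lines: List[str]) -> List[Tuple[int, str, float]]:
    """Return (pid, host, delay_seconds) for every matching warning line."""
    retries = []
    for line in lines:
        match = WARN_RE.search(line)
        if match:
            retries.append(
                (int(match["pid"]), match["host"], float(match["delay"]))
            )
    return retries


def summarize(lines: List[str]) -> Dict[str, object]:
    """Count the warnings and report which PIDs and hosts they mention."""
    retries = parse_retries(lines)
    delays = [delay for _, _, delay in retries]
    return {
        "warnings": len(retries),
        "blocking_pids": sorted({pid for pid, _, _ in retries}),
        "hosts": sorted({host for _, host, _ in retries}),
        "min_delay": min(delays) if delays else None,
        "max_delay": max(delays) if delays else None,
    }
```

The input lines could come from journalctl -u openqa-worker-plain@1 on the worker host. A single entry in blocking_pids means the worker has been waiting on the same QEMU process the whole time, rather than on a series of different ones.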

This issue has impact on https://progress.opensuse.org/issues/138200

Actions #1

Updated by okurz about 1 year ago

  • Category set to Support
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready

> This issue has impact on https://progress.opensuse.org/issues/138200

Please link other tickets on progress.opensuse.org with #<id>, e.g. #138200.

If there is another qemu process blocking the worker, then the process with PID 3911 was started either outside the control of the openQA worker or by another, conflicting openQA worker instance. Please provide the output of ps auxf | grep -C 3 3911, assuming that the conflicting qemu process with PID 3911 is still running. Also please provide the output of rpm -q openQA-worker. Alternatively, give me access to the system and I will check myself.
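The ps auxf check asks, in effect, which process started PID 3911. A minimal Python sketch of the same question, assuming a Linux host with /proc mounted: it reports whether a PID still exists and walks its parent chain. The process_info and ancestry helpers and the proc_root parameter are hypothetical, not part of openQA.

```python
import os
from typing import Dict, List, Optional, Tuple


def process_info(pid: int, proc_root: str = "/proc") -> Optional[Dict[str, object]]:
    """Return name, parent PID and argv of `pid`, or None if it no longer exists."""
    base = os.path.join(proc_root, str(pid))
    try:
        with open(os.path.join(base, "status")) as f:
            # Lines look like "Name:\tqemu-system-x86" or "PPid:\t3098".
            status = dict(
                line.rstrip("\n").split(":\t", 1) for line in f if ":\t" in line
            )
        with open(os.path.join(base, "cmdline"), "rb") as f:
            # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
            raw = f.read().split(b"\0")
            argv = [arg.decode(errors="replace") for arg in raw if arg]
    except (FileNotFoundError, ProcessLookupError):
        return None
    return {
        "name": status.get("Name"),
        "ppid": int(status.get("PPid", "0")),
        "argv": argv,
    }


def ancestry(pid: int, proc_root: str = "/proc") -> List[Tuple[int, str]]:
    """Walk from `pid` up through its parents; empty if `pid` is gone."""
    chain: List[Tuple[int, str]] = []
    seen = set()  # guards against cycles in malformed input
    while pid > 0 and pid not in seen:
        seen.add(pid)
        info = process_info(pid, proc_root)
        if info is None:
            break
        chain.append((pid, info["name"]))
        pid = info["ppid"]
    return chain
```

Run on the affected host, ancestry(3911) would show whether the QEMU process descends from the worker (PID 3098 in the status output above) or from some other parent; an empty list means PID 3911 has already exited.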

Actions #2

Updated by zluo about 1 year ago

  • Status changed from In Progress to Resolved

Thanks @okurz.

My server froze somehow and I had to reboot it. I could not reproduce this issue at the moment. Anyway, I ran zypper update and openqa-worker now seems to work stably.
