action #47087 (closed)
[scheduling] Workers on openqaworker2 stuck frequently
Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2019-02-04
Due date:
% Done: 0%
Estimated time:
Description
Workers of the following classes on the openqaworker2 host get stuck every couple of days: virt-mm-64bit-ipmi, svirt-hyperv, and svirt-hyperv2012r2. I had to restart them manually so that they get untangled and accept jobs again.
On the surface, in the openQA dashboard, the affected worker shows a job from SLES15 SP1 build 157.1 as "running" for 2-3 days. Canceling the job did not help, and no new job was acquired from the pool. Restarting the worker service fixed it for the time being, but the worker got stuck again after 3 days.
This is one such worker:
mnowak@openqaworker2:~> sudo systemctl status openqa-worker@19
● openqa-worker@19.service - openQA Worker #19
Loaded: loaded (/usr/lib/systemd/system/openqa-worker@.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2019-02-01 13:54:32 CET; 2 days ago
Main PID: 9887 (worker)
Tasks: 1 (limit: 512)
CGroup: /openqa.slice/openqa-worker.slice/openqa-worker@19.service
└─9887 /usr/bin/perl /usr/share/openqa/script/worker --instance 19
Feb 01 19:52:21 openqaworker2 worker[9887]: [info] uploading vars.json
Feb 01 19:52:21 openqaworker2 worker[9887]: [info] uploading serial0.txt
Feb 01 19:52:21 openqaworker2 worker[9887]: [info] uploading autoinst-log.txt
Feb 01 19:52:21 openqaworker2 worker[9887]: [info] uploading worker-log.txt
Feb 01 19:52:21 openqaworker2 worker[9887]: [info] cleaning up 02430240-sle-15-SP1-Installer-DVD-x86_64-Build158.4-skip_registration@svirt-hyperv-uefi
Feb 01 19:53:44 openqaworker2 worker[9887]: GLOB(0x8005aa8)[info] got job 2426719: 02426719-sle-15-SP1-Installer-DVD-x86_64-Build157.1-mediacheck@svirt-hyperv
Feb 01 19:53:44 openqaworker2 worker[9887]: [info] +++ setup notes +++
Feb 01 19:53:44 openqaworker2 worker[9887]: [info] start time: 2019-02-01 18:53:44
Feb 01 19:53:44 openqaworker2 worker[9887]: [info] running on openqaworker2:19 (Linux 4.7.5-2.g02c4d35-default #1 SMP PREEMPT Mon Sep 26 08:11:45 UTC 2016 (02c4d35) x86_64)
Feb 02 18:09:05 openqaworker2 systemd[1]: openqa-worker@19.service: Got notification message from PID 11599, but reception is disabled.
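The pattern in the log above (a "got job" line with no later "cleaning up" line, and days passing since) could be checked automatically. The following is only a rough sketch, not part of openQA: the function names `open_job` and `looks_stuck`, the 24-hour threshold, and the assumption that every job ends with a "cleaning up" line are mine, and the journal lines carry no year, so it is taken from `now`.

```python
import re
from datetime import datetime, timedelta

# Timestamp prefix as printed in the excerpt above, e.g. "Feb 01 19:53:44".
STAMP = r"(?P<stamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2})"
# "\S*" tolerates a prefix such as "GLOB(0x8005aa8)" in front of "[info]".
GOT_JOB = re.compile(STAMP + r" \S+ worker\[\d+\]: \S*\[info\] got job (?P<job>\d+):")
CLEANUP = re.compile(STAMP + r" \S+ worker\[\d+\]: \S*\[info\] cleaning up ")


def open_job(lines, year):
    """Return (job_id, pickup_time) for the last job that was picked up
    without a later "cleaning up" line, or None if there is no such job."""
    current = None
    for line in lines:
        got = GOT_JOB.match(line)
        if got:
            # Journal lines have no year; the caller supplies it (assumption:
            # the log does not span a New Year boundary).
            picked = datetime.strptime(f"{year} {got['stamp']}", "%Y %b %d %H:%M:%S")
            current = (int(got["job"]), picked)
        elif CLEANUP.match(line):
            current = None
    return current


def looks_stuck(lines, now, max_age=timedelta(hours=24)):
    """True if a job was picked up more than max_age ago and never cleaned up."""
    job = open_job(lines, now.year)
    return job is not None and now - job[1] > max_age


sample = [
    "Feb 01 19:52:21 openqaworker2 worker[9887]: [info] cleaning up 02430240-sle-15-SP1-Installer-DVD-x86_64-Build158.4-skip_registration@svirt-hyperv-uefi",
    "Feb 01 19:53:44 openqaworker2 worker[9887]: GLOB(0x8005aa8)[info] got job 2426719: 02426719-sle-15-SP1-Installer-DVD-x86_64-Build157.1-mediacheck@svirt-hyperv",
    "Feb 01 19:53:44 openqaworker2 worker[9887]: [info] +++ setup notes +++",
]
print(looks_stuck(sample, now=datetime(2019, 2, 4, 12, 0)))  # prints True
```

The threshold would need tuning to the longest legitimate job on these worker classes; a check like this only flags candidates and does not explain why the worker hangs.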