action #12566

closed

The Worker Dies When the Job is Cancelled from GUI

Added by omaric almost 8 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version: -
Start date: 2016-06-30
Due date:
% Done: 0%
Estimated time:

Description

Sequence of Steps

  • The scheduled job appeared in the GUI under the "x jobs are running" section.
  • While it was still running, I cancelled it using the X next to the test name.

The Expected Outcome:

  • The worker should keep running and pick up the next scheduled jobs.

The Actual Outcome/Issue:

  • Cancelling the job killed the worker.

Systemctl output:
systemctl status openqa-worker*
openqa-worker@1.service - openQA Worker #1
Loaded: loaded (/usr/lib/systemd/system/openqa-worker@.service; enabled)
Active: failed (Result: exit-code) since Thu 2016-06-30 11:20:17 CEST; 27s ago
Process: 12013 ExecStart=/usr/share/openqa/script/worker --instance %i (code=exited, status=1/FAILURE)
Process: 12010 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/%i (code=exited, status=0/SUCCESS)
Main PID: 12013 (code=exited, status=1/FAILURE)

The output from _openqa_worker:
job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
got job 70: 00000070-sle-4.0-x86_64-Build001-suma3@12_SP1_gnomeOnly
12052: WORKING 70
killing 12052
setting job 70 to incomplete (cancel)
can't open /var/lib/openqa/pool/1/testresults/result-consoletest_setup.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 529.
cleaning up 00000070-sle-4.0-x86_64-Build001-suma3@12_SP1_gnomeOnly...
QEMU (12060 -> /usr/bin/qemu-system-x86_64) should be dead - WASUP?
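
The `can't open ... result-consoletest_setup.json` line suggests the worker tried to read a per-module result file that the cancelled job never got to write. A minimal Python sketch of a tolerant read, assuming a missing file can be treated as "no result" instead of a fatal error. `read_module_result` is a hypothetical name for illustration; the real worker is Perl and this is not its code.

```python
import json
from pathlib import Path
from typing import Optional


def read_module_result(pool_dir: str, module: str) -> Optional[dict]:
    """Return the parsed result for a test module, or None if it was never written."""
    # Path layout mirrors the one in the log: <pool>/testresults/result-<module>.json
    path = Path(pool_dir) / "testresults" / f"result-{module}.json"
    try:
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        # A cancelled job may not have produced this file; report "no result"
        # to the caller instead of aborting the whole worker.
        return None
```

With this shape the caller decides what a missing result means (for example, marking the module as incomplete) and the worker process itself keeps running.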


Related issues 1 (0 open, 1 closed)

Related to openQA Project - action #12178: worker can hang when killing isotovideo (Resolved, coolo, 2016-05-31)

Actions #1

Updated by okurz almost 8 years ago

  • Priority changed from Normal to High

Also reported as https://github.com/os-autoinst/os-autoinst/issues/530 and confirmed by AdamWill.

Actions #2

Updated by AdamWill almost 8 years ago

And @coolo noted that it's fixed in his big follow-up PR:

https://github.com/os-autoinst/os-autoinst/pull/524

by the final commit in that series:

https://github.com/os-autoinst/os-autoinst/pull/524/commits/4d9a31fd72083fdbd84f7ea7d4c9ae6fb986e8b0

To fix this on current master, only the isotovideo change is needed; the other two changes are only relevant in the context of the rest of #524. So I'm just using this patch for the Fedora packages for now:

http://pkgs.fedoraproject.org/cgit/rpms/os-autoinst.git/tree/0001-Stop-the-vm-on-signal-handler-i.e.-during-worker-can.patch
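
Going only by the patch title ("Stop the vm on signal handler"), the idea can be sketched in Python as below. This is not the isotovideo change itself (that code is Perl and is not reproduced here); `VMRunner`, `stop_vm` and `install_term_handler` are hypothetical names, and the VM is assumed to be an ordinary child process.

```python
import signal
import subprocess
import sys


class VMRunner:
    """Hypothetical stand-in for the process that owns the VM."""

    def __init__(self, vm_cmd):
        self.vm = subprocess.Popen(vm_cmd)

    def stop_vm(self, timeout=5.0):
        """Stop the VM child; safe to call more than once."""
        if self.vm.poll() is not None:
            return  # already gone
        self.vm.terminate()
        try:
            self.vm.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            # The VM ignored the polite request; force it down.
            self.vm.kill()
            self.vm.wait()


def install_term_handler(runner):
    """On SIGTERM (e.g. a job cancel), stop the VM before exiting."""

    def on_term(signum, frame):
        runner.stop_vm()
        sys.exit(1)

    signal.signal(signal.SIGTERM, on_term)
```

The point of the pattern is that the VM is taken down from inside the signal handler, so a cancel cannot leave it running after its parent exits.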

Actions #3

Updated by coolo almost 8 years ago

this is really getting silly ;(

Jul 11 20:02:11 openqaworker3 worker[18722]: killing 2201

18:02:11.5242 2201 send_json {"cmd":"check_asserted_screen"}
18:02:11.5245 2206 sysread {"cmd":"check_asserted_screen"}
18:02:11.6105 2206 MATCH(reboot-bios-seabios-20160420:0.00)
18:02:11.7457 2201 signalhandler got TERM - loop 1
18:02:11.7458 2201 can_read received kill signal
18:02:11.7517 2206 MATCH(installation-autoyast-error-20160502:0.50)
18:02:11.8687 2206 MATCH(autoyast-error-20151217:0.49)
18:02:11.9920 2206 MATCH(installation-autoyast-error-20160505:0.50)
18:02:12.1076 2206 MATCH(autoyast-system-login-console-20150529:0.00)
18:02:12.2194 2206 MATCH(gdm-workaround-bsc962806-20160125:0.00)
18:02:12.3159 2206 MATCH(autoyast-system-login-console-minimal-20160113:0.00)
18:02:12.3160 2206 WARNING: check_asserted_screen took 0.79 seconds - make your needles more specific
18:02:12.3199 2206 no match 10
18:21:50.8982 2206 WARNING: enqueue_screenshot took 0.77 seconds - slow IO? (opencv: 0.01 - encoder: 0.76)
18:27:23.1653 2206 WARNING: enqueue_screenshot took 0.50 seconds - slow IO? (opencv: 0.01 - encoder: 0.50)
18:29:24.3795 2206 WARNING: enqueue_screenshot took 0.51 seconds - slow IO? (opencv: 0.01 - encoder: 0.50)
18:31:10.3196 2206 WARNING: enqueue_screenshot took 0.54 seconds - slow IO? (opencv: 0.01 - encoder: 0.53)
18:36:14.6974 2206 WARNING: enqueue_screenshot took 0.82 seconds - slow IO? (opencv: 0.01 - encoder: 0.80)

It just doesn't care ;(
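
In the log above, 2201 handles the TERM while 2206 keeps logging for more than half an hour. One general way to avoid such a survivor, sketched in Python under the assumption that the helper processes can be started in their own process group (POSIX only): send TERM to the whole group, wait a grace period, then send KILL to whatever is left. `start_in_own_group` and `stop_group` are hypothetical names, and this is not what the openQA worker actually does.

```python
import os
import signal
import subprocess


def start_in_own_group(cmd):
    """Start cmd as the leader of a new session, so its pid is also its process group id."""
    return subprocess.Popen(cmd, start_new_session=True)


def stop_group(proc, grace=5.0):
    """TERM the whole group, wait up to `grace` seconds, then KILL any survivors."""
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        pass  # group already gone
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # leader ignored TERM; fall through to KILL
    try:
        # Also catches group members that outlived the leader.
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    return proc.wait()
```

SIGKILL cannot be caught or ignored, so after `stop_group` returns no process in that group is still writing logs.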

Actions #5

Updated by okurz over 7 years ago

  • Related to action #12178: worker can hang when killing isotovideo added

Actions #6

Updated by coolo over 7 years ago

  • Status changed from New to Resolved

This was fixed earlier, but the fix caused other faults. Those are hopefully fixed now too.
