action #12566
closed
The Worker Dies When the Job is Cancelled from GUI
Description
Sequence of Steps
- The scheduled job appeared in the GUI under the "x jobs are running" section.
- While it was still running, I cancelled it using the X next to the test name.
The Expected Outcome:
- The worker should keep running and pick up the next scheduled jobs.
The Actual Outcome/Issue:
- Cancelling the job killed the worker.
Systemctl output:
systemctl status openqa-worker*
openqa-worker@1.service - openQA Worker #1
Loaded: loaded (/usr/lib/systemd/system/openqa-worker@.service; enabled)
Active: failed (Result: exit-code) since Thu 2016-06-30 11:20:17 CEST; 27s ago
Process: 12013 ExecStart=/usr/share/openqa/script/worker --instance %i (code=exited, status=1/FAILURE)
Process: 12010 ExecStartPre=/usr/bin/install -d -m 0755 -o _openqa-worker /var/lib/openqa/pool/%i (code=exited, status=0/SUCCESS)
Main PID: 12013 (code=exited, status=1/FAILURE)
The output from _openqa_worker:
job-working-directory: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
got job 70: 00000070-sle-4.0-x86_64-Build001-suma3@12_SP1_gnomeOnly
12052: WORKING 70
killing 12052
setting job 70 to incomplete (cancel)
can't open /var/lib/openqa/pool/1/testresults/result-consoletest_setup.json: No such file or directory at /usr/share/openqa/script/../lib/OpenQA/Worker/Jobs.pm line 529.
cleaning up 00000070-sle-4.0-x86_64-Build001-suma3@12_SP1_gnomeOnly...
QEMU (12060 -> /usr/bin/qemu-system-x86_64) should be dead - WASUP?
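The log above suggests the worker exited while trying to open a per-module result file that the cancelled job had never written. As an illustration only, here is a minimal Python sketch of treating a missing result file as "no result yet" rather than a fatal error. This is not the openQA code (which is Perl); `read_module_result` is a hypothetical helper, and only the `testresults/result-<module>.json` layout is taken from the path in the log.

```python
import json
from pathlib import Path
from typing import Optional


def read_module_result(pool_dir: str, module: str) -> Optional[dict]:
    """Return the parsed result of a test module, or None if it was never written.

    Hypothetical helper: the file layout mirrors the path seen in the log,
    everything else is an assumption made for illustration.
    """
    path = Path(pool_dir) / "testresults" / f"result-{module}.json"
    try:
        with path.open() as fh:
            return json.load(fh)
    except FileNotFoundError:
        # A cancelled job can be killed before the module result exists,
        # so a missing file is reported to the caller instead of raising.
        return None
```

The caller can then skip the upload for that module and carry on with cleanup, so a cancel does not take the whole worker down.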
Updated by okurz over 8 years ago
- Priority changed from Normal to High
Also reported as https://github.com/os-autoinst/os-autoinst/issues/530, confirmed by AdamWill.
Updated by AdamWill over 8 years ago
And @coolo noted that it's fixed in his big follow-up PR:
https://github.com/os-autoinst/os-autoinst/pull/524
by the final commit in that series:
https://github.com/os-autoinst/os-autoinst/pull/524/commits/4d9a31fd72083fdbd84f7ea7d4c9ae6fb986e8b0
To fix this on current master, all you have to do is the isotovideo change; the other two changes are only relevant in the context of the rest of #524. So I'm just using this patch for the Fedora packages for now:
Updated by coolo over 8 years ago
this is really getting silly ;(
Jul 11 20:02:11 openqaworker3 worker[18722]: killing 2201
18:02:11.5242 2201 send_json {"cmd":"check_asserted_screen"}
18:02:11.5245 2206 sysread {"cmd":"check_asserted_screen"}
18:02:11.6105 2206 MATCH(reboot-bios-seabios-20160420:0.00)
18:02:11.7457 2201 signalhandler got TERM - loop 1
18:02:11.7458 2201 can_read received kill signal
18:02:11.7517 2206 MATCH(installation-autoyast-error-20160502:0.50)
18:02:11.8687 2206 MATCH(autoyast-error-20151217:0.49)
18:02:11.9920 2206 MATCH(installation-autoyast-error-20160505:0.50)
18:02:12.1076 2206 MATCH(autoyast-system-login-console-20150529:0.00)
18:02:12.2194 2206 MATCH(gdm-workaround-bsc962806-20160125:0.00)
18:02:12.3159 2206 MATCH(autoyast-system-login-console-minimal-20160113:0.00)
18:02:12.3160 2206 WARNING: check_asserted_screen took 0.79 seconds - make your needles more specific
18:02:12.3199 2206 no match 10
18:21:50.8982 2206 WARNING: enqueue_screenshot took 0.77 seconds - slow IO? (opencv: 0.01 - encoder: 0.76)
18:27:23.1653 2206 WARNING: enqueue_screenshot took 0.50 seconds - slow IO? (opencv: 0.01 - encoder: 0.50)
18:29:24.3795 2206 WARNING: enqueue_screenshot took 0.51 seconds - slow IO? (opencv: 0.01 - encoder: 0.50)
18:31:10.3196 2206 WARNING: enqueue_screenshot took 0.54 seconds - slow IO? (opencv: 0.01 - encoder: 0.53)
18:36:14.6974 2206 WARNING: enqueue_screenshot took 0.82 seconds - slow IO? (opencv: 0.01 - encoder: 0.80)
It just doesn't care ;(
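In the log above, process 2201 reports the TERM signal but 2206 keeps working for more than half an hour. A common way to guard against this is to put the child in its own process group, send SIGTERM to the whole group, and escalate to SIGKILL after a grace period. The following is a rough Python sketch of that pattern, not what the worker or isotovideo actually do; `stop_process_group` and the `grace` parameter are made-up names, and it assumes the child was started with `start_new_session=True` on a POSIX system.

```python
import os
import signal
import subprocess


def stop_process_group(proc: subprocess.Popen, grace: float = 5.0) -> int:
    """Terminate a child and its helpers; force-kill whatever ignores SIGTERM.

    Assumes the child was started with start_new_session=True, so its pid
    is also the id of a process group containing all of its helpers.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped

    pgid = proc.pid
    os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        pass  # the group leader ignored SIGTERM

    # Kill anything still left in the group (leader or helpers).
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # the whole group is already gone
    return proc.wait()
```

Note the design choice: once the group leader has exited, remaining helpers are killed immediately rather than given their own grace period. That is blunt, but it ensures nothing from the cancelled job lingers in the pool directory.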
Updated by coolo over 8 years ago
https://github.com/os-autoinst/os-autoinst/pull/541 for the case in #3
Updated by okurz over 8 years ago
- Related to action #12178: worker can hang when killing isotovideo added
Updated by coolo over 8 years ago
- Status changed from New to Resolved
This was fixed earlier, but the fix caused other faults. Those are hopefully fixed now too.