action #39974

closed

[openqa][PARALLEL_WITH] Child job failure causes the parent job to be terminated.

Added by xlai over 5 years ago. Updated over 5 years ago.

Status: Rejected
Priority: High
Assignee:
Category: -
Target version: -
Start date: 2018-08-20
Due date:
% Done: 0%
Estimated time:

Description

I have two jobs related by PARALLEL_WITH. When the child job finished with result failed, the parent job received TERM shortly afterwards and did not run its remaining code, which makes it impossible to upload the failure logs from the parent job.

Relationship of the two jobs: PARALLEL_WITH

Key code on parent job:
mutex_create('DST_READY_TO_START');    # after this, the child starts its core test code
wait_for_children;
# upload logs
script_run("xl dmesg > /tmp/xl-dmesg.log");    # the os-autoinst log shows TERM arriving here; the lines below never run
my $logs = "/var/log/libvirt /var/log/messages /var/log/xen /var/lib/xen/dump /tmp/xl-dmesg.log";
&virt_autotest_base::upload_virt_logs($logs, "guest-migration-dst-logs");

Key log:
CHILD JOB: http://10.67.18.220/tests/259 (finished normally with result failed)
PARENT: http://10.67.18.220/tests/258/file/autoinst-log.txt
PARENT KEY LOG:
[2018-08-17T18:40:18.0074 CST] [debug] Waiting for 1 jobs to finish
[2018-08-17T18:40:19.0096 CST] [debug] Waiting for 1 jobs to finish
[2018-08-17T18:40:20.0121 CST] [debug] Waiting for 0 jobs to finish
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run
[2018-08-17T18:40:20.0121 CST] [debug] <<< testapi::script_run(cmd='xl dmesg > /tmp/xl-dmesg.log', wait=undef)
[2018-08-17T18:40:20.0121 CST] [debug] /var/lib/openqa/share/tests/sle-12-SP4/tests/virt_autotest/guest_migration_dst.pm:49 called testapi::script_run
[2018-08-17T18:40:20.0122 CST] [debug] <<< testapi::type_string(string='xl dmesg > /tmp/xl-dmesg.log', max_interval=250, wait_screen_changes=0, wait_still_screen=0)
BYTES {"json_cmd_token":"kCadcRoy","type_string":{"max_interval":250,"text":"xl dmesg > /tmp/xl-dmesg.log","json_cmd_token":"EGgaJsza"}}
[2018-08-17T18:40:20.0615 CST] [debug] backend got TERM
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":39339"
after 2679 requests (2522 known processed) with 0 events remaining.
[2018-08-17T18:40:20.0617 CST] [info] Collected unknown process with pid 17212 and exit status: 1
[2018-08-17T18:40:20.0617 CST] [debug] autotest received signal TERM, saving results of current test before exiting
[2018-08-17T18:40:20.0618 CST] [debug] signalhandler got TERM - loop 1
[2018-08-17T18:40:20.0618 CST] [debug] awaiting death of commands process
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":60734"
after 2729 requests (2729 known processed) with 0 events remaining.
[2018-08-17T18:40:20.0624 CST] [debug] tests died
[2018-08-17T18:40:20.0624 CST] [info] Collected unknown process with pid 17309 and exit status: 1
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17311 and exit status: 15
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17313 and exit status: 0
[2018-08-17T18:40:20.0625 CST] [info] Collected unknown process with pid 17314 and exit status: 255
[2018-08-17T18:40:20.0626 CST] [debug] signalhandler got TERM - loop 0
[2018-08-17T18:40:20.0626 CST] [debug] killing backend process 16929
[2018-08-17T18:40:20.0626 CST] [info] Collected unknown process with pid 17214 and exit status: 15
[2018-08-17T18:40:20.0627 CST] [info] Collected unknown process with pid 17216 and exit status: 0
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16930 and exit status: 0
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 16963 and exit status: 0
[2018-08-17T18:40:20.0970 CST] [info] Collected unknown process with pid 17076 and exit status: 0
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17204 and exit status: 0
[2018-08-17T18:40:20.0971 CST] [info] Collected unknown process with pid 17301 and exit status: 0
[2018-08-17T18:40:20.0972 CST] [info] Collected unknown process with pid 20001 and exit status: 0
[2018-08-17T18:40:20.0975 CST] [debug] done with backend process
[2018-08-17T18:40:20.0982 CST] [info] Isotovideo exit status: 1
[2018-08-17T18:40:20.0983 CST] [info] +++ worker notes +++
[2018-08-17T18:40:20.0983 CST] [info] end time: 2018-08-17 10:40:20
[2018-08-17T18:40:20.0983 CST] [info] result: cancel
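To make the ordering in the log above easier to check, here is a minimal sketch of a standalone helper that finds the first TERM-related message logged after wait_for_children reports zero remaining jobs. It is written in Python only because it is an offline analysis script, not test code. The function names and the regular expression are assumptions made for illustration: the pattern only covers lines shaped like "[timestamp] [level] message" as seen in this excerpt, and it does not try to interpret the fractional part of the timestamp.

```python
import re

# Matches os-autoinst lines such as
# "[2018-08-17T18:40:20.0615 CST] [debug] backend got TERM".
# Assumption: every relevant line has this "[timestamp] [level] message" shape.
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def parse_log(text):
    """Return (timestamp, level, message) tuples for lines that carry a timestamp."""
    events = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append((match["ts"], match["level"], match["msg"]))
    return events


def first_term_after_children_done(events):
    """Return the first TERM-related event after 'Waiting for 0 jobs to finish'."""
    children_done = False
    for ts, level, msg in events:
        if msg == "Waiting for 0 jobs to finish":
            children_done = True
        elif children_done and "TERM" in msg:
            return ts, level, msg
    return None


# A few lines copied from the parent job's log in this report.
SAMPLE = """\
[2018-08-17T18:40:19.0096 CST] [debug] Waiting for 1 jobs to finish
[2018-08-17T18:40:20.0121 CST] [debug] Waiting for 0 jobs to finish
[2018-08-17T18:40:20.0121 CST] [debug] <<< testapi::script_run(cmd='xl dmesg > /tmp/xl-dmesg.log', wait=undef)
[2018-08-17T18:40:20.0615 CST] [debug] backend got TERM
[2018-08-17T18:40:20.0617 CST] [debug] autotest received signal TERM, saving results of current test before exiting
"""

if __name__ == "__main__":
    print(first_term_after_children_done(parse_log(SAMPLE)))
```

Run against the full autoinst-log.txt of the parent job, this should point at the "backend got TERM" line, which is logged within the same second as the first script_run call that follows wait_for_children.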
