action #40415
closed
Concurrent jobs with dependencies don't work if they are on different machines.
Description
Reproducibility:
We have 2 jobs, let's say PARENT and CHILD, where CHILD has PARALLEL_WITH=PARENT.
I created 2 tests that run on the same machine, "64bit" (qemu with some other options):
http://fromm.arch.suse.de/tests/1394
http://fromm.arch.suse.de/tests/1395
The parent job needs the child ID for the mutex command:
my $children = get_children();
my $child_id = (keys %$children)[0];
...
script_run("echo Waiting for child with child_id=$child_id");
mutex_wait("child_ready", $child_id);
This is one line of the parent's output:
Waiting for child with child_id=1399
Everything is OK so far: CHILD recognizes PARENT as its parent and the locking API works without problems.
Then I created another machine, "64bit-other", with exactly the same characteristics as the first one (http://fromm.arch.suse.de/admin/machines), and assigned CHILD to "64bit-other" in the job group.
The result is that CHILD doesn't have the parent job in the settings panel any more, and the PARENT's output is now:
Waiting for child with child_id=
Therefore, the command
mutex_wait("child_ready", $child_id);
waits forever.
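A possible guard for the parent test, as a minimal sketch: it does not fix the missing dependency, it only makes the parent fail fast instead of blocking in mutex_wait with an empty ID. The helper name first_child_id_or_die is hypothetical and not part of the openQA API. It assumes the argument is a hash reference keyed by job ID, as get_children() is used above. Unlike the original snippet, it sorts the IDs so the choice is deterministic.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the openQA API): return the lowest child
# job ID from a hash reference keyed by job ID, or die if there is none.
sub first_child_id_or_die {
    my ($children) = @_;

    # Collect the job IDs only if we really got a hash reference.
    my @ids;
    @ids = sort { $a <=> $b } keys %$children if ref($children) eq 'HASH';

    # An empty list means the scheduler did not link any child to this job.
    die "No child jobs found; check that the PARALLEL_WITH dependency was created\n"
      unless @ids;

    return $ids[0];
}

# Possible use in the parent test (assumes get_children and mutex_wait are
# available there, as in the snippet above):
#   my $child_id = first_child_id_or_die(get_children());
#   mutex_wait("child_ready", $child_id);
```

With this in place the PARENT job would die with a clear message in the "64bit-other" scenario rather than wait forever.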
Why have different machines? For virtual jobs it doesn't make sense, but for BareMetal jobs such as the NFV and InfiniBand tests we use different workers and machines:
ipmi-sonic and ipmi-tails with different worker classes: 64bit-mlx_con5_sonic and 64bit-mlx_con5_tails respectively.