action #40415 (closed)
Concurrent jobs with dependencies don't work if they are on different machines
Description
Reproducibility:
We have 2 jobs, let's say PARENT and CHILD, where CHILD has PARALLEL_WITH=PARENT.
I created 2 tests that run on the same machine "64bit" (qemu with some other options):
http://fromm.arch.suse.de/tests/1394
http://fromm.arch.suse.de/tests/1395
The parent job needs the child ID for the mutex command:
my $children = get_children();
my $child_id = (keys %$children)[0];
...
script_run("echo Waiting for child with child_id=$child_id");
mutex_wait("child_ready", $child_id);
This is one line of the parent's output:
Waiting for child with child_id=1399
Everything is OK so far. CHILD recognizes PARENT as its parent and the locking API works without problems.
Then I created another machine, "64bit-other", with exactly the same characteristics as the first one (see http://fromm.arch.suse.de/admin/machines),
and assigned CHILD to "64bit-other" in the job group.
The result is that CHILD doesn't have the parent job in the settings panel any more, and the PARENT's output is now:
Waiting for child with child_id=
Therefore, the command
mutex_wait("child_ready", $child_id);
waits forever.
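A fail-fast guard in the parent would at least turn the endless wait into an immediate error. This is only a minimal sketch: it assumes get_children() returns a hash reference keyed by job ID, as in the snippet above. The helper name first_child_id_or_die is hypothetical, and the children hash is passed in as an argument so the snippet can run outside openQA.

```perl
use strict;
use warnings;

# Hypothetical helper: return one child job ID, or die if there is none.
# $children is assumed to be the hash ref returned by get_children().
sub first_child_id_or_die {
    my ($children) = @_;
    my @ids = sort { $a <=> $b } keys %{ $children // {} };
    die "No child job found; was the PARALLEL_WITH dependency resolved?\n" unless @ids;
    return $ids[0];
}

# Intended use in the parent test (not executed here):
#   my $child_id = first_child_id_or_die(get_children());
#   mutex_wait("child_ready", $child_id);
```

With the empty child list from the second scenario, the parent would die before ever reaching mutex_wait.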
Why have different machines? For virtual jobs it doesn't make sense, but for bare-metal jobs such as the NFV and InfiniBand tests we use different workers and machines:
ipmi-sonic and ipmi-tails with different worker classes: 64bit-mlx_con5_sonic and 64bit-mlx_con5_tails respectively.
Updated by EDiGiacinto over 6 years ago
- Category set to 122
That's a feature that should also consider adapting what was done for https://progress.opensuse.org/issues/25892
Updated by EDiGiacinto over 6 years ago
- Related to action #25892: Scheduling parallel jobs added
Updated by coolo about 6 years ago
The tricky part is finding the limits - e.g. if you schedule in one job group multiple server/client pairs on different hardware/architecture. So we'll need some kind of 'finding nearest partner' and error out if we can't clearly identify it.
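One possible reading of 'finding nearest partner' is sketched below. It is not the implementation that was merged. The job fields (test, machine, arch) and the function name find_partner are assumptions made for illustration. A partner on the same machine wins, a partner on another machine is accepted only if it is the sole candidate, and everything else is an error.

```perl
use strict;
use warnings;

# Hypothetical resolver: pick the partner job for a dependent job.
# Each job is a hash ref with the assumed keys test, machine and arch.
sub find_partner {
    my ($child, $parent_name, $jobs) = @_;

    # Candidates share the requested test name and the child's architecture.
    my @candidates = grep { $_->{test} eq $parent_name && $_->{arch} eq $child->{arch} } @$jobs;

    # Prefer a unique candidate on the child's own machine.
    my @same_machine = grep { $_->{machine} eq $child->{machine} } @candidates;
    return $same_machine[0] if @same_machine == 1;

    # Otherwise accept another machine only if the choice is unambiguous.
    return $candidates[0] if !@same_machine && @candidates == 1;

    die "Cannot clearly identify '$parent_name' for $child->{test}\@$child->{machine}: "
      . scalar(@candidates) . " candidates\n";
}
```

With several server/client pairs on different hardware in one job group, the second rule finds more than one candidate and the scheduler would error out rather than guess.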
Updated by mitiao about 6 years ago
Updated by cfconrad about 6 years ago
What about explicitly defining the machine, like:
START_AFTER_TEST=upload_img:64bit
or
PARALLEL_WITH=test1:%MACHINE%-foo,test2:%MACHINE%-baar
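To make the proposal concrete, here is a rough sketch of how such a value could be split into test/machine pairs. The function name parse_dependencies is hypothetical. The sketch also assumes that an entry without a machine suffix falls back to the machine of the job declaring the dependency. It illustrates only the syntax suggested in this comment, not what openQA finally implemented.

```perl
use strict;
use warnings;

# Hypothetical parser for the "test:machine" form suggested above.
# %MACHINE% is replaced by the machine of the job declaring the dependency.
sub parse_dependencies {
    my ($value, $own_machine) = @_;
    my @deps;
    for my $entry (split /\s*,\s*/, $value) {
        my ($test, $machine) = split /:/, $entry, 2;
        $machine //= $own_machine;    # assumed default: no suffix means same machine
        $machine =~ s/%MACHINE%/$own_machine/g;
        push @deps, {test => $test, machine => $machine};
    }
    return \@deps;
}

# Example call for a job running on machine "64bit":
#   parse_dependencies('test1:%MACHINE%-foo,test2:%MACHINE%-baar', '64bit');
```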
Updated by coolo about 6 years ago
This might work for you as you only have one architecture. But in every other scenario it means duplicating test suites, because you need to hardcode machine names in the test suite settings.
Updated by mitiao about 6 years ago
- Status changed from In Progress to Resolved
Resolved as PR merged.
Updated by okurz about 6 years ago
- Status changed from Resolved to In Progress
Please check http://open.qa/docs/#_inter_machine_dependencies, the documentation seems to be broken on "===== Example", maybe just a missing blank line?
Updated by mitiao about 6 years ago
- Status changed from In Progress to Feedback
okurz wrote:
Please check http://open.qa/docs/#_inter_machine_dependencies, the documentation seems to be broken on "===== Example", maybe just a missing blank line?
Thanks for checking; fixed in
https://github.com/os-autoinst/openQA/pull/1859
Updated by okurz about 6 years ago
- Status changed from Feedback to Resolved
http://open.qa/docs/#_inter_machine_dependencies looks fine, ty
Updated by mitiao about 6 years ago
Sorry, another fix for the doc
https://github.com/os-autoinst/openQA/pull/1865
Should be fine finally.
Updated by okurz about 6 years ago
- Related to action #42857: [qe-core][functional][s390x] Change structure of s390x KVM hosts on production (o.s.d) added
Updated by coolo about 6 years ago
- Target version changed from Current Sprint to Done