action #110824
The live view does not work for directly chained jobs
Description
The live view does not work when jobs are started with START_DIRECTLY_AFTER_TEST. Both "Live view" and "Live log" of a running job show nothing. Some messages appear in the web browser console.
Updated by mkittler over 2 years ago
Can you give some example jobs? Maybe it is just because those jobs use a special worker/backend setup and not the dependency itself?
Does it work for the first job in the direct chain?
Updated by tonyyuan over 2 years ago
We are running one, http://openqa.qam.suse.cz/tests/overview?build=%3Axen%3Ayoda%3A15sp4&distri=sle&version=15-SP3&groupid=156
I can reproduce this issue with any two directly chained jobs.
Updated by okurz over 2 years ago
- Category set to Regressions/Crashes
- Priority changed from Normal to Low
- Target version set to future
Hi, you could help us greatly by updating the ticket description with some details, following the template from https://progress.opensuse.org/projects/openqav3/wiki/#Defects
The jobs in the build you referenced are all on "64bit-ipmi", right?
I saw that all tests were running on "void:14" so it could be a machine specific problem. Can you check on another machine?
Does this reproduce on qemu machines as well?
Updated by tonyyuan over 2 years ago
This is another running case on void:15
http://openqa.qam.suse.cz/tests/overview?build=%3Axen%3Azoe%3A15sp4&distri=sle&version=15-SP4&groupid=163
backend: 64bit-ipmi
It's reproducible on any machine with the ipmi backend.
All machines we have are ipmi. I don't know if it's reproducible with the qemu backend.
All directly chained jobs lose "Live view", from the first parent job to the last child.
Updated by okurz over 2 years ago
Ok. But I don't think we will be able to help efficiently with this unless we can reproduce it on qemu or know that it's not reproducible on qemu.
Updated by tonyyuan about 2 years ago
Yes, it's reproducible on qemu. I did some research during Hackweek and submitted a PR: https://github.com/os-autoinst/openQA/pull/4727
The root cause:
The commit below introduced a regression. Because of it, live view and live log do not work for directly chained jobs.
https://github.com/os-autoinst/openQA/commit/591fba9fe7948f963300ff66074c6dd22092f4f1
In lib/OpenQA/Scheduler/Model/Jobs.pm, line 476, $job_data{$_->id} = $_->prepare_for_work($worker, \%worker_properties) for @$jobs; calls "prepare_for_work" multiple times. Each call deletes the worker tmp directory created by the previous call. In the end, no tmp directory exists, so live view and live log can't be generated.
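The effect of calling the preparation step once per job can be sketched with a small model. This is not openQA's Perl code: it is a simplified Python illustration, and the function names, the worker dictionary, and the "tmpdir" key are assumptions made for the example. It only shows how each call invalidates the directory handed out by the previous call; in this toy model the last directory survives, so it does not reproduce every detail of the real code path.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_for_work(worker: dict) -> Path:
    """Hypothetical stand-in: delete the worker's old tmp dir, then create a new one."""
    old = worker.get("tmpdir")
    if old is not None:
        shutil.rmtree(old, ignore_errors=True)
    worker["tmpdir"] = Path(tempfile.mkdtemp(prefix="worker-tmp-"))
    return worker["tmpdir"]


def assign_chain(worker: dict, job_ids: list) -> dict:
    """Prepare every job of a direct chain for the same worker, one call per job."""
    return {job_id: prepare_for_work(worker) for job_id in job_ids}


worker = {}
job_dirs = assign_chain(worker, [1, 2, 3])

# Check which of the directories handed to the jobs still exist on disk.
surviving = {job_id: path.exists() for job_id, path in job_dirs.items()}
print(surviving)  # {1: False, 2: False, 3: True}

# Remove the directory that is still left over from the demonstration.
shutil.rmtree(worker["tmpdir"], ignore_errors=True)
```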
Updated by tonyyuan about 2 years ago
Below is the fix:
The fix still cleans up the tmp directory created by the previous scheduling run, but it now does so once, in the "_assign_multiple_jobs_to_worker" function: that function creates a new tmp directory and assigns it to the WORKER_TMPDIR entry of %worker_properties before looping over prepare_for_work.
prepare_for_work neither deletes the previous tmp directory nor creates a new one if the WORKER_TMPDIR entry of its %worker_properties parameter has a value.
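The approach described above can be sketched as follows. This is a simplified Python model, not the code from the PR: the worker dictionary, its "tmpdir" key, and the exact function signatures are assumptions for illustration. Only the idea is taken from the ticket: create the directory once per chain and let the per-job preparation reuse it when WORKER_TMPDIR is already set.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_for_work(worker: dict, worker_properties: dict) -> Path:
    """Reuse a pre-assigned tmp dir; only clean up and create one if none was passed."""
    tmpdir = worker_properties.get("WORKER_TMPDIR")
    if tmpdir is None:
        # Single-job path (assumed): behave as before.
        old = worker.get("tmpdir")
        if old is not None:
            shutil.rmtree(old, ignore_errors=True)
        tmpdir = Path(tempfile.mkdtemp(prefix="worker-tmp-"))
    worker["tmpdir"] = tmpdir
    return tmpdir


def assign_multiple_jobs_to_worker(worker: dict, job_ids: list) -> dict:
    """Clean up once, create one tmp dir for the whole chain, then prepare each job."""
    old = worker.get("tmpdir")
    if old is not None:
        shutil.rmtree(old, ignore_errors=True)
    new_dir = Path(tempfile.mkdtemp(prefix="worker-tmp-"))
    worker_properties = {"WORKER_TMPDIR": new_dir}
    return {job_id: prepare_for_work(worker, worker_properties) for job_id in job_ids}


worker = {}
first_chain = assign_multiple_jobs_to_worker(worker, [1, 2, 3])
shared = set(first_chain.values())
# All jobs of the chain see the same, still existing directory.
first_ok = len(shared) == 1 and all(p.is_dir() for p in shared)

second_chain = assign_multiple_jobs_to_worker(worker, [4, 5])
# The directory of the previous chain is cleaned up by the next assignment.
old_removed = not any(p.exists() for p in shared)
second_ok = all(p.is_dir() for p in second_chain.values())
print(first_ok, old_removed, second_ok)  # True True True

# Remove the directory that is still left over from the demonstration.
shutil.rmtree(worker["tmpdir"], ignore_errors=True)
```

Keeping the cleanup in the assignment function means stale directories from an earlier schedule are still removed, while jobs within one chain no longer delete each other's directory.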
Updated by livdywan about 2 years ago
- Status changed from New to Feedback
With the PR merged, let's see how well this works in practice.