action #110824

The live view does not work for directly chained jobs

Added by tonyyuan 7 months ago. Updated 5 months ago.

Status: Feedback
Priority: Low
Assignee: -
Category: Concrete Bugs
Target version:
Start date: 2022-05-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

The live view does not work when jobs are started with START_DIRECTLY_AFTER_TEST. Both "live view" and "live log" of a running job show nothing. The web browser console shows some messages.

History

#1 Updated by mkittler 7 months ago

Can you give some example jobs? Maybe it is just because those jobs use a special worker/backend setup and not the dependency itself?

Does it work for the first job in the direct chain?

#2 Updated by tonyyuan 7 months ago

We are running one, http://openqa.qam.suse.cz/tests/overview?build=%3Axen%3Ayoda%3A15sp4&distri=sle&version=15-SP3&groupid=156

I can reproduce this issue with any two directly chained jobs.

#3 Updated by okurz 7 months ago

  • Category set to Concrete Bugs
  • Priority changed from Normal to Low
  • Target version set to future

Hi, you could help us greatly by updating the ticket description with some details, following the template from https://progress.opensuse.org/projects/openqav3/wiki/#Defects

The jobs in the build you referenced are all on "64bit-ipmi", right?
I saw that all tests were running on "void:14" so it could be a machine specific problem. Can you check on another machine?
Does this reproduce on qemu machines as well?

#4 Updated by tonyyuan 7 months ago

This is another running case on void:15
http://openqa.qam.suse.cz/tests/overview?build=%3Axen%3Azoe%3A15sp4&distri=sle&version=15-SP4&groupid=163
backend: 64bit-ipmi

It's reproducible on any machine using the ipmi backend.
All the machines we have are ipmi. I don't know if it's reproducible with the qemu backend.

Every directly chained job loses "Live view", from the first parent job to the last child.

#5 Updated by okurz 7 months ago

Ok. But I don't think we will be able to help efficiently with this unless we can reproduce it on qemu or know that it's not reproducible on qemu.

#6 Updated by tonyyuan 5 months ago

Yes, it's reproducible on qemu. I did some research during Hackweek and submitted a PR: https://github.com/os-autoinst/openQA/pull/4727

The root cause:

The commit below introduced a regression that breaks live view and live log for directly chained jobs:
https://github.com/os-autoinst/openQA/commit/591fba9fe7948f963300ff66074c6dd22092f4f1
In lib/OpenQA/Scheduler/Model/Jobs.pm, line 476 calls "prepare_for_work" multiple times, once per job:
$job_data{$_->id} = $_->prepare_for_work($worker, \%worker_properties) for @$jobs;
Each call deletes the worker tmp directory created by the previous call. In the end, no tmp directory exists, so live view and live log can't be generated.

#7 Updated by tonyyuan 5 months ago

Below is the fix:

The fix still cleans up the tmp directory created by the previous scheduling run. It then creates a new tmp directory and stores it in the WORKER_TMPDIR entry of %worker_properties, all inside the "_assign_multiple_jobs_to_worker" function, before looping over prepare_for_work.
If the WORKER_TMPDIR entry of its %worker_properties parameter already has a value, prepare_for_work neither deletes the previous tmp directory nor creates a new one.
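
To illustrate the idea outside of openQA, here is a minimal Python model of the behaviour described above. It is only a sketch, not the code from the PR: the Worker class and the Python function signatures are hypothetical stand-ins, and only the names prepare_for_work, _assign_multiple_jobs_to_worker and WORKER_TMPDIR come from this ticket.

```python
import os
import shutil
import tempfile


class Worker:
    """Hypothetical stand-in for an openQA worker record."""

    def __init__(self):
        self.tmpdir = None  # models the stored WORKER_TMPDIR property


def prepare_for_work(worker, job_id, worker_properties):
    """Return job data; manage the tmp directory only if none was handed in."""
    tmpdir = worker_properties.get("WORKER_TMPDIR")
    if not tmpdir:
        # No directory supplied by the caller: remove the previous one
        # and create a fresh one (the per-call behaviour).
        if worker.tmpdir and os.path.isdir(worker.tmpdir):
            shutil.rmtree(worker.tmpdir)
        tmpdir = tempfile.mkdtemp(prefix="worker-tmp-")
    worker.tmpdir = tmpdir
    return {"job_id": job_id, "tmpdir": tmpdir}


def assign_multiple_jobs_to_worker(worker, job_ids):
    """Clean up once, create one directory, then prepare every chained job."""
    if worker.tmpdir and os.path.isdir(worker.tmpdir):
        shutil.rmtree(worker.tmpdir)
    worker_properties = {"WORKER_TMPDIR": tempfile.mkdtemp(prefix="worker-tmp-")}
    return {j: prepare_for_work(worker, j, worker_properties) for j in job_ids}
```

In this model, calling assign_multiple_jobs_to_worker(Worker(), [1, 2, 3]) hands the same, still existing directory to all three jobs, because the loop no longer triggers a delete-and-recreate on every call.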

#8 Updated by cdywan 5 months ago

  • Status changed from New to Feedback

With the PR merged, let's see how well this works in practice.
