action #113318

closed

openQA live view stays blank when browser tab is staying open on scheduled jobs until jobs start size:M

Added by okurz over 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2022-07-06
Due date:
2023-04-11
% Done:

0%

Estimated time:

Description

Observation

On any job (at least observed on openqa.opensuse.org) when opening the job while the job is still in scheduled state, when the job switches to "running" the live view stays blank (white rectangle with no content) like this:

Screenshot_20220706_145940_openQA_running_job_stays_white_when_staying_open_since_scheduled.png

Expected result

Live view should show actual content like this from the beginning:

Screenshot_20220706_150136_openQA_live_view_showing_content_expected_result.png

Workaround

Two alternatives:

  • Force a refresh of the browser window with "F5"
  • Open the job result only after the job started

Suggestions

  • Check whether it worked before https://github.com/os-autoinst/openQA/pull/4727
  • Tina/Marius observed a live view not updating an already running job, which might relate to this issue (also on o3)
  • Try to reproduce it locally
  • Investigate the worker's pool directory while the job is running

Actions #1

Updated by livdywan over 2 years ago

  • Subject changed from openQA live view stays blank when browser tab is staying open on scheduled jobs until jobs start to openQA live view stays blank when browser tab is staying open on scheduled jobs until jobs start size:M
  • Description updated (diff)
  • Status changed from New to Workable

Actions #2

Updated by tinita over 2 years ago

Just for the record, I don't see the problem locally or on osd, only on o3.
Can't see anything suspicious in the developer tools console / network log.

Actions #3

Updated by mkittler over 2 years ago

  • Assignee set to mkittler

Actions #4

Updated by mkittler over 2 years ago

I cannot reproduce the problem Tina and I saw on o3 with any running job on o3. Note that this problem was also happening with jobs that were already running, e.g. Tina sent me the link and the updates also got stuck when I opened the running job on my machine. Currently o3 isn't busy at all.

The workaround "Open the job result only after the job started" implies that there's another case where it is only broken if the job wasn't running yet when the tab was opened. I could not reproduce that case either, neither on o3 nor locally. Note that it can take a while until the test actually gets started: a job already shows up as running while the worker downloads assets, isotovideo is launched, needles are initialized and the test schedule is created. Maybe it was just that, and the reload was done shortly before the test actually started, so it only looked like it helped?

The workaround "Force a refresh of the browser window with "F5"" implies that refreshing would help. That wasn't really the case on the job Tina and I observed. It helped, but shortly afterwards the job got stuck again.

I'll check again next week when o3 is busier. Maybe that makes a difference. Once I find a job I'll investigate the worker's pool directory and logs.

I've also found a job where the video was disabled (due to max. job time). In this case one doesn't get a screen either. I suppose that was not the case here. I'm just mentioning it because at first glance such a job looks like it would reproduce some of the mentioned issues.

Actions #5

Updated by mkittler over 2 years ago

Anybody, feel free to grab this ticket if you can reproduce it. I've opened a few tabs but so far haven't reproduced it.

Actions #6

Updated by okurz over 2 years ago

I could still reproduce it as of today. Do you think it has something to do with my local environment, e.g. Firefox on Leap 15.3 x86_64 or something?

Actions #7

Updated by mkittler about 2 years ago

I don't think it is the Firefox version. Since this is about the image and likely also the log, I don't think the relevant code requires any JavaScript features that are too new for your possibly outdated browser.

If you can reproduce it locally, you could open the developer tools in the tab and check whether anything shows up in the JavaScript console and network log when the job state switches to running.

Actions #8

Updated by okurz about 2 years ago

  • Assignee deleted (mkittler)
  • Target version changed from Ready to future

We have to focus on other urgent tasks with reduced capacity, so I'm moving this out of the backlog.

Actions #9

Updated by okurz over 1 year ago

  • Target version changed from future to Ready

Actions #10

Updated by okurz over 1 year ago

Looked into this with mkittler:

  1. mkittler and I can reproduce the problem on o3 100% of the time
  2. As soon as a second browser instance (e.g. a second user) opens the job details page of the same job, the refresh starts working for the first user as well
  3. In the browser developer tools, resending the "streaming" request fixes the refresh for the image
  4. The problem can also be reproduced when circumventing the apache web proxy on o3 with an ssh bridge: ssh -NT -L 9526:localhost:9526 o3

Actions #11

Updated by mkittler over 1 year ago

  • Assignee set to mkittler

Additionally:

  1. The developer mode works and shows the current module. This way we can be sure that the test is really running at this point and the image/log should be there.
  2. I've also tried to reproduce the problem locally again. I could not reproduce it; also not when using apache2 in the same way we do on o3. This, together with point 4. from @okurz's comment means that the reverse proxy is likely not the problem.
  3. The problem could also be reproduced on OSD. I haven't started a job explicitly but just accessed one that hasn't started any module yet. I tried this two times and could always reproduce the problem.
  4. I also ran into the JavaScript code getting stuck and created a fix: https://github.com/os-autoinst/openQA/pull/5058

So the problem that I cannot reproduce it locally remains. However, the problem is now clearer, so I'll try to investigate it further.

Actions #12

Updated by mkittler over 1 year ago

  • Status changed from Workable to In Progress

I can now reproduce it locally by inserting:

        log_debug('Faking setup status');
        return $callback->({error => 'Status updates interrupted'}, undef) unless $job->post_setup_status;
        log_debug('Faking delay');
        return Mojo::IOLoop->timer(30, sub {
            log_debug('Delay end');
            return _engine_workit_step_2($job, $job_settings, \%vars, undef, $callback);
        });

after

        my $error = locate_local_assets(\%vars, $assetkeys, $pooldir);
        return $callback->($error) if $error;

(Just inserting the delay via a plain sleep doesn't work because the event loop is then blocked: the command to enable the livelog is only handled after the delay, and by then the backend has already been started, so it works nevertheless.)
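To illustrate why the plain sleep hides the problem, here is a minimal sketch using Mojo::IOLoop. It is not openQA worker code: run_scenario, $backend_started and the timings are made up for illustration. A timer stands in for the "enable livelog" command arriving 0.1 s into the setup, and the setup takes 0.3 s before the simulated backend starts.

```perl
use strict;
use warnings;
use Mojo::IOLoop;
use Time::HiRes qw(sleep);

# Hypothetical helper: runs one scenario and returns whether the simulated
# backend had already started when the simulated livelog command was handled.
sub run_scenario {
    my ($blocking) = @_;
    my $backend_started = 0;
    my $seen;    # backend state observed by the command handler

    # simulated "enable livelog" command, due 0.1 s after the setup begins
    Mojo::IOLoop->timer(0.1 => sub { $seen = $backend_started });

    # simulated job setup taking 0.3 s before the backend is started
    Mojo::IOLoop->next_tick(sub {
        if ($blocking) {
            sleep 0.3;    # blocks the whole event loop, the command has to wait
            $backend_started = 1;
        }
        else {
            # non-blocking delay, other events are handled in the meantime
            Mojo::IOLoop->timer(0.3 => sub { $backend_started = 1 });
        }
    });

    # stop the loop once both scenarios are guaranteed to have finished
    Mojo::IOLoop->timer(0.6 => sub { Mojo::IOLoop->stop });
    Mojo::IOLoop->start;
    return $seen;
}

printf "plain sleep: backend started when command handled? %s\n", run_scenario(1) ? 'yes' : 'no';
printf "timer:       backend started when command handled? %s\n", run_scenario(0) ? 'yes' : 'no';
```

With these timings the sleep variant should report "yes" (the command is handled late, after the backend is up), while the timer variant should report "no", which is the situation that provokes the blank live view.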

The problem is that the command for starting the live log is not doing anything unless the backend has already been started. However, simply adding the live_log file manually doesn't work so there's more to it.
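The race described here can be modelled with a small toy class. This is a hypothetical sketch, not the openQA worker implementation and not necessarily what the eventual fix does: ToyJob and all of its methods are invented for illustration, and as noted above the real problem involves more than a single flag. It only shows the general idea of a command that is dropped when it arrives too early versus one that is remembered and applied once the backend starts.

```perl
use strict;
use warnings;

# Hypothetical toy model of a job receiving a "start livelog" command.
package ToyJob {
    sub new {
        my ($class) = @_;
        return bless {backend_running => 0, livelog_active => 0, livelog_requested => 0}, $class;
    }

    # Naive handler: the command is silently ignored if the backend is not up yet.
    sub start_livelog_naive {
        my ($self) = @_;
        $self->{livelog_active} = 1 if $self->{backend_running};
        return $self;
    }

    # Deferring handler: remembers the request so it can be applied later.
    sub start_livelog_deferred {
        my ($self) = @_;
        $self->{livelog_requested} = 1;
        $self->{livelog_active}    = 1 if $self->{backend_running};
        return $self;
    }

    # Called once the backend is up; applies a pending request, if any.
    sub backend_started {
        my ($self) = @_;
        $self->{backend_running} = 1;
        $self->{livelog_active}  = 1 if $self->{livelog_requested};
        return $self;
    }

    sub livelog_active { return shift->{livelog_active} }
}

# The command arrives before the backend has started in both cases.
my $naive    = ToyJob->new->start_livelog_naive->backend_started;
my $deferred = ToyJob->new->start_livelog_deferred->backend_started;
printf "naive: %d, deferred: %d\n", $naive->livelog_active, $deferred->livelog_active;    # naive: 0, deferred: 1
```

In the naive variant the early command is lost and the live log never becomes active, which matches the symptom of a tab that was opened while the job was still being set up.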

Actions #13

Updated by openqa_review over 1 year ago

  • Due date set to 2023-04-11

Setting due date based on mean cycle time of SUSE QE Tools

Actions #14

Updated by mkittler over 1 year ago

PR which should fix the problem (at least it does when provoking the problem as mentioned before): https://github.com/os-autoinst/openQA/pull/5060

Actions #15

Updated by livdywan over 1 year ago

  • Status changed from In Progress to Feedback

mkittler wrote:

PR which should fix the problem (at least it does when provoking the problem as mentioned before): https://github.com/os-autoinst/openQA/pull/5060

Actions #16

Updated by okurz over 1 year ago

  • Status changed from Feedback to Resolved

After the project config in devel:openQA was fixed, updated packages were eventually built and installed on the o3 workers. I could verify the fix there. Good work!

Actions #17

Updated by mkittler over 1 year ago

It has been deployed on o3 and works.
