action #41015: Don't use livehandler if no developer looks at it - openQA Project (public) - openSUSE Project Management Tool

action #41015

closed

Don't use livehandler if no developer looks at it

Added by coolo over 6 years ago. Updated over 6 years ago.

Status:

Resolved

Priority:

Urgent

Assignee:

mkittler

Category:

Regressions/Crashes

Target version:

Done

Start date:

2018-09-14

Due date:

% Done:

Estimated time:

Description

Next morning, next outage :(

The error_log was filling up with errors from attempts to access the live handler port,
and it's no surprise as the live handler was dead (and we have no idea what's
blocking it):

openqa:/home/coolo # strace -p 28474 -f
Process 28474 attached
restart_syscall(<... resuming interrupted call ...>^CProcess 28474 detached

Related issues 2 (0 open — 2 closed)

Updated by coolo over 6 years ago

Assignee set to mkittler

Please make sure the live handler is only involved when jobs are monitored - that was the premise of this separate service. It's not supposed to break the nightly service, even if it is broken itself.

Updated by coolo over 6 years ago

Related to action #38510: Allow os-autoinst to pause on next assert_screen timeout added

Updated by mkittler over 6 years ago

https://github.com/os-autoinst/openQA/commit/7a97302b8a42dcaedfb34fd60a04efea0b08bc7c should prevent the immediate problem when the livehandler isn't reachable.

But yes, it would be nice if the worker only posted the upload progress if someone is watching the test. I could just use the existing has_logviewers for this.

The only problem would be the following sequence of events:

1. Nobody is watching the job (e.g. the developer closed the tab).
2. The job is paused due to an assert_screen timeout.
3. The developer opens the tab again. The upload progress hasn't been posted by the worker, so the needle editor is not offered although the latest screenshot would be ready.
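
The sequence above can be replayed in a minimal sketch. Everything in it is hypothetical: the `Worker` class, `upload_screenshot` and the `posted_progress` list are made up for illustration, and the `has_logviewers` attribute only mimics the flag mentioned above. It is not the actual openQA worker code.

```python
class Worker:
    """Toy stand-in for a worker that gates progress posts on viewers."""

    def __init__(self):
        self.has_logviewers = False  # assumption: mirrors the existing flag
        self.posted_progress = []    # what the livehandler would have received

    def upload_screenshot(self, name):
        # The artifact upload itself always happens (not modelled here);
        # only the extra progress post is gated on someone watching.
        if self.has_logviewers:
            self.posted_progress.append(name)


worker = Worker()

# 1. Nobody is watching the job.
worker.has_logviewers = False
# 2. The job pauses on an assert_screen timeout; the last screenshot is uploaded.
worker.upload_screenshot("paused-screen.png")
# 3. The developer opens the tab again.
worker.has_logviewers = True

# The progress for the screenshot taken while paused was never posted.
print(worker.posted_progress)  # → []
```

In this toy model the post is skipped at step 2 and nothing triggers it again at step 3, which is the gap described above.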

Not sure how to solve this in an elegant way. Actually I wanted to keep the worker out of it as much as possible. The problem is that the worker is responsible for uploading the test artifacts and hence is the only component that knows when the latest screenshot is ready.

On the other hand, what would be the big benefit of saving that post call? It is only a small extra cost on top of uploading the artifacts. And now that should actually be true, because the worker shouldn't be endlessly trying the same post again and again in the error case.
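
To illustrate what "not endlessly trying the same post" could look like, here is a minimal sketch of a bounded retry. It is not a description of what the linked commit actually does; `post_with_bounded_retries`, `max_attempts` and the fake `unreachable_livehandler` callback are hypothetical names chosen for the example.

```python
def post_with_bounded_retries(send, max_attempts=3):
    """Call send() until it succeeds or max_attempts is reached.

    Returns True on success, False once all attempts are used up.
    """
    for _ in range(max_attempts):
        try:
            send()
            return True
        except ConnectionError:
            continue  # livehandler unreachable; try again, but not forever
    return False


calls = []


def unreachable_livehandler():
    # Hypothetical stand-in for a post to a dead livehandler.
    calls.append(1)
    raise ConnectionError("livehandler not reachable")


ok = post_with_bounded_retries(unreachable_livehandler, max_attempts=3)
print(ok, len(calls))  # → False 3
```

With a cap like this, a dead livehandler costs a fixed number of failed requests per upload instead of an unbounded stream of them.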

Updated by coolo over 6 years ago

Your commit does not limit the problem well enough, because you still pile up apache workers waiting for the backend to be reachable.

And I don't care too much about developers closing tabs - as soon as one developer looked at it, it's fine to
use the live handler. But what we should avoid is that the mass of jobs, which nobody looks at, touches unnecessary parts.
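
The rule above (use the live handler once any developer has looked at the job) amounts to a one-way latch. A minimal sketch, assuming hypothetical names (`LiveHandlerGate`, `viewer_connected`, `viewer_disconnected`, `should_post_progress`) that do not come from the openQA code base:

```python
class LiveHandlerGate:
    """Per-job latch: off until a developer looks at the job, then on for good."""

    def __init__(self):
        self.developer_seen = False

    def viewer_connected(self):
        # The first viewer flips the latch.
        self.developer_seen = True

    def viewer_disconnected(self):
        # Closing the tab deliberately does not reset the latch.
        pass

    def should_post_progress(self):
        return self.developer_seen


unwatched = LiveHandlerGate()   # one of the many jobs nobody looks at
watched = LiveHandlerGate()     # a job a developer opened once
watched.viewer_connected()
watched.viewer_disconnected()

print(unwatched.should_post_progress())  # → False
print(watched.should_post_progress())    # → True
```

Jobs that are never opened stay away from the live handler entirely, while a job that was opened once keeps posting even after the tab is closed.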

Updated by coolo over 6 years ago

Has duplicate action #41042: [tools][osd] "isos post" from rsync.pl aborted with "Use of uninitialized value in concatenation (.) or string at /opt/openqa-scripts/rsync.pl line 998. error scheduling 502 Proxy Error" added

Updated by coolo over 6 years ago

Subject changed from livehandler is stuck to Don't use livehandler if no developer looks at it
Target version changed from Ready to Current Sprint

The actual problem might be a duplicate of another ticket, but let's use this ticket to ease the load.
