action #41015

Don't use livehandler if no developer looks at it

Added by coolo over 1 year ago. Updated over 1 year ago.

Status: Resolved
Start date: 14/09/2018
Priority: Urgent
Assignee: mkittler
% Done: 0%
Category: Concrete Bugs
Target version: Done

Description

Next morning, next outage :(

error_log was filling up with errors from attempts to access the live handler port,
which is no surprise, as the live handler was dead (and we have no idea what is
blocking it):

openqa:/home/coolo # strace -p 28474 -f
Process 28474 attached
restart_syscall(<... resuming interrupted call ...>^CProcess 28474 detached


Related issues

Related to openQA Project - action #38510: Allow os-autoinst to pause on next assert_screen timeout Resolved 18/07/2018
Duplicated by openQA Project - action #41042: [tools][osd] "isos post" from rsync.pl aborted with "Use ... Resolved 14/09/2018

History

#1 Updated by coolo over 1 year ago

  • Assignee set to mkittler

Please make sure the live handler is only involved when jobs are monitored; that was the premise of this separate service. It is not supposed to break the nightly service, even when it is broken itself.

#2 Updated by coolo over 1 year ago

  • Related to action #38510: Allow os-autoinst to pause on next assert_screen timeout added

#3 Updated by mkittler over 1 year ago

https://github.com/os-autoinst/openQA/commit/7a97302b8a42dcaedfb34fd60a04efea0b08bc7c should prevent the immediate problem when the livehandler isn't reachable.

But yes, it would be nice if the worker would only post the upload progress if someone is watching the test. I could just use the existing has_logviewers for this.

The only problem would be the following sequence of events:

  1. Nobody is watching the job (e.g. the developer closed the tab).
  2. The job is paused due to assert_screen timeout.
  3. The developer opens the tab again. The upload progress hasn't been posted by the worker so the needle editor is not offered although the latest screenshot would be ready.

I am not sure how to solve this in an elegant way. I actually wanted to keep the worker out of it as much as possible. The problem is that the worker is responsible for uploading the test artifacts and hence is the only one that knows when the latest screenshot is ready.

On the other hand, what would be the big benefit of saving that post call? It is only a small extra cost on top of uploading the artifacts. And with the commit above that should now actually hold, because the worker should no longer retry the same post endlessly in the error case.

#4 Updated by coolo over 1 year ago

Your commit does not limit the problem well enough, because you still pile up Apache workers waiting for the backend to
become reachable.

And I don't care too much about developers closing tabs: as soon as one developer has looked at a job, it's fine to
use the live handler. But what we should avoid is the mass of jobs that nobody looks at touching unnecessary parts.

#5 Updated by coolo over 1 year ago

  • Duplicated by action #41042: [tools][osd] "isos post" from rsync.pl aborted with "Use of uninitialized value in concatenation (.) or string at /opt/openqa-scripts/rsync.pl line 998. error scheduling 502 Proxy Error" added

#6 Updated by coolo over 1 year ago

  • Subject changed from "livehandler is stuck" to "Don't use livehandler if no developer looks at it"
  • Target version changed from Ready to Current Sprint

The actual problem might be a duplicate of another ticket, but let's use this one to ease the load.

#7 Updated by mkittler over 1 year ago

  • Status changed from New to In Progress

PR for sending the updates only if a developer session has been opened: https://github.com/os-autoinst/openQA/pull/1789
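The gating rule agreed on in #4 and #7 (once any developer has opened a session for a job, the worker keeps posting upload progress; for unattended jobs it skips that call) can be sketched as follows. This is a minimal Python sketch under stated assumptions: `JobLiveState`, `on_developer_session_opened`, `should_post_upload_progress` and `after_upload` are hypothetical names, not the code from the PR above (openQA itself is written in Perl), and `post_progress` stands in for whatever call actually reaches the live handler.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobLiveState:
    """Hypothetical per-job state on the worker side (not openQA's real data model)."""

    # Sticky flag: set once a developer session is opened, never reset afterwards,
    # so closing the browser tab does not stop the updates.
    developer_session_opened: bool = False

    def on_developer_session_opened(self) -> None:
        self.developer_session_opened = True

    def should_post_upload_progress(self) -> bool:
        # Unattended jobs never touch the live handler.
        return self.developer_session_opened


def after_upload(state: JobLiveState, post_progress: Callable[[], None]) -> bool:
    """Called after uploading artifacts; posts progress only if gated in.

    Returns True if the (hypothetical) progress post was made.
    """
    if state.should_post_upload_progress():
        post_progress()
        return True
    return False
```

Making the flag sticky follows coolo's remark that closed tabs don't matter once a developer has looked at the job, and it also avoids the paused-job scenario from #3 for any job a developer has opened before.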

#8 Updated by coolo over 1 year ago

  • Status changed from In Progress to Resolved

Merged and deployed.

#9 Updated by szarate over 1 year ago

  • Target version changed from Current Sprint to Done
