action #41015
closed
Don't use livehandler if no developer looks at it
Added by coolo over 6 years ago.
Updated over 6 years ago.
Category: Regressions/Crashes
Description
Next morning, next outage :(
The error_log was filling up with errors from attempts to reach the live handler port,
and that's no surprise, as the live handler was dead (and we have no idea what's
blocking it):
openqa:/home/coolo # strace -p 28474 -f
Process 28474 attached
restart_syscall(<... resuming interrupted call ...>^C
Process 28474 detached
Please make sure the live handler is only involved when jobs are monitored; that was the premise of this separate service. It's not supposed to break the nightly service, even when it is broken itself.
- Related to action #38510: Allow os-autoinst to pause on next assert_screen timeout added
https://github.com/os-autoinst/openQA/commit/7a97302b8a42dcaedfb34fd60a04efea0b08bc7c should prevent the immediate problem when the livehandler isn't reachable.
But yes, it would be nice if the worker only posted the upload progress when someone is watching the test. I could just use the existing has_logviewers for this.
Only problem would be the following sequence of events:
- Nobody is watching the job (eg. the developer closed the tab).
- The job is paused due to assert_screen timeout.
- The developer opens the tab again. The worker hasn't posted the upload progress, so the needle editor is not offered even though the latest screenshot is ready.
Not sure how to solve this in an elegant way. Actually, I wanted to keep the worker out of it as much as possible. The problem is that the worker is responsible for uploading the test artifacts and hence is the only one that knows when the latest screenshot is ready.
On the other hand, what would be the big benefit of saving that post call? It is only a small extra cost on top of uploading the artifacts. And that should now actually hold, because we shouldn't be retrying the same post endlessly in the error case anymore.
Your commit does not limit the problem enough, because you still pile up Apache workers waiting for the backend to become reachable.
And I don't care too much about developers closing tabs: as soon as one developer has looked at a job, it's fine to use the live handler for it. What we should avoid is the mass of jobs that nobody looks at touching unnecessary parts.
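A minimal sketch of the "sticky" rule from the comment above (once any developer has looked at a job, keep using the live handler for it). It is written in Python rather than openQA's own Perl, and the names JobLiveState, ever_watched, should_post_upload_progress and simulate are hypothetical, not part of openQA. It assumes the worker learns whether a job currently has log viewers, in the spirit of has_logviewers.

```python
from dataclasses import dataclass


@dataclass
class JobLiveState:
    """Hypothetical per-job state on the worker side (not openQA's data model)."""
    has_logviewers: bool = False  # is anyone watching the job right now?
    ever_watched: bool = False    # sticky: has any developer looked at it so far?

    def update_viewers(self, has_logviewers: bool) -> None:
        self.has_logviewers = has_logviewers
        if has_logviewers:
            # Once a developer has looked at the job, keep the live handler
            # involved even if the tab is closed later.
            self.ever_watched = True

    def should_post_upload_progress(self) -> bool:
        # Jobs nobody has looked at (the nightly mass) never touch the live handler.
        return self.ever_watched


def simulate(events):
    """Replay hypothetical events and record whether each upload would be posted."""
    state = JobLiveState()
    posts = []
    for event in events:
        if event == "open_tab":
            state.update_viewers(True)
        elif event == "close_tab":
            state.update_viewers(False)
        elif event == "upload":
            posts.append(state.should_post_upload_progress())
    return posts


print(simulate(["upload"]))                            # [False]
print(simulate(["open_tab", "close_tab", "upload"]))   # [True]
```

The second run mirrors the closed-tab sequence discussed earlier: the progress is still posted, so the needle editor could be offered when the tab is reopened. A job that nobody opened before it paused still gets no posts; that is the trade-off accepted in the comment above.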
- Has duplicate action #41042: [tools][osd] "isos post" from rsync.pl aborted with "Use of uninitialized value in concatenation (.) or string at /opt/openqa-scripts/rsync.pl line 998. error scheduling 502 Proxy Error" added
- Subject changed from livehandler is stuck to Don't use livehandler if no developer looks at it
- Target version changed from Ready to Current Sprint
The actual problem might be a duplicate of another ticket, but let's use this one to ease the load.
- Status changed from New to In Progress
- Status changed from In Progress to Resolved
- Target version changed from Current Sprint to Done