action #57707

isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b

Added by okurz 6 months ago. Updated 6 months ago.

Status:         Resolved
Start date:     04/10/2019
Priority:       Normal
Due date:
Assignee:       mkittler
% Done:         0%
Category:       Concrete Bugs
Target version: Done
Difficulty:
Duration:

Description

Observation

I have seen the same error message when starting isotovideo locally as well as within openQA jobs. From a local run of isotovideo only:

[2019-10-04T14:55:04.861 CEST] [debug] done with autotest process
[2019-10-04T14:55:04.862 CEST] [debug] killing command server 10808 because test execution ended
[2019-10-04T14:55:04.862 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20013/3HbEX_vD1nTThFUd/broadcast
[2019-10-04T14:55:19.877 CEST] [debug] isotovideo: unable to inform websocket clients about stopping command server: Request timeout at /home/okurz/local/os-autoinst/os-autoinst/isotovideo line 175.

[2019-10-04T14:55:21.425 CEST] [debug] commands process exited: 0

Note the roughly 15 seconds wasted waiting for the timeout (14:55:04 to 14:55:19). In this case there are simply no websocket clients to inform, so the operation should not even be attempted.
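The suggested behaviour (skip the broadcast entirely when there are no websocket clients, and treat a failing broadcast as non-fatal) can be sketched as follows. This is a minimal, illustrative Python sketch; all names are hypothetical, and the actual implementation in os-autoinst is Perl:

```python
def stop_command_server(clients, broadcast):
    """Inform connected websocket clients before stopping the command server.

    clients:   list of currently connected websocket clients (may be empty)
    broadcast: callable performing the HTTP broadcast; may raise TimeoutError

    Returns a short status string describing what happened.
    """
    if not clients:
        # Nothing to inform -- do not even attempt the request,
        # so shutdown is not delayed by a pointless timeout.
        return "skipped"
    try:
        broadcast()
        return "sent"
    except TimeoutError:
        # A failed broadcast should never block or fail the shutdown itself.
        return "timeout"
```

With an empty client list the function returns immediately, which is exactly the behaviour the description asks for.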


Related issues

Related to openQA Project - action #45191: developer mode: error message just when clicking "Cancel ... Resolved 22/11/2018
Copied to openQA Project - action #58379: isotovideo is slow to shutdown / error messages on proper... Blocked 04/10/2019 14/04/2020

History

#1 Updated by okurz 6 months ago

  • Related to action #45191: developer mode: error message just when clicking "Cancel job" added

#2 Updated by mkittler 6 months ago

This failure and message are actually only expected when the job is aborted by the worker.

When the test execution ends regularly, the broadcast should work (and do nothing if there are no websocket clients). That it does not work here is indeed a bug. Maybe the command server is blocked at that moment waiting for a response from isotovideo, so isotovideo and the command server are waiting for each other?
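The deadlock hypothesis above (each side blocked waiting for the other's reply) and one way to avoid it can be illustrated with a minimal Python sketch. The queues and names are purely illustrative and not part of os-autoinst; the idea is to keep servicing the peer's incoming messages while waiting for our own reply, and to give up after a timeout instead of blocking forever:

```python
import queue


def request_with_drain(outbox, inbox, handle_incoming, timeout=0.5):
    """Send a request, but keep servicing incoming messages while waiting,
    so a peer that is itself blocked waiting on us can still make progress.

    Returns the reply, or None if nothing arrived within the timeout."""
    outbox.put("request")
    while True:
        try:
            msg = inbox.get(timeout=timeout)
        except queue.Empty:
            return None  # give up instead of waiting forever
        if msg == "reply":
            return msg
        # Not our reply: service the peer's request instead of deadlocking.
        handle_incoming(msg)
```

If both sides follow this pattern, a pending request from the peer is handled before we resume waiting, so the mutual wait cannot persist.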

#3 Updated by okurz 6 months ago

This should be trivial to reproduce and debug as it only involves isotovideo locally.

#4 Updated by mkittler 6 months ago

Dealing with that code again today reminded me of this issue. I don't think it is easy to reproduce, because it only happens sometimes, right?

To answer my own question: the code takes care to prevent the deadlock situation I described, but it looks like it does not do so in all places.

#5 Updated by mkittler 6 months ago

Note that this log line can also simply be caused by the web UI (the liveviewhandler in particular) being unresponsive. Considering the recent performance problems on OSD, this might even have been the case; then the problem is actually logged correctly here. We could lower the timeout, though, since a failure of this request is not a big deal.

#6 Updated by okurz 6 months ago

Please read what I wrote carefully. The issue is reproducible every time when calling isotovideo locally on its own, with no web UI available or involved. This has nothing to do with the OSD performance problems caused by chunked uploads taking long due to a monitoring plugin.

#7 Updated by mkittler 6 months ago

  • Status changed from New to In Progress
  • Assignee set to mkittler
  • Target version set to Current Sprint

#8 Updated by okurz 6 months ago

  • Status changed from In Progress to Feedback

Merged. I checked locally with isotovideo (no openQA involved). I suggest you crosscheck openQA jobs on o3 after tomorrow morning's automatic deployment, ok?

#9 Updated by mkittler 6 months ago

  • Status changed from Feedback to Resolved
  • Target version changed from Current Sprint to Done

It seems to work.

#10 Updated by okurz 6 months ago

Well, yes, the original error is gone. However, the shutdown is still far from speedy, e.g. see:

[2019-10-17T23:10:23.857 CEST] [debug] no match: 79.0s, best candidate: grub2-2-20131030 (0.00)
^C[2019-10-17T23:10:24.058 CEST] [debug] signalhandler got INT
can_read received kill signal at ../../myjsonrpc.pm line 91.
[2019-10-17T23:10:24.059 CEST] [debug] killing backend process 10637
[2019-10-17T23:10:24.549 CEST] [debug] WARNING: check_asserted_screen took 0.68 seconds for 32 candidate needles - make your needles more specific
[2019-10-17T23:10:24.549 CEST] [debug] no match: 77.8s, best candidate: grub2-2-20131030 (0.00)
[2019-10-17T23:10:24.575 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 2
[2019-10-17T23:10:24.576 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-10-17T23:10:24.581 CEST] [debug] sending magic and exit
Unexpected end of data 0
[2019-10-17T23:10:26.062 CEST] [debug] done with backend process
[2019-10-17T23:10:26.062 CEST] [debug] killing command server 10627 because test execution ended through exception
[2019-10-17T23:10:28.063 CEST] [debug] done with command server
[2019-10-17T23:10:28.063 CEST] [debug] killing autotest process 10630
[2019-10-17T23:10:30.063 CEST] [debug] done with autotest process
10623: EXIT 0

6 s just to kill some processes; I think we can still improve this. Also, the "Unexpected end of data 0" message isn't nice: the end of data should very much be expected when the processes are asked to shut down :) If you prefer, we can handle this in a new ticket, though.

#11 Updated by mkittler 6 months ago

Yes, it is a different delay, so better to make a new ticket.

#12 Updated by okurz 6 months ago

  • Copied to action #58379: isotovideo is slow to shutdown / error messages on proper shutdown added
