Project

General

Profile

action #58379

isotovideo is slow to shutdown / error messages on proper shutdown

Added by okurz about 2 years ago. Updated over 1 year ago.

Status:
Resolved
Priority:
Low
Assignee:
Category:
Feature requests
Target version:
-
Start date:
2019-10-04
Due date:
2020-04-14
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

The isotovideo shutdown is far from speedy, e.g. see

[2019-10-17T23:10:23.857 CEST] [debug] no match: 79.0s, best candidate: grub2-2-20131030 (0.00)
^C[2019-10-17T23:10:24.058 CEST] [debug] signalhandler got INT
can_read received kill signal at ../../myjsonrpc.pm line 91.
[2019-10-17T23:10:24.059 CEST] [debug] killing backend process 10637
[2019-10-17T23:10:24.549 CEST] [debug] WARNING: check_asserted_screen took 0.68 seconds for 32 candidate needles - make your needles more specific
[2019-10-17T23:10:24.549 CEST] [debug] no match: 77.8s, best candidate: grub2-2-20131030 (0.00)
[2019-10-17T23:10:24.575 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 2
[2019-10-17T23:10:24.576 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-10-17T23:10:24.581 CEST] [debug] sending magic and exit
Unexpected end of data 0
[2019-10-17T23:10:26.062 CEST] [debug] done with backend process
[2019-10-17T23:10:26.062 CEST] [debug] killing command server 10627 because test execution ended through exception
[2019-10-17T23:10:28.063 CEST] [debug] done with command server
[2019-10-17T23:10:28.063 CEST] [debug] killing autotest process 10630
[2019-10-17T23:10:30.063 CEST] [debug] done with autotest process
10623: EXIT 0

This is from a manual isotovideo run.

6s just to kill some processes. I think we can still improve this. Also the "Unexpected end of data 0" isn't that nice. The "end of data" should very much be expected when the processes are asked to shutdown :)


Related issues

Related to openQA Project - action #53999: openqa-worker (isotovideo) dumps core / segfaults quite often on several workers and distributions for cancelled jobsWorkable2019-07-09

Related to openQA Project - coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasonsResolved2020-04-012020-09-30

Copied from openQA Project - action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2bResolved2019-10-04

History

#1 Updated by okurz about 2 years ago

  • Copied from action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b added

#2 Updated by coolo about 2 years ago

  • Priority changed from Normal to Low

#3 Updated by okurz about 2 years ago

  • Status changed from New to In Progress
  • Assignee set to okurz

Working on improving the output, e.g. in https://github.com/os-autoinst/os-autoinst/pull/1277 . Also I think the upstream project Mojo::IOLoop::ReadWriteProcess has a problem so I created a patch: https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6

#4 Updated by okurz almost 2 years ago

  • Related to action #53999: openqa-worker (isotovideo) dumps core / segfaults quite often on several workers and distributions for cancelled jobs added

#5 Updated by cdywan almost 2 years ago

  • Target version set to Current Sprint

#7 Updated by okurz almost 2 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (okurz)
  • Target version deleted (Current Sprint)

#8 Updated by okurz over 1 year ago

  • Status changed from Workable to Feedback
  • Assignee set to okurz

Actually had an idea what I could do and managed to update https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6 which I consider mergeable now.

https://github.com/os-autoinst/os-autoinst/pull/1360 for a further improvement in the isotovideo shutdown process.

#9 Updated by okurz over 1 year ago

After https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6 was merged there were some flaky test failures so I created https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/7 but still flaky test failures in master branch so now I created https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/8. Waiting for test results, merge, release of a new version.

#10 Updated by okurz over 1 year ago

  • Due date set to 2020-04-14
  • Status changed from Feedback to Blocked

waiting for merge of https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/8 . Will wait some time until I will kindly remind.

#11 Updated by okurz over 1 year ago

  • Status changed from Blocked to Resolved

PR was merged and a new release 0.25 was published which is by now also published as packages within openSUSE:Factory as well as our derived branch projects in devel:openQA. Also deployed by now on osd and o3 workers. No problems reported. Taking any example job, e.g. https://openqa.suse.de/tests/4125075/file/autoinst-log.txt I can find:

[2020-04-15T14:54:13.264 CEST] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":657695,"seconds":1586955252}}
[2020-04-15T14:54:13.267 CEST] [debug] ||| finished shutdown_ltp kernel at 2020-04-15 12:54:13 (35 s)
[2020-04-15T14:54:13.269 CEST] [debug] stopping autotest process 18989
[2020-04-15T14:54:13.276 CEST] [debug] [autotest] process exited: 0
[2020-04-15T14:54:13.276 CEST] [debug] done with autotest process
[2020-04-15T14:54:13.276 CEST] [debug] terminating command server 18898 because test execution ended
[2020-04-15T14:54:13.276 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20283/ttVvf5JAFkA9USqT/broadcast
[2020-04-15T14:54:13.286 CEST] [debug] commands process exited: 0
[2020-04-15T14:54:13.287 CEST] [debug] done with command server
[2020-04-15T14:54:13.287 CEST] [debug] isotovideo done
[2020-04-15T14:54:13.287 CEST] [debug] BACKEND SHUTDOWN 1
[2020-04-15T14:54:13.288 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-04-15T14:54:14.317 CEST] [debug] flushing frames
[2020-04-15T14:54:14.352 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 19005 (/usr/bin/isotovideo: backen)
[2020-04-15T14:54:14.353 CEST] [debug] sending magic and exit
[2020-04-15T14:54:14.353 CEST] [debug] received magic close
[2020-04-15T14:54:14.375 CEST] [debug] backend process exited: 0
[2020-04-15T14:54:14.376 CEST] [debug] stopping backend process 19005
[2020-04-15T14:54:14.376 CEST] [debug] done with backend process
18895: EXIT 0
[2020-04-15T14:54:14.0451 CEST] [info] [pid:25970] Isotovideo exit status: 0

That is a shutdown within 800ms compared to previously about 6s \o/

#12 Updated by okurz over 1 year ago

  • Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added

Also available in: Atom PDF