action #58379
isotovideo is slow to shutdown / error messages on proper shutdown (closed)
Description
Observation
The isotovideo shutdown is far from speedy, e.g. see:
[2019-10-17T23:10:23.857 CEST] [debug] no match: 79.0s, best candidate: grub2-2-20131030 (0.00)
^C[2019-10-17T23:10:24.058 CEST] [debug] signalhandler got INT
can_read received kill signal at ../../myjsonrpc.pm line 91.
[2019-10-17T23:10:24.059 CEST] [debug] killing backend process 10637
[2019-10-17T23:10:24.549 CEST] [debug] WARNING: check_asserted_screen took 0.68 seconds for 32 candidate needles - make your needles more specific
[2019-10-17T23:10:24.549 CEST] [debug] no match: 77.8s, best candidate: grub2-2-20131030 (0.00)
[2019-10-17T23:10:24.575 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 2
[2019-10-17T23:10:24.576 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2019-10-17T23:10:24.581 CEST] [debug] sending magic and exit
Unexpected end of data 0
[2019-10-17T23:10:26.062 CEST] [debug] done with backend process
[2019-10-17T23:10:26.062 CEST] [debug] killing command server 10627 because test execution ended through exception
[2019-10-17T23:10:28.063 CEST] [debug] done with command server
[2019-10-17T23:10:28.063 CEST] [debug] killing autotest process 10630
[2019-10-17T23:10:30.063 CEST] [debug] done with autotest process
10623: EXIT 0
This is from a manual isotovideo run: 6 s just to terminate a few processes, which I think we can still improve. Also, the "Unexpected end of data 0" message isn't that nice; the "end of data" should very much be expected when the processes are asked to shut down :)
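For illustration, a minimal sketch (not the actual myjsonrpc code) of how a reader could treat EOF as expected once a shutdown has been requested; the $shutdown_requested flag and the read_message helper are made up for this example:

use strict;
use warnings;

# Hypothetical sketch: distinguish an expected EOF during shutdown from a
# real protocol error. None of these names come from os-autoinst itself.
my $shutdown_requested = 0;
$SIG{INT} = $SIG{TERM} = sub { $shutdown_requested = 1 };

sub read_message {
    my ($fh) = @_;
    my $bytes = sysread($fh, my $buf, 4096);
    die "read error: $!" unless defined $bytes;
    if ($bytes == 0) {
        # EOF: only complain if nobody asked us to shut down
        die "Unexpected end of data\n" unless $shutdown_requested;
        return undef;    # expected end of data, terminate quietly
    }
    return $buf;
}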
Updated by okurz about 5 years ago
- Copied from action #57707: isotovideo fails to terminate cleanly, message "isotovideo: unable to inform websocket clients about stopping command server: Request timeout", regression from 4cd4af2b added
Updated by okurz about 5 years ago
- Status changed from New to In Progress
- Assignee set to okurz
Working on improving the output, e.g. in https://github.com/os-autoinst/os-autoinst/pull/1277. Also I think the upstream project Mojo::IOLoop::ReadWriteProcess has a problem, so I created a patch: https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6
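For reference, this is roughly how a subprocess is driven with Mojo::IOLoop::ReadWriteProcess; a minimal sketch only, with attribute and method names taken from the module's documented API as I understand it (double-check against the installed version), and the values chosen are placeholders, not what os-autoinst actually uses:

use strict;
use warnings;
use Mojo::IOLoop::ReadWriteProcess;

# Placeholder values for illustration; the module's defaults and the
# values os-autoinst sets may differ.
my $p = Mojo::IOLoop::ReadWriteProcess->new(
    code                  => sub { exec 'sleep', '60' },
    blocking_stop         => 1,      # escalate to SIGKILL if the child ignores the signal
    max_kill_attempts     => 5,      # how often stop() retries
    sleeptime_during_kill => 0.1,    # pause between attempts, this adds up during shutdown
);
$p->start;
# ... later, on shutdown:
$p->stop;    # this is where a slow teardown shows up
printf "exit status: %s\n", $p->exit_status // 'unknown';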
Updated by okurz about 5 years ago
- Related to action #53999: openqa-worker (isotovideo) dumps core / segfaults quite often on several workers and distributions for cancelled jobs added
Updated by okurz almost 5 years ago
- Status changed from In Progress to Workable
- Assignee deleted (okurz)
- Target version deleted (Current Sprint)
https://github.com/os-autoinst/os-autoinst/pull/1277 and https://github.com/os-autoinst/os-autoinst/pull/1349 are done. I don't have further good ideas right now :)
Updated by okurz almost 5 years ago
- Status changed from Workable to Feedback
- Assignee set to okurz
Actually I had an idea about what I could do and managed to update https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6, which I consider mergeable now.
https://github.com/os-autoinst/os-autoinst/pull/1360 for a further improvement in the isotovideo shutdown process.
Updated by okurz over 4 years ago
After https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/6 was merged there were some flaky test failures, so I created https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/7. As there are still flaky test failures in the master branch, I have now created https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/8. Waiting for test results, merge and the release of a new version.
Updated by okurz over 4 years ago
- Due date set to 2020-04-14
- Status changed from Feedback to Blocked
Waiting for the merge of https://github.com/mudler/Mojo-IOLoop-ReadWriteProcess/pull/8. I will wait some time and then kindly remind the maintainer.
Updated by okurz over 4 years ago
- Status changed from Blocked to Resolved
The PR was merged and a new release 0.25 was published, which is by now also packaged within openSUSE:Factory as well as in our derived branch projects in devel:openQA. It is also deployed by now on osd and o3 workers, with no problems reported. Taking any example job, e.g. https://openqa.suse.de/tests/4125075/file/autoinst-log.txt, I can find:
[2020-04-15T14:54:13.264 CEST] [debug] EVENT {"event":"STOP","timestamp":{"microseconds":657695,"seconds":1586955252}}
[2020-04-15T14:54:13.267 CEST] [debug] ||| finished shutdown_ltp kernel at 2020-04-15 12:54:13 (35 s)
[2020-04-15T14:54:13.269 CEST] [debug] stopping autotest process 18989
[2020-04-15T14:54:13.276 CEST] [debug] [autotest] process exited: 0
[2020-04-15T14:54:13.276 CEST] [debug] done with autotest process
[2020-04-15T14:54:13.276 CEST] [debug] terminating command server 18898 because test execution ended
[2020-04-15T14:54:13.276 CEST] [debug] isotovideo: informing websocket clients before stopping command server: http://127.0.0.1:20283/ttVvf5JAFkA9USqT/broadcast
[2020-04-15T14:54:13.286 CEST] [debug] commands process exited: 0
[2020-04-15T14:54:13.287 CEST] [debug] done with command server
[2020-04-15T14:54:13.287 CEST] [debug] isotovideo done
[2020-04-15T14:54:13.287 CEST] [debug] BACKEND SHUTDOWN 1
[2020-04-15T14:54:13.288 CEST] [info] ::: OpenQA::Qemu::Proc::save_state: Saving QEMU state to qemu_state.json
[2020-04-15T14:54:14.317 CEST] [debug] flushing frames
[2020-04-15T14:54:14.352 CEST] [debug] QEMU: qemu-system-x86_64: terminating on signal 15 from pid 19005 (/usr/bin/isotovideo: backen)
[2020-04-15T14:54:14.353 CEST] [debug] sending magic and exit
[2020-04-15T14:54:14.353 CEST] [debug] received magic close
[2020-04-15T14:54:14.375 CEST] [debug] backend process exited: 0
[2020-04-15T14:54:14.376 CEST] [debug] stopping backend process 19005
[2020-04-15T14:54:14.376 CEST] [debug] done with backend process
18895: EXIT 0
[2020-04-15T14:54:14.0451 CEST] [info] [pid:25970] Isotovideo exit status: 0
That is a shutdown within about 800 ms, compared to roughly 6 s previously \o/
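The general pattern behind this kind of improvement is to signal the children and reap them as soon as they exit instead of sleeping for a fixed interval per child. A plain-Perl sketch of that idea, not the actual isotovideo code and with made-up names:

use strict;
use warnings;
use POSIX ':sys_wait_h';
use Time::HiRes qw(sleep time);

# Hypothetical sketch: send SIGTERM to child processes, reap each one as
# soon as it exits, and only fall back to SIGKILL after a timeout.
sub stop_children {
    my ($timeout, @pids) = @_;
    kill 'TERM', @pids;
    my $deadline = time + $timeout;
    my %alive = map { $_ => 1 } @pids;
    while (%alive && time < $deadline) {
        for my $pid (keys %alive) {
            delete $alive{$pid} if waitpid($pid, WNOHANG) == $pid;
        }
        sleep 0.05;    # poll briefly instead of a fixed multi-second wait
    }
    if (%alive) {
        kill 'KILL', keys %alive;    # last resort
        waitpid($_, 0) for keys %alive;
    }
}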
Updated by okurz over 4 years ago
- Related to coordination #65118: [epic] multimachine test fails with symptoms "websocket refusing connection" and other unclear reasons added