action #19262
closed[functional][s390] openQA fails to reconnect to SUT after reboot
Description
Observation
openQA test in scenario sle-12-SP3-Server-DVD-s390x-btrfs@s390x-zVM-hsi-l2 fails in install_and_reboot
Based on a conversation between @mgriessmeier and @coolo, this seems to happen because the SUT restarts too slowly or its network does not come up fast enough.
Increasing the timeout will most likely fix this issue.
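To illustrate the timeout idea, here is a minimal Python sketch of waiting for a rebooted machine to accept connections again. It is not the os-autoinst code (which is Perl); the function name `wait_for_port` and its default values are assumptions chosen for illustration only.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=5.0):
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Hypothetical helper: returns True once the port accepts a connection,
    False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers connection refused, unreachable network and timeouts.
            if time.monotonic() >= deadline:
                return False
            # Sleep until the next attempt, but never past the deadline.
            time.sleep(min(interval, max(0.0, deadline - time.monotonic())))
```

A slow SUT is then handled by raising `timeout` for the affected scenario instead of changing the polling logic.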
Reproducible
First occurrence and seems to be sporadic.
Fails since (at least) Build 0389 (current job)
Expected result
Last good: 0381 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz over 7 years ago
- Is duplicate of action #18890: [s390x][zVM]test fails to reconnect after installation -> incomplete added
Updated by okurz over 7 years ago
In https://openqa.suse.de/tests/966730/file/autoinst-log.txt I see
05:40:23.1194 29771 considering VNC stalled, no update for 7.40 seconds
DIE socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.
at /usr/lib/os-autoinst/backend/baseclass.pm line 78.
backend::baseclass::die_handler('socket does not exist. Probably your backend instance could n...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 801
consoles::VNC::catch {...} ('socket does not exist. Probably your backend instance could n...') called at /usr/lib/perl5/vendor_perl/5.18.2/Try/Tiny.pm line 115
Try::Tiny::try('CODE(0x68bcbf8)', 'Try::Tiny::Catch=REF(0x6b0d660)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 803
consoles::VNC::update_framebuffer('consoles::VNC=HASH(0x6b01ca0)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 74
consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x42799f8)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 533
backend::baseclass::bouncer('backend::s390x=HASH(0x5af8d80)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 516
backend::baseclass::request_screen_update('backend::s390x=HASH(0x5af8d80)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 170
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 154
backend::baseclass::run_capture_loop('backend::s390x=HASH(0x5af8d80)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 127
backend::baseclass::run('backend::s390x=HASH(0x5af8d80)', 6, 9) called at /usr/lib/os-autoinst/backend/driver.pm line 85
backend::driver::start('backend::driver=HASH(0x57d5270)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
backend::driver::new('backend::driver', 's390x') called at /usr/bin/isotovideo line 206
main::init_backend() called at /usr/bin/isotovideo line 271
last frame
05:40:34.2347 29771 sending magic and exit
and checking the source code of consoles/VNC.pm I see that update_framebuffer is called on the socket which has already been closed by the stall detection in send_update_request, so that is not a good idea. I think the stall detection is too aggressive in this case and I blame the magic number "4" here. Still, it's not a good idea to just close the socket while other methods (another thread?) access it.
@coolo what do you suggest: increase the "4" for this backend, check for the socket's existence in update_framebuffer and silently ignore and retry, or something else?
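To make the "check the socket and retry" option concrete, here is a minimal Python sketch. It is not the consoles/VNC.pm code: the class `FramebufferClient`, its constructor arguments and the socket-like object are hypothetical. Only the method names send_update_request and update_framebuffer and the threshold value 4 are taken from the discussion above.

```python
import time


class FramebufferClient:
    """Toy stand-in for a VNC console; not the os-autoinst implementation."""

    def __init__(self, connect, stall_threshold=4.0, clock=time.monotonic):
        self._connect = connect                 # callable returning a socket-like object
        self._clock = clock                     # injectable so tests can control time
        self.stall_threshold = stall_threshold  # seconds; tunable per backend
        self.socket = None
        self._last_update = None

    def _ensure_socket(self):
        # (Re)connect lazily and restart the stall timer.
        if self.socket is None:
            self.socket = self._connect()
            self._last_update = self._clock()

    def send_update_request(self):
        self._ensure_socket()
        if self._clock() - self._last_update > self.stall_threshold:
            # Considered stalled: drop the connection; it is reopened on the next call.
            self.socket.close()
            self.socket = None
            return False
        self.socket.send(b"update")
        return True

    def update_framebuffer(self):
        # Guard: the socket may have been closed by the stall detection.
        if self.socket is None:
            return None  # skip this cycle instead of dying; the caller retries
        data = self.socket.recv()
        if data:
            self._last_update = self._clock()
        return data
```

This only covers the sequential case. If another thread really does access the socket, the close and the read would additionally need a lock.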
Updated by nicksinger over 7 years ago
- Priority changed from Normal to High
It happened here again: https://openqa.suse.de/tests/975846 with pretty much the same log as @okurz already pasted.
I also changed this to high since it has been open for quite a while and affects our existing tests.
Updated by okurz over 7 years ago
- Status changed from New to Feedback
- Assignee set to okurz
Happened again in https://openqa.suse.de/tests/978791
I am trying a fix with https://github.com/os-autoinst/os-autoinst/pull/810
Updated by okurz over 7 years ago
- Blocks action #16488: [sles][functional][tools][s390x] zVM: test fails in first_boot to reconnect to s390x host added
Updated by okurz over 7 years ago
First, I could easily reproduce this on lord.arch: http://lord.arch/tests/6511/file/autoinst-log.txt
Now with the fix: http://lord.arch/tests?match=poo19262_now_with_pr810
-> http://lord.arch/tests/6557#step/reconnect_s390/4 shows how the incomplete turned into a fail.
The underlying error, that the s390x instance does not boot properly, should be tracked in #16488.
Updated by coolo over 7 years ago
Your PR is wrong - and so is the test. It should disable the VNC console when triggering reboot.
Updated by okurz over 7 years ago
coolo wrote:
Your PR is wrong - and so is the test. It should disable the VNC console when triggering reboot.
As I said: the underlying error, that the s390x instance does not boot properly, should be tracked in #16488. What I changed is the backend code, which led to undefined Perl variables being used. Can you imagine why the test code was never changed to disable the VNC console in all this time? Maybe readers of the test results don't suspect a problem in the test code when the backend crashes?
Updated by okurz over 7 years ago
Merged. Let's see if https://openqa.suse.de/tests/985241 passes.
EDIT: It did. I assume we are done here but we don't know for sure as the issue was sporadic. Please reopen if seen again.