action #19262

closed

[functional][s390] openQA fails to reconnect to SUT after reboot

Added by nicksinger over 7 years ago. Updated over 7 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2017-05-19
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP3-Server-DVD-s390x-btrfs@s390x-zVM-hsi-l2 fails in install_and_reboot

From a conversation between @mgriessmeier and @coolo, it seems this happens because the SUT restarts too slowly or its network does not come up fast enough.
Most likely an increase in the timeout fixes this issue.
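To illustrate what "increasing the timeout" means for the reconnect step, here is a minimal Python sketch. It is not openQA code: `wait_for_sut`, `try_connect`, and the injectable clock are hypothetical names. The helper polls the SUT until it answers or a deadline passes, so a slow reboot fails only once the whole timeout is used up.

```python
import time
from typing import Callable


def wait_for_sut(try_connect: Callable[[], bool], timeout: float,
                 interval: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll try_connect() until it succeeds or `timeout` seconds have elapsed.

    Hypothetical helper: `try_connect` stands in for whatever check the
    reconnect step uses (e.g. opening a connection to the SUT).
    """
    deadline = clock() + timeout
    while True:
        if try_connect():
            return True          # SUT answered in time
        if clock() >= deadline:
            return False         # gave up: the SUT is too slow for this timeout
        sleep(interval)          # wait before the next attempt
```

With a SUT that needs 30 seconds to come back, a 20 second timeout returns False and a 60 second timeout returns True. The clock and sleep functions are injectable only so the behaviour can be checked without real waiting.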

Reproducible

This is the first occurrence and it seems to be sporadic.
Fails since (at least) Build 0389 (current job)

Expected result

Last good: 0381 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Is duplicate of openQA Tests (public) - action #18890: [s390x][zVM]test fails to reconnect after installation -> incomplete (Resolved, okurz, 2017-05-01)

Blocks openQA Tests (public) - action #16488: [sles][functional][tools][s390x] zVM: test fails in first_boot to reconnect to s390x host (Resolved, okurz, 2017-02-06)

Actions #1

Updated by okurz over 7 years ago

  • Is duplicate of action #18890: [s390x][zVM]test fails to reconnect after installation -> incomplete added
Actions #2

Updated by okurz over 7 years ago

In https://openqa.suse.de/tests/966730/file/autoinst-log.txt I see

05:40:23.1194 29771 considering VNC stalled, no update for 7.40 seconds
DIE socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.

 at /usr/lib/os-autoinst/backend/baseclass.pm line 78.
    backend::baseclass::die_handler('socket does not exist. Probably your backend instance could n...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 801
    consoles::VNC::catch {...} ('socket does not exist. Probably your backend instance could n...') called at /usr/lib/perl5/vendor_perl/5.18.2/Try/Tiny.pm line 115
    Try::Tiny::try('CODE(0x68bcbf8)', 'Try::Tiny::Catch=REF(0x6b0d660)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 803
    consoles::VNC::update_framebuffer('consoles::VNC=HASH(0x6b01ca0)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 74
    consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x42799f8)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 533
    backend::baseclass::bouncer('backend::s390x=HASH(0x5af8d80)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 516
    backend::baseclass::request_screen_update('backend::s390x=HASH(0x5af8d80)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 170
    eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 154
    backend::baseclass::run_capture_loop('backend::s390x=HASH(0x5af8d80)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 127
    backend::baseclass::run('backend::s390x=HASH(0x5af8d80)', 6, 9) called at /usr/lib/os-autoinst/backend/driver.pm line 85
    backend::driver::start('backend::driver=HASH(0x57d5270)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
    backend::driver::new('backend::driver', 's390x') called at /usr/bin/isotovideo line 206
    main::init_backend() called at /usr/bin/isotovideo line 271
last frame
05:40:34.2347 29771 sending magic and exit

and checking the source code of consoles/VNC.pm, I see that update_framebuffer is called on a socket that has already been closed by the stall detection in send_update_request, which is not a good idea. I think the stall detection is too aggressive in this case, and I blame the magic number "4" here. Still, it is not a good idea to just close the socket while other methods (in another thread?) may still access it.

@coolo, what do you suggest: increase the "4" for this backend, check in update_framebuffer that the socket exists and silently ignore and retry, or something else?
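The two backend-side options in the question above can be sketched as a minimal Python toy model. This is not the Perl code in consoles/VNC.pm. The class and the names `check_stall`, `on_update`, and `_lock` are hypothetical.

The sketch does three things:
- It turns the stall threshold (the magic "4") into a parameter.
- It lets `update_framebuffer` return early instead of dying when the stall detection has already closed the socket.
- It uses a lock so that closing and reading the socket cannot interleave, which addresses the concern about closing it while another caller uses it.

```python
import threading
import time
from typing import Callable


class VncConsole:
    """Toy model of a VNC console with stall detection (hypothetical, not os-autoinst code)."""

    def __init__(self, stall_threshold: float = 4.0,
                 clock: Callable[[], float] = time.monotonic):
        self.stall_threshold = stall_threshold  # seconds without an update before closing
        self.clock = clock
        self.socket = object()                  # stand-in for the real VNC connection
        self.last_update = clock()
        self._lock = threading.Lock()           # serializes closing vs. reading the socket

    def on_update(self) -> None:
        """Record that the server sent a framebuffer update."""
        with self._lock:
            self.last_update = self.clock()

    def check_stall(self) -> bool:
        """Close the socket if no update arrived within stall_threshold seconds."""
        with self._lock:
            stalled = self.clock() - self.last_update > self.stall_threshold
            if self.socket is not None and stalled:
                self.socket = None
                return True
            return False

    def update_framebuffer(self) -> bool:
        """Return False instead of dying when the socket is already closed."""
        with self._lock:
            if self.socket is None:
                return False  # caller can reconnect and retry later
            # A real implementation would read pending updates from self.socket here.
            return True
```

With the default threshold of 4 seconds, a 7.4 second gap like the one in the log closes the socket. A later update_framebuffer call then returns False rather than raising. With a threshold of 10 seconds, the same gap is tolerated.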

Actions #3

Updated by nicksinger over 7 years ago

  • Priority changed from Normal to High

It happened again here: https://openqa.suse.de/tests/975846, with pretty much the same log that @okurz already pasted.
I also changed the priority to High since this has been open for quite a while and affects our existing tests.

Actions #4

Updated by okurz over 7 years ago

  • Status changed from New to Feedback
  • Assignee set to okurz
Actions #5

Updated by okurz over 7 years ago

  • Blocks action #16488: [sles][functional][tools][s390x] zVM: test fails in first_boot to reconnect to s390x host added
Actions #6

Updated by okurz over 7 years ago

First, I could easily reproduce this on lord.arch: http://lord.arch/tests/6511/file/autoinst-log.txt

Now with the fix: http://lord.arch/tests?match=poo19262_now_with_pr810

-> http://lord.arch/tests/6557#step/reconnect_s390/4 shows how the incomplete turned into a fail.

The underlying error, that the s390x instance does not boot properly, should be tracked in #16488.

Actions #7

Updated by coolo over 7 years ago

Your PR is wrong - and so is the test. It should disable the VNC console when triggering reboot.
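coolo's suggestion can be illustrated with another hypothetical Python sketch. The `Console` class, its `enabled` flag, and `reboot_and_reconnect` are made-up names, not the os-autoinst API. The point is the ordering:
- The test disables the VNC console before it triggers the reboot.
- Polling resumes only after a successful reconnect.

This way, nothing polls the connection that the reboot tears down.

```python
from typing import Callable, List


class Console:
    """Hypothetical console handle; `enabled` controls whether it is polled."""

    def __init__(self, name: str):
        self.name = name
        self.enabled = True


def consoles_to_poll(consoles: List[Console]) -> List[str]:
    """What a capture loop would poll: only enabled consoles."""
    return [c.name for c in consoles if c.enabled]


def reboot_and_reconnect(console: Console,
                         trigger_reboot: Callable[[], None],
                         reconnect: Callable[[], bool]) -> bool:
    console.enabled = False      # stop polling before the connection goes away
    trigger_reboot()
    ok = reconnect()             # e.g. wait until the SUT answers again
    if ok:
        console.enabled = True   # resume polling only on a fresh connection
    return ok
```

If the reconnect fails, the console stays disabled. The test then reports a failed reconnect instead of the backend crashing on a closed socket.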

Actions #8

Updated by okurz over 7 years ago

coolo wrote:

Your PR is wrong - and so is the test. It should disable the VNC console when triggering reboot.

As I said: the underlying error, that the s390x instance does not boot properly, should be tracked in #16488. What I was changing is the backend code, which led to undefined Perl variables being used. Can you imagine why the test code was never changed to disable the VNC console in all this time? Maybe because readers of the test results don't suspect a problem in the test code when the backend crashes?

Actions #10

Updated by okurz over 7 years ago

merged, let's see if https://openqa.suse.de/tests/985241 will pass

EDIT: It did. I assume we are done here but we don't know for sure as the issue was sporadic. Please reopen if seen again.

Actions #11

Updated by okurz over 7 years ago

  • Status changed from Feedback to Resolved