action #37785
closed
[functional][s390x][u] test fails in start_install - maybe disable stall detection?
Start date: 2018-06-25
Due date:
% Done: 0%
Estimated time:
Difficulty:
Description
Observation
openQA test in scenario sle-12-SP4-Server-DVD-s390x-xfs@s390x-kvm-sle12 fails in
start_install.
Hypothesis
From the logs, it looks like the stall detection kicks in before the needle being waited for has a chance to match.
The "stall" seems to be triggered when the progress bar has not updated for more than 4.26s. This hypothesis is based on what I could see in the logs:
[2018-06-21T20:03:02.0931 CEST] [debug] MATCH(rebootnow-20131217:0.00)
[2018-06-21T20:03:03.0004 CEST] [debug] MATCH(rebootnow-20150409:0.00)
[2018-06-21T20:03:03.0077 CEST] [debug] MATCH(rebootnow-20160504:0.64)
[2018-06-21T20:03:03.0151 CEST] [debug] MATCH(rebootnow-390x-20150709:0.26)
[2018-06-21T20:03:03.0216 CEST] [debug] MATCH(rebootnow-390x-20160506:0.00)
[2018-06-21T20:03:03.0348 CEST] [debug] MATCH(install_and_reboot-additional-packages-20170823:0.09)
[2018-06-21T20:03:03.0352 CEST] [debug] no match: 1675.0s
[2018-06-21T20:03:03.0352 CEST] [debug] considering VNC stalled, no update for 4.26 seconds
[2018-06-21T20:03:05.0969 CEST] [debug] GET "/7taHN1Dnqxy0gF22/isotovideo/status"
[2018-06-21T20:03:05.0970 CEST] [debug] Routing to a callback
DIE Error connecting to host <10.161.145.14>: IO::Socket::INET: connect: Connection timed out
at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
backend::baseclass::die_handler('OpenQA::Exception::VNCSetupError=HASH(0x5ecd188)') called at /usr/lib/perl5/vendor_perl/5.18.2/Exception/Class/Base.pm line 85
Exception::Class::Base::throw('OpenQA::Exception::VNCSetupError', 'error', 'Error connecting to host <10.161.145.14>: IO::Socket::INET: c...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 151
consoles::VNC::login('consoles::VNC=HASH(0x5ed0520)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 842
consoles::VNC::send_update_request('consoles::VNC=HASH(0x5ed0520)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 82
consoles::vnc_base::request_screen_update('consoles::vnc_base=HASH(0x467e358)', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 587
backend::baseclass::bouncer('backend::svirt=HASH(0x6d7cc18)', 'request_screen_update', undef) called at /usr/lib/os-autoinst/backend/baseclass.pm line 570
backend::baseclass::request_screen_update('backend::svirt=HASH(0x6d7cc18)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 177
eval {...} called at /usr/lib/os-autoinst/backend/baseclass.pm line 156
backend::baseclass::run_capture_loop('backend::svirt=HASH(0x6d7cc18)') called at /usr/lib/os-autoinst/backend/baseclass.pm line 129
backend::baseclass::run('backend::svirt=HASH(0x6d7cc18)', 5, 8) called at /usr/lib/os-autoinst/backend/driver.pm line 85
backend::driver::start('backend::driver=HASH(0x5cc89d8)') called at /usr/lib/os-autoinst/backend/driver.pm line 48
backend::driver::new('backend::driver', 'svirt') called at /usr/bin/isotovideo line 236
main::init_backend() called at /usr/bin/isotovideo line 305
[2018-06-21T20:05:10.0632 CEST] [debug] Destroying openQA-SUT-2 virtual machine
[2018-06-21T20:05:10.0703 CEST] [debug] Connection to root@s390p8.suse.de established
[2018-06-21T20:05:11.0259 CEST] [debug] Command's stdout:
Domain openQA-SUT-2 destroyed
But take it with a grain of salt:
16:39 <nsinger> foursixnine: https://openqa.suse.de/tests/1777076/file/autoinst-log.txt does the "connection timed out" means that the stall-detection kicked in?
16:39 <nsinger> or is it not directly correlated?
16:43 <foursixnine> nsinger: I wouldn't put my life on it, but looks like
16:43 <foursixnine> that part of the code tries to reconnect
16:44 <nsinger> I'm just curious if we may need to disable the stall-detection here since the needle-match timeout still has 1675.0s left at that time
So maybe disabling the stall detection already helps to circumvent this issue.
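To check whether other failing jobs show the same pattern, the log could be scanned for stall messages together with the needle timeout that was still left at that moment. The following is only a minimal sketch: the function name find_stalls and both regular expressions are assumptions modeled on the two log lines quoted above, not part of os-autoinst.

```python
import re

# Patterns modeled on the log lines quoted in this ticket (assumed format).
NO_MATCH = re.compile(r"\[debug\] no match: (?P<left>[\d.]+)s")
STALLED = re.compile(
    r"\[debug\] considering VNC stalled, no update for (?P<idle>[\d.]+) seconds"
)


def find_stalls(lines):
    """Return (idle_seconds, needle_seconds_left) for each stall message.

    needle_seconds_left comes from the most recent "no match" line, or is
    None if no such line preceded the stall message.
    """
    stalls = []
    last_left = None
    for line in lines:
        m = NO_MATCH.search(line)
        if m:
            last_left = float(m.group("left"))
            continue
        m = STALLED.search(line)
        if m:
            stalls.append((float(m.group("idle")), last_left))
    return stalls


sample = [
    "[2018-06-21T20:03:03.0352 CEST] [debug] no match: 1675.0s",
    "[2018-06-21T20:03:03.0352 CEST] [debug] considering VNC stalled, no update for 4.26 seconds",
]
print(find_stalls(sample))  # [(4.26, 1675.0)]
```

A large second value next to a small first value, as in the sample, would support the hypothesis that the stall detection fires long before the needle timeout runs out. It does not prove that the subsequent "Connection timed out" error is caused by it.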
Reproducible
Fails since (at least) Build 0263 (current job)
Expected result
Last good: 0262 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz about 6 years ago
- Subject changed from [functional][s390x][u][fast] test fails in start_install - maybe disable stall detection? to [functional][s390x][u] test fails in start_install - maybe disable stall detection?
- Target version set to future
Hm, why fast? I don't see it this way.
Updated by okurz over 5 years ago
- Related to action #52763: [functiona][y] test incompletes in start_install after 3h added
Updated by mgriessmeier over 4 years ago
- Status changed from New to Rejected
no latest present anymore, issue is addressed in many other tickets