action #31543
closed: [sles][functional][tools][s390x][ipmi][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?"
0%
Description
Observation
openQA test in scenario sle-15-Installer-DVD-s390x-allpatterns@s390x-kvm-sle12 fails in install_and_reboot
- all tests failing are s390x-kvm-sle12 (s390p8 LPAR), though not all jobs on this LPAR are failing
- all those jobs are running on openqaw2
- there were no recent changes in os-autoinst or the tests regarding this (dasantiago confirmed on IRC)
(maybe) related PRs:
https://github.com/os-autoinst/os-autoinst/pull/906
https://github.com/os-autoinst/os-autoinst/pull/902
Tasks
- Gather statistics about how often this happens
- Check if we can handle the root cause of these "debug-messages"
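For the first task, a rough count could come from scanning downloaded autoinst-log.txt files for the DIE message. This is a minimal sketch, assuming the logs are already available as text keyed by job id; the function name and the sample data are hypothetical and not part of openQA:

```python
# The message text is taken from this ticket's subject.
DIE_MESSAGE = "The console isn't responding correctly. Maybe half-open socket?"


def half_open_stats(logs):
    """Return (sorted affected job ids, share of affected jobs).

    `logs` maps a job id to the text of its autoinst-log.txt
    (hypothetical input format, chosen for illustration).
    """
    affected = sorted(job for job, text in logs.items() if DIE_MESSAGE in text)
    rate = len(affected) / len(logs) if logs else 0.0
    return affected, rate


# Made-up sample data, only to show the call.
sample_logs = {
    "job-a": "... DIE The console isn't responding correctly. Maybe half-open socket? at ...",
    "job-b": "test install_and_reboot finished",
}
affected, rate = half_open_stats(sample_logs)
```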
Acceptance Criteria
One of these should be done:
- AC1: Turn the incomplete into a fail with a proper message, understandable by everyone
- AC2: Find the root-cause of this and come up with a fix
Workaround
- see Michal's suggestion: https://github.com/os-autoinst/os-autoinst/pull/906/files#diff-333bbcc7c9ce8c440b7c87218c426f42R15.
Reproducible
Fails since (at least) Build 408.1
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
Updated by mgriessmeier almost 7 years ago
- Related to action #30216: [sles][virtualization][xen] svirt-xen-hvm tests are incomplete with "DIE The console isn't responding correctly. Maybe half-open socket?" added
Updated by michalnowak almost 7 years ago
@mgriessmeier: You can fine-tune the polling mechanism to suit that particular host: https://github.com/os-autoinst/os-autoinst/pull/906/files#diff-333bbcc7c9ce8c440b7c87218c426f42R15.
Updated by dasantiago almost 7 years ago
Sometimes the channels die. That's why we implemented a quick failure mechanism; otherwise the job would get stuck for 2 hours.
This requires an investigation at the machine/qemu level to determine why the channels fail.
As a workaround, please follow Michal's advice and try increasing the value, say to two minutes, just to find out whether the channel is dead or the machine is merely slow.
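To illustrate the trade-off behind the wait time, here is a minimal sketch of a fail-fast check, written only from the description in this comment. The class and method names are hypothetical, and the actual check in os-autoinst may well work differently:

```python
class ChannelWatchdog:
    """Give up on a channel once it has been silent longer than wait_time seconds.

    Hypothetical illustration, not the os-autoinst implementation.
    Timestamps are passed in explicitly so the behaviour is easy to follow.
    """

    def __init__(self, wait_time, now):
        self.wait_time = wait_time
        self.last_activity = now

    def record_activity(self, now):
        # Call this whenever the channel delivers data.
        self.last_activity = now

    def check(self, now):
        # Return the current silence in seconds, or fail fast if it is too long.
        silent_for = now - self.last_activity
        if silent_for > self.wait_time:
            raise RuntimeError(
                "The console isn't responding correctly. Maybe half-open socket?"
            )
        return silent_for


# With a 120 s limit, a slow machine that answers after 110 s survives,
# while a channel that stays silent for more than 120 s is declared dead.
watchdog = ChannelWatchdog(wait_time=120, now=0)
watchdog.check(now=100)
watchdog.record_activity(now=110)
```

A larger wait time tolerates slower hosts, but also delays the point at which a truly dead channel is reported.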
Updated by mgriessmeier almost 7 years ago
- Description updated (diff)
- Status changed from New to Workable
Updated by mgriessmeier almost 7 years ago
- Is duplicate of action #31534: [sle][functional][medium][s390x] test fails in install_and_reboot - vm stucks after installation process added
Updated by mgriessmeier almost 7 years ago
- Status changed from Workable to Rejected
duplicated by https://progress.opensuse.org/issues/31543
Updated by mgriessmeier almost 7 years ago
- Status changed from Rejected to In Progress
- Assignee set to mgriessmeier
I've introduced a circular dependency here...
reopening - trying out Michal's workaround suggestion
Updated by okurz almost 7 years ago
Latest example: https://openqa.suse.de/tests/1513907
You did not try out the workaround suggestion, did you?
Updated by mgriessmeier almost 7 years ago
- Status changed from In Progress to Feedback
okurz wrote:
Latest example: https://openqa.suse.de/tests/1513907
You did not try out the workaround suggestion, did you?
I've added _CHKSEL_RATE_WAIT_TIME=120 to the MACHINE s390x-kvm-sle12 now
setting to feedback to track it over the next week
please use this ticket if this issue occurs again
Updated by mgriessmeier almost 7 years ago
http://openqa.suse.de/tests/1521075/file/autoinst-log.txt
still happening with 120s :(
but it seems to occur less (just a feeling)
Updated by dasantiago almost 7 years ago
I see that there are 8 black/blank screens. Isn't this an indication that the channel is dead?
Updated by mgriessmeier almost 7 years ago
dasantiago wrote:
I see that there are 8 black/blank screens. Isn't this an indication that the channel is dead?
that's a 'not so nice' thing in the s390x implementation; for some reason we also have this on passing tests, e.g. https://openqa.suse.de/tests/1521080
but the half-open-socket issue also appears in cases like https://openqa.suse.de/tests/1521150# where we don't see any black screens
I wonder if it could help to increase the value of this variable even more?
Updated by coolo almost 7 years ago
If there is no activity on this socket for 2 minutes, increasing the value further will only hide some other problem.
Updated by michalnowak almost 7 years ago
Looking at https://openqa.suse.de/tests/1521150 I noticed that we start the serial console grab with console('svirt')->start_serial_grab in power_action() when the VM is, I suppose, down, and then in redefine_svirt_domain.pm we start it again via $svirt->define_and_start. The latter place seems to be the right one to connect to the serial console.
For Xen I moved the logic you have in redefine_svirt_domain.pm to utils.pm's assert_shutdown_and_restore_system(), called from power_action().
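The ordering concern can be shown with a toy model. This is a sketch only: FakeDomain and restart_and_attach are hypothetical, the method names merely mirror those mentioned above, and the assumption that attaching to a stopped domain fails is mine, not verified against the svirt backend:

```python
class FakeDomain:
    """Toy stand-in for an svirt domain (hypothetical, for illustration only)."""

    def __init__(self):
        self.running = False
        self.serial_attached = False

    def shutdown(self):
        # Stopping the VM also drops any serial console attachment.
        self.running = False
        self.serial_attached = False

    def define_and_start(self):
        self.running = True

    def start_serial_grab(self):
        # Assumption in this model: grabbing the serial console of a stopped VM fails.
        if not self.running:
            raise RuntimeError("cannot grab serial console: domain is down")
        self.serial_attached = True


def restart_and_attach(domain):
    """Attach the serial console only after the domain has been started again."""
    domain.shutdown()
    domain.define_and_start()
    domain.start_serial_grab()
    return domain.serial_attached


attached = restart_and_attach(FakeDomain())
```

In this model, calling start_serial_grab() between shutdown() and define_and_start() raises, which corresponds to grabbing the console while the VM is still down.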
Updated by mgriessmeier almost 7 years ago
- Has duplicate action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI added
Updated by mgriessmeier almost 7 years ago
- Subject changed from [sles][functional][tools][s390x][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?" to [sles][functional][tools][s390x][ipmi][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?"
Thanks Michal, I'll follow up on this
also happens on ipmi, see https://openqa.suse.de/tests/1514142 and https://openqa.suse.de/tests/1516150
Updated by mgriessmeier almost 7 years ago
- Has duplicate deleted (action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI)
Updated by mgriessmeier almost 7 years ago
- Is duplicate of action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI added
Updated by mgriessmeier almost 7 years ago
- Status changed from Feedback to Rejected
Updated by xlai almost 7 years ago
The virtualization tests, which rely on ipmi and the ssh console, also fail with the message "The console isn't responding correctly. Maybe half-open socket? at /usr/lib/os-autoinst/backend/baseclass.pm line 241", as reported in progress ticket #32746.
I am not sure whether the root cause of the svirt console failure is the same as in ticket #32746, but it has been marked as a duplicate. So please double-check that the failures in the ipmi virtualization tests no longer occur once the solution is pushed. Thanks!
Updated by mgriessmeier almost 7 years ago
- Status changed from Rejected to In Progress
Updated by mgriessmeier almost 7 years ago
- Status changed from In Progress to Workable
setting back to workable
will revisit on monday
Updated by xlai almost 7 years ago
mgriessmeier wrote:
setting back to workable
will revisit on monday
Would you please share the PR link with the fixes?
Updated by mgriessmeier almost 7 years ago
xlai wrote:
mgriessmeier wrote:
setting back to workable
will revisit on monday
Would you please share the PR link with the fixes?
as soon as I have one, sure
Updated by mgriessmeier almost 7 years ago
- Is duplicate of action #33001: [functional][sle][s390x] test fails in reboot_after_installation - Multiple tests failing to reconnect added
Updated by mgriessmeier almost 7 years ago
- Status changed from Workable to In Progress
with this one, I got 10 tests in a row working right now:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4585
will conduct more runs to get more statistics
Updated by mgriessmeier almost 7 years ago
- Status changed from In Progress to Resolved
PR was merged - includes the fix for s390x
for ipmi, nothing was done yet, better handle this in a separate ticket from now on -> reopening https://progress.opensuse.org/issues/32746
please reopen if issue occurs again on s390x
Updated by mgriessmeier almost 7 years ago
- Is duplicate of deleted (action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI)
Updated by mgriessmeier over 6 years ago
- Related to action #37087: [kernel][s390x] test incompletes in shutdown_ltp: half-open socket? added
Updated by okurz over 6 years ago
- Related to action #40655: [tools][ipmi] DIE The console isn't responding correctly. Maybe half-open socket? at /usr/lib/os-autoinst/backend/baseclass.pm line 241 added