action #31543
closed
[sles][functional][tools][s390x][ipmi][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?"
Added by mgriessmeier over 6 years ago.
Updated over 6 years ago.
Category:
Feature requests
Description
Observation
openQA test in scenario sle-15-Installer-DVD-s390x-allpatterns@s390x-kvm-sle12 fails in install_and_reboot
- all failing tests run on s390x-kvm-sle12 (s390p8 LPAR), though not all jobs on this LPAR fail
- all those jobs are running on openqaw2
- there were no recent changes in os-autoinst or the tests regarding this (dasantiago confirmed in IRC)
Possibly related PRs:
https://github.com/os-autoinst/os-autoinst/pull/906
https://github.com/os-autoinst/os-autoinst/pull/902
Tasks
- Gather statistics about how often this happens
- Check if we can handle the root cause of these "debug-messages"
Acceptance Criteria
One of the following should be done:
- AC1: Turn the incomplete into a fail with a proper message, understandable by everyone
- AC2: Find the root-cause of this and come up with a fix
Workaround
Reproducible
Fails since (at least) Build 408.1
Expected result
Last good: (unknown) (or more recent)
Further details
Always latest result in this scenario: latest
- Related to action #30216: [sles][virtualization][xen] svirt-xen-hvm tests are incomplete with "DIE The console isn't responding correctly. Maybe half-open socket?" added
Sometimes the channels die; that's why we implemented a quick-failure mechanism. Otherwise the job would get stuck for 2 hours.
This requires an investigation at the machine/QEMU level to determine why the channels fail.
As a workaround, please follow Michal's advice and try increasing the value, say to two minutes, to find out whether the channel is really dead or the machine is just slow.
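The quick-failure idea described above can be sketched as follows. This is a minimal Python sketch, not the os-autoinst implementation (which is Perl, in backend/baseclass.pm, and may detect a dead channel differently); it assumes a channel counts as dead when it delivers no data within a wait window or reports that the peer has closed. wait_time stands in for the _CHKSEL_RATE_WAIT_TIME machine setting mentioned in this ticket; read_or_die and ConsoleDead are hypothetical names.

```python
import select
import socket
import time

DEAD_MSG = "The console isn't responding correctly. Maybe half-open socket?"


class ConsoleDead(RuntimeError):
    """Hypothetical error raised when a console channel looks dead."""


def read_or_die(sock: socket.socket, wait_time: float, bufsize: int = 4096) -> bytes:
    """Return the next chunk from sock, or fail fast instead of hanging.

    Assumption: wait_time (seconds) plays the role of _CHKSEL_RATE_WAIT_TIME.
    """
    deadline = time.monotonic() + wait_time
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Nothing arrived within the whole window: give up quickly
            # rather than blocking until the job's global timeout.
            raise ConsoleDead(DEAD_MSG)
        readable, _, _ = select.select([sock], [], [], remaining)
        if not readable:
            continue  # select timed out; the loop re-checks the deadline
        data = sock.recv(bufsize)
        if not data:
            # Readable but empty means the peer closed its end of the channel.
            raise ConsoleDead(DEAD_MSG)
        return data
```

For example, on one end of a socket.socketpair() whose peer never writes, read_or_die(a, wait_time=0.05) raises ConsoleDead after roughly 50 ms instead of blocking. The design trade-off matches the discussion in this ticket: a larger wait_time (such as 120 seconds) makes a slow machine less likely to trip the check, but it also delays detection of a channel that is genuinely dead.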
- Due date set to 2018-03-13
- Description updated (diff)
- Status changed from New to Workable
- Is duplicate of action #31534: [sle][functional][medium][s390x] test fails in install_and_reboot - vm stucks after installation process added
- Status changed from Workable to Rejected
- Status changed from Rejected to In Progress
- Assignee set to mgriessmeier
I've introduced a circular dependency here...
reopening - trying out Michal's workaround suggestion
- Status changed from In Progress to Feedback
okurz wrote:
Latest example: https://openqa.suse.de/tests/1513907
You did not try out the workaround suggestion, did you?
I've now added _CHKSEL_RATE_WAIT_TIME=120 to the MACHINE s390x-kvm-sle12
setting to Feedback to track it over the next week
please use this ticket if this issue occurs again
I see that there are 8 black/blank screens. Isn't this an indication that the channel is dead?
dasantiago wrote:
I see that there are 8 black/blank screens. Isn't this an indication that the channel is dead?
that's a 'not so nice' quirk of the s390x implementation; for some reason we also see this on passing tests, e.g. https://openqa.suse.de/tests/1521080
but the half-open-socket issue also appears in cases like https://openqa.suse.de/tests/1521150 where we don't see any black screens
I wonder whether increasing the value of this variable even further would help.
If there is no activity on this socket for 2 minutes, increasing the value further would only hide some other problem even more.
Looking at https://openqa.suse.de/tests/1521150, I noticed that we start the serial console grab, console('svirt')->start_serial_grab, in power_action(), when the VM is, I suppose, down, and then in redefine_svirt_domain.pm we start the VM again via $svirt->define_and_start. The latter place seems to be the right one to connect to the serial console.
For Xen, I moved the logic you have in redefine_svirt_domain.pm to utils.pm's assert_shutdown_and_restore_system(), which is called from power_action().
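The ordering suggested above can be illustrated with a toy model. Everything in it is hypothetical: ToyDomain, ToySerialGrab, current_order and proposed_order are illustrative names, not os-autoinst or test-library APIs. The model also assumes, as the comment implies but does not confirm, that redefining and restarting the domain leaves a serial grab started earlier attached to a stale channel.

```python
class ToyDomain:
    """Hypothetical stand-in for an svirt domain; not the os-autoinst API."""

    def __init__(self) -> None:
        self.boot_id = 0      # increments on every define_and_start()
        self.running = False

    def shutdown(self) -> None:
        self.running = False

    def define_and_start(self) -> None:
        # Assumption: redefining and starting the domain gives it a fresh
        # serial channel, so anything attached to the old one goes stale.
        self.boot_id += 1
        self.running = True


class ToySerialGrab:
    """Hypothetical serial-console reader bound to one boot of the domain."""

    def __init__(self, domain: ToyDomain) -> None:
        self.domain = domain
        self.boot_id = domain.boot_id

    def is_live(self) -> bool:
        # Output is only visible if the grab belongs to the current, running boot.
        return self.domain.running and self.boot_id == self.domain.boot_id


def current_order(domain: ToyDomain) -> ToySerialGrab:
    domain.shutdown()
    grab = ToySerialGrab(domain)  # like start_serial_grab in power_action(), VM down
    domain.define_and_start()     # like redefine_svirt_domain.pm restarting the VM
    return grab


def proposed_order(domain: ToyDomain) -> ToySerialGrab:
    domain.shutdown()
    domain.define_and_start()
    return ToySerialGrab(domain)  # grab attached right after define_and_start


if __name__ == "__main__":
    print(current_order(ToyDomain()).is_live())   # False
    print(proposed_order(ToyDomain()).is_live())  # True
```

Under these assumptions, only the grab created after define_and_start is attached to the running boot, which is why the restart step looks like the right place to connect.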
- Has duplicate action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI added
- Subject changed from [sles][functional][tools][s390x][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?" to [sles][functional][tools][s390x][ipmi][hard][sporadic] test incompletes - "DIE The console isn't responding correctly. Maybe half-open socket?"
- Has duplicate deleted (action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI)
- Is duplicate of action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI added
- Status changed from Feedback to Rejected
The virtualization tests that rely on the IPMI and SSH consoles also fail with the message "The console isn't responding correctly. Maybe half-open socket? at /usr/lib/os-autoinst/backend/baseclass.pm line 241", as reported in progress ticket #32746.
I am not sure whether the root cause of the svirt console failure is the same as in ticket #32746, but it has been marked as a duplicate. So when pushing a solution, please double-check that the failures in the IPMI virtualization tests no longer occur. Thanks!
- Status changed from Rejected to In Progress
- Status changed from In Progress to Workable
setting back to workable
will revisit on monday
mgriessmeier wrote:
setting back to workable
will revisit on monday
Would you please share the PR link with the fixes?
xlai wrote:
mgriessmeier wrote:
setting back to workable
will revisit on monday
Would you please share the PR link with the fixes?
as soon as I have one, sure
- Is duplicate of action #33001: [functional][sle][s390x] test fails in reboot_after_installation - Multiple tests failing to reconnect added
- Status changed from Workable to In Progress
- Status changed from In Progress to Resolved
PR was merged - it includes the fix for s390x
for IPMI nothing has been done yet; better to handle this in a separate ticket from now on -> reopening https://progress.opensuse.org/issues/32746
please reopen if the issue occurs again on s390x
- Is duplicate of deleted (action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI)
- Related to action #37087: [kernel][s390x] test incompletes in shutdown_ltp: half-open socket? added
- Related to action #40655: [tools][ipmi] DIE The console isn't responding correctly. Maybe half-open socket? at /usr/lib/os-autoinst/backend/baseclass.pm line 241 added