action #46964

[functional][u][s390x] test fails in the middle of execution (not installation) as incomplete with "half-open socket?" – connection to machine vanished?

Added by okurz about 1 year ago. Updated 11 months ago.

Status:ResolvedStart date:01/02/2019
Priority:HighDue date:
Assignee:okurz% Done:

0%

Category:Bugs in existing tests
Target version:SUSE QA tests - Milestone 24
Difficulty:
Duration:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-s390x-Build158.4-minimal_x@s390x-kvm-sle12 fails in
shibboleth as incomplete with

[2019-02-01T09:39:37.034 CET] [debug] <<< testapi::wait_serial(regexp=qr/(?^:IBCUa-\d+-)/, timeout=90)

[2019-02-01T09:43:23.558 CET] [debug] Backend process died, backend errors are reported below in the following lines:
The console isn't responding correctly. Maybe half-open socket? at /usr/lib/os-autoinst/backend/baseclass.pm line 245.

[2019-02-01T09:43:23.559 CET] [debug] Destroying openQA-SUT-2 virtual machine
Use of uninitialized value $libvirt_connector in concatenation (.) or string at /usr/lib/os-autoinst/backend/svirt.pm line 80.
[2019-02-01T09:43:23.638 CET] [debug] Connection to root@s390p8.suse.de established
[2019-02-01T09:43:23.724 CET] [debug] Command's stdout:


[2019-02-01T09:43:23.724 CET] [debug] Command's stderr:
error: Failed to destroy domain openQA-SUT-2
error: Requested operation is not valid: domain is not running

[2019-02-01T09:43:23.725 CET] [debug] Command executed: virsh  destroy openQA-SUT-2, ret=1
Use of uninitialized value $libvirt_connector in concatenation (.) or string at /usr/lib/os-autoinst/backend/svirt.pm line 81.
[2019-02-01T09:43:23.798 CET] [debug] Connection to root@s390p8.suse.de established
[2019-02-01T09:43:23.905 CET] [debug] Command's stdout:
Domain openQA-SUT-2 has been undefined


[2019-02-01T09:43:23.905 CET] [debug] Command executed: virsh  undefine --snapshots-metadata openQA-SUT-2, ret=0
last frame
[2019-02-01T09:53:02.781 CET] [debug] sending magic and exit
[2019-02-01T09:53:02.781 CET] [debug] received magic close
[2019-02-01T09:53:02.782 CET] [debug] THERE IS NOTHING TO READ 15 4 3
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":43687"
      after 15743 requests (15743 known processed) with 0 events remaining.
[2019-02-01T09:53:02.804 CET] [debug] backend process exited: 0
[2019-02-01T09:53:02.804 CET] [debug] Driver backend collected unknown process with pid 175519 and exit status: 0
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":49304"
      after 15741 requests (15741 known processed) with 0 events remaining.
[2019-02-01T09:53:02.805 CET] [debug] Driver backend collected unknown process with pid 177394 and exit status: 1
[2019-02-01T09:53:02.805 CET] [debug] Driver backend collected unknown process with pid 177499 and exit status: 1
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":49304"
[2019-02-01T09:53:02.806 CET] [debug] commands process exited: 0
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":43687"
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":56710"
      after 15740 requests (15740 known processed) with 0 events remaining.
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":56710"
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":36800"
      after 15737 requests (15737 known processed) with 0 events remaining.
xterm: fatal IO error 11 (Resource temporarily unavailable) or KillClient on X server ":36800"
[2019-02-01T09:53:02.823 CET] [debug] Driver backend collected unknown process with pid 176625 and exit status: 1
[2019-02-01T09:53:02.823 CET] [debug] Driver backend collected unknown process with pid 177223 and exit status: 1
[2019-02-01T09:53:02.823 CET] [debug] Driver backend collected unknown process with pid 177375 and exit status: 0
[2019-02-01T09:53:02.823 CET] [debug] sysread failed: 
[2019-02-01T09:53:02.824 CET] [debug] >>> testapi::wait_serial: (?^:IBCUa-\d+-): fail
[2019-02-01T09:53:02.824 CET] [debug] command 'curl --no-buffer http://localhost/Shibboleth.sso/Status | grep 'Cannot connect to shibd process'' timed out at /var/lib/openqa/cache/openqa.suse.de/tests/sle/tests/console/shibboleth.pm line 26.

[2019-02-01T09:53:02.824 CET] [debug] <<< testapi::select_console(testapi_console='log-console')
[2019-02-01T09:53:02.825 CET] [debug] post_fail_hook failed: syswrite failed Broken pipe at /usr/lib/os-autoinst/myjsonrpc.pm line 40.
    myjsonrpc::send_json('GLOB(0x10016819308)', 'HASH(0x1001386a768)') called at /usr/lib/os-autoinst/autotest.pm line 313
    autotest::query_isotovideo('backend_select_console', 'HASH(0x10016fe44a0)') called at /usr/lib/os-autoinst/testapi.pm line 1459
    testapi::select_console('log-console') called at /var/lib/openqa/cache/openqa.suse.de/tests/sle/lib/consoletest.pm line 21
    consoletest::post_fail_hook('shibboleth=HASH(0x10016865310)') called at /usr/lib/os-autoinst/basetest.pm line 324
    eval {...} called at /usr/lib/os-autoinst/basetest.pm line 324
    basetest::run_post_fail('shibboleth=HASH(0x10016865310)', 'test shibboleth died') called at /usr/lib/os-autoinst/basetest.pm line 377
    basetest::runtest('shibboleth=HASH(0x10016865310)') called at /usr/lib/os-autoinst/autotest.pm line 357
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 357
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 214
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 214
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 267
    autotest::__ANON__('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    eval {...} called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    Mojo::IOLoop::ReadWriteProcess::_fork('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)', 'CODE(0x1001682dc58)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 476
    Mojo::IOLoop::ReadWriteProcess::start('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)') called at /usr/lib/os-autoinst/autotest.pm line 268
    autotest::start_process() called at /usr/bin/isotovideo line 251

[2019-02-01T09:53:02.826 CET] [debug] syswrite failed Broken pipe at /usr/lib/os-autoinst/myjsonrpc.pm line 40.
    myjsonrpc::send_json('GLOB(0x10016819308)', 'HASH(0x10016fc3078)') called at /usr/lib/os-autoinst/autotest.pm line 313
    autotest::query_isotovideo('set_current_test', 'HASH(0x10016fedc38)') called at /usr/lib/os-autoinst/autotest.pm line 174
    autotest::set_current_test(undef) called at /usr/lib/os-autoinst/basetest.pm line 288
    basetest::fail_if_running('shibboleth=HASH(0x10016865310)') called at /usr/lib/os-autoinst/basetest.pm line 327
    basetest::run_post_fail('shibboleth=HASH(0x10016865310)', 'test shibboleth died') called at /usr/lib/os-autoinst/basetest.pm line 377
    basetest::runtest('shibboleth=HASH(0x10016865310)') called at /usr/lib/os-autoinst/autotest.pm line 357
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 357
    autotest::runalltests() called at /usr/lib/os-autoinst/autotest.pm line 214
    eval {...} called at /usr/lib/os-autoinst/autotest.pm line 214
    autotest::run_all() called at /usr/lib/os-autoinst/autotest.pm line 267
    autotest::__ANON__('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    eval {...} called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 325
    Mojo::IOLoop::ReadWriteProcess::_fork('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)', 'CODE(0x1001682dc58)') called at /usr/lib/perl5/vendor_perl/5.18.2/Mojo/IOLoop/ReadWriteProcess.pm line 476
    Mojo::IOLoop::ReadWriteProcess::start('Mojo::IOLoop::ReadWriteProcess=HASH(0x100165ca3b0)') called at /usr/lib/os-autoinst/autotest.pm line 268
    autotest::start_process() called at /usr/bin/isotovideo line 251

[2019-02-01T09:53:02.827 CET] [debug] Driver backend collected unknown process with pid 177202 and exit status: 0
[2019-02-01T09:53:02.831 CET] [debug] [autotest] process exited: 0
[2019-02-01T09:53:02.831 CET] [debug] Driver backend collected unknown process with pid 176606 and exit status: 0
[2019-02-01T09:53:02.835 CET] [debug] Driver backend collected unknown process with pid 177478 and exit status: 0
[2019-02-01T09:53:02.835 CET] [debug] isotovideo failed
[2019-02-01T09:53:02.836 CET] [debug] killing backend process 175515
[2019-02-01T09:53:02.836 CET] [debug] done with backend process
175500: EXIT 1

Reproducible

First occurence?

Expected result

Last good: 157.1

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #46919: [functional][u][svirt][sporadic] auto_review:"IO::Socket:... New 01/12/2018
Related to openQA Tests - action #32746: [sle][tools][remote-backends][hard] Incomplete job becaus... Resolved 05/03/2018
Related to openQA Tests - action #46394: [sle][functional][u][s390x][spvm][kvm][sporadic] test fai... Workable
Related to openQA Tests - action #48260: [sle][functional][u][s390x][kvm] test fails in reboot_aft... Resolved 21/02/2019
Duplicated by openQA Tests - action #48833: [sle][functional][u] test fails in hostname_inst - cannot... Rejected 07/03/2019

History

#1 Updated by okurz about 1 year ago

  • Related to action #46919: [functional][u][svirt][sporadic] auto_review:"IO::Socket::INET: connect: Connection timed out" added

#2 Updated by szarate about 1 year ago

Grenache seems to be suffering quite a bit: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=163.1&groupid=110&arch=s390x&result=incomplete - plus few that are still scheduled (I did rettriger some)

#3 Updated by okurz about 1 year ago

  • Related to action #32746: [sle][tools][remote-backends][hard] Incomplete job because console isn't responding correctly. Half-open socket on IPMI added

#4 Updated by okurz about 1 year ago

  • Related to action #46394: [sle][functional][u][s390x][spvm][kvm][sporadic] test fails in various modules to login to svirt console (or system is not up yet) added

#5 Updated by okurz about 1 year ago

  • Related to action #48260: [sle][functional][u][s390x][kvm] test fails in reboot_after_installation - "The console isn't responding correctly. Maybe half-open socket?" - we all hate it ;) added

#6 Updated by riafarov about 1 year ago

It affects a lot of jobs lately, I would suggest to raise the priority. And also help ;)

#7 Updated by okurz about 1 year ago

  • Subject changed from [functional][u][s390x] test fails as incomplete with "half-open socket?" – connection to machine vanished? to [functional][u][s390x] test fails in the middle of execution (not installation) as incomplete with "half-open socket?" – connection to machine vanished?
  • Priority changed from Normal to High
  • Target version changed from future to Milestone 24

We can raise the prio but also I created another ticket #48260 about the incompletes in installation. Because this one here originally was about a connection lost somewhere in the middle. I updated the subject line to make it clear that this ticket is not about installation. The cases you observed are probably more fitting for #48260 I assume?

#8 Updated by okurz 12 months ago

  • Status changed from New to Feedback
  • Assignee set to nicksinger

#48260 is resolved together with a change to os-autoinst that should improve the feedback with a hint what happened to which console.

@nsinger as osd deployments are executed based on a rather cautious approach could you please track this ticket until gh#os-autoinst/os-autoinst#1120 is deployed to osd workers?

#9 Updated by okurz 12 months ago

  • Duplicated by action #48833: [sle][functional][u] test fails in hostname_inst - cannot reach install-shell, Maybe half-open socket? added

#10 Updated by okurz 11 months ago

  • Status changed from Feedback to Resolved
  • Assignee changed from nicksinger to okurz

@nicksinger seems you have overlooked this ticket? https://confluence.suse.de/pages/viewpage.action?pageId=194052156 shows the changed deployed.

https://openqa.suse.de/tests/2549409 and last 10 jobs in this scenario look pretty stable, no more incompletes.

Also available in: Atom PDF