coordination #23650

closed

[sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)

Added by okurz over 6 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 30
Start date: 2017-10-20
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:
Description

Observation

openQA test in scenario sle-15-Leanos-DVD-x86_64-gnome@64bit-ipmi fails in
boot_from_pxe as incomplete with the message "Error connecting to host : IO::Socket::INET: connect: Connection refused" in the autoinst-log.txt.

Expected result

The test never worked for SLE15.

Acceptance criteria

  • AC1: Test suite default is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI
  • AC2: Test suite gnome is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI

Problem

autoinst-log.txt

10:08:07.4045 Debug: /var/lib/openqa/cache/tests/sle/tests/boot/boot_from_pxe.pm:100 called testapi::select_console
10:08:07.4046 5804 <<< testapi::select_console(testapi_console='installation')
/usr/lib/os-autoinst/consoles/vnc_base.pm:64:{
  'password' => 'nots3cr3t',
  'port' => 5901,
  'hostname' => '10.162.2.87'
}
10:08:09.4100 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:10.4115 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:11.4129 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:12.4139 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:13.4150 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:14.4162 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:15.4174 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:16.4186 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
DIE socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.

 at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('socket does not exist. Probably your backend instance could n...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 801
    consoles::VNC::catch {...} ('socket does not exist. Probably your backend instance could n...') called at /usr/lib/perl5/vendor_perl/5.18.2/Try/Tiny.pm line 115
    Try::Tiny::try('CODE(0x843f028)', 'Try::Tiny::Catch=REF(0x843f310)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 803
    consoles::VNC::update_framebuffer('consoles::VNC=HASH(0x8440268)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 74

That's annoying because incompletes are harder to understand and carry over can't work. (Improvement of this message is handled separately)

Further details

Always latest result in this scenario:


Subtasks 7 (0 open, 7 closed)

action #26926: [sle][functional]ipmi VNC reconnect failures cause jobs to end up incomplete -> turn into fail (Resolved, szarate, 2017-10-20)
action #26928: [sle][functional]gnome@64bit-ipmi using VNC installation can not reach VNC server anymore -> regression on a29497af? (Resolved, okurz, 2017-10-20)
action #26948: [sle][functional][ipmi][hard] Adjust boot_from_pxe to sanely handle multiple network interfaces (Resolved, SLindoMansilla, 2017-10-23)
action #32089: [sle][functional][u][ipmi][easy] test fails in first_boot - abort the test early so that we at least test the installation (Resolved, SLindoMansilla, 2018-02-05)
action #37387: [sle][functional][ipmi][u] Fix test suite gnome to work on ipmi SLE 12 and 15 (Rejected, okurz, 2018-06-14)
action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all (Resolved, xlai, 2017-10-20)
action #41693: [sle][functional][u][ipmi][sporadic] test fails in boot_from_pxe - needs to increase ssh_vnc_wait_time (Rejected, SLindoMansilla, 2018-09-27)

Related issues 4 (1 open, 3 closed)

Related to openQA Tests - action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died (Resolved, mgriessmeier, 2017-06-23, 2017-10-25)
Related to openQA Tests - action #60500: [sle][functional][u] - default@64bit-ipmi scheduled for SLES 15 SP2 (Rejected, SLindoMansilla, 2019-12-02)
Blocked by openQA Tests - action #19350: [sle][functional][s390x][zkvm][hard] make unavailable ssh based zkvm consoles more obvious in the backend (was: [consistent] unable to switch to text terminal in consoletest_setup -> bsc#1040606) (Resolved, mgriessmeier, 2017-05-24, 2018-01-17)
Blocks openQA Tests - action #41207: [qe-core][functional][ipmi] test fails in reboot_gnome - seems we call some code which we are not allowed to do, need to "reset_consoles" or something? nearly there to a complete run again :) (Workable, 2018-09-18)

Actions #2

Updated by okurz over 6 years ago

  • Blocked by action #19350: [sle][functional][s390x][zkvm][hard] make unavailable ssh based zkvm consoles more obvious in the backend (was: [consistent] unable to switch to text terminal in consoletest_setup -> bsc#1040606) added
Actions #3

Updated by okurz over 6 years ago

  • Assignee set to nicksinger

as discussed during standup 2017-09-20

Actions #4

Updated by okurz over 6 years ago

  • Target version set to Milestone 11
Actions #6

Updated by okurz over 6 years ago

  • Due date set to 2017-10-11
Actions #7

Updated by okurz over 6 years ago

The latest job does not end up incomplete but fails without stating a failed module. To my understanding the VNC stall detection is just a symptom, not the problem. It could be that the VNC process or an SSH terminal process died in the background and isotovideo does not check for that, so I suggest one of the following:

  1. (preferred) the VNC process terminating should not go unnoticed
  2. catch the "die" and handle it gracefully after the connection is terminated
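
For illustration only, option 2 could be sketched as follows. This is not the os-autoinst implementation (which is Perl); `ConsoleUnavailable`, `connect_with_retries` and `try_select_console` are hypothetical names, and the retry count merely mirrors the eight attempts visible in the log above. The idea is to turn the fatal "socket does not exist" into a dedicated exception that the caller can translate into a regular failure instead of an incomplete job.

```python
import socket
import time


class ConsoleUnavailable(Exception):
    """Raised when the console endpoint never accepts a connection."""


def connect_with_retries(host, port, attempts=8, delay=1.0,
                         connect=socket.create_connection):
    """Try to open a TCP connection, retrying on refusal or timeout."""
    last_error = None
    for _ in range(attempts):
        try:
            return connect((host, port), timeout=delay)
        except OSError as err:  # includes ConnectionRefusedError and timeouts
            last_error = err
            time.sleep(delay)
    raise ConsoleUnavailable(
        f"{host}:{port} unreachable after {attempts} attempts: {last_error}")


def try_select_console(host, port, **kwargs):
    """Hypothetical caller: report a failure instead of dying."""
    try:
        sock = connect_with_retries(host, port, **kwargs)
    except ConsoleUnavailable as err:
        # A regular failure keeps the job result readable; the job does not
        # end up incomplete.
        return {"result": "fail", "reason": str(err)}
    sock.close()
    return {"result": "ok", "reason": ""}
```

With a host that refuses every connection, `try_select_console` returns a "fail" record carrying the host, port and last error, which is the kind of information an incomplete job hides.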
Actions #8

Updated by mgriessmeier over 6 years ago

  • Status changed from New to In Progress

Work-in-progress PR created. Unfortunately it has not progressed as far as we wanted because more important issues popped up.
@okurz, nsinger: hopefully you can take this as a base to continue in this sprint.

we turned the die into an Exception, but failed to add a record_info box - though we found a nice way to reproduce the "Socket does not exist" issue consistently (nsinger knows more about that)

https://github.com/os-autoinst/os-autoinst/pull/862

Actions #9

Updated by mgriessmeier over 6 years ago

  • Related to action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died added
Actions #10

Updated by riafarov over 6 years ago

PR with fix of review comment to be able to merge it: https://github.com/os-autoinst/os-autoinst/pull/864

Actions #11

Updated by okurz over 6 years ago

Did not complete in sprint 1. Main reason: a spontaneous packaging training which we were not aware of beforehand. We have the PR, which should improve user feedback a lot, and finishing this is definitely possible in the next sprint (sprint 2).

Actions #12

Updated by okurz over 6 years ago

  • Due date changed from 2017-10-11 to 2017-10-25
Actions #13

Updated by okurz over 6 years ago

Actions #14

Updated by nicksinger over 6 years ago

@okurz helped a hell of a lot in forming a hypothesis with me about what happens here: our current IPMI code in the test (boot/boot_from_pxe.pm) checks for a running SSHD on the SUT and then continues. The next step activates a console named "installation", which for the worker basically means: "open a connection (whatever protocol) to the SUT and expect the YaST installer there". Right now the test implicitly assumes a running VNC server once it has seen a running SSHD. This worked for a long time but obviously no longer holds, so the test tries to connect too early and receives a "Connection refused".

The current approach now is to adjust the needle which checks for a running SSH/VNC dynamically based on the variable "VIDEOMODE": http://openqa.glados.qa.suse.de/tests/484#step/boot_from_pxe/24
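
As a rough sketch of the underlying idea (wait for the service the console will actually use, not for SSHD as a proxy), the check could look like this. It is a Python illustration, not the Perl test code; `console_port` and `wait_for_port` are hypothetical helpers. The mapping of VIDEOMODE to a port is an assumption: "text" is taken to mean an SSH-based installation on port 22, anything else a VNC installation on port 5901 as seen in the log above.

```python
import socket
import time


def console_port(videomode):
    """Pick the port to probe; the VIDEOMODE mapping is an assumption."""
    return 22 if videomode == "text" else 5901


def wait_for_port(host, port, timeout=300.0, interval=1.0,
                  connect=socket.create_connection):
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with connect((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out: retry
            time.sleep(interval)
    return False
```

A test would then call something like `wait_for_port(sut_ip, console_port(videomode))` before activating the installation console, and fail with a clear message if it returns False.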

Actions #15

Updated by okurz over 6 years ago

The screenshot does not look as if a responsive VNC server is running. I suggest cross-checking manually.

Actions #16

Updated by nicksinger over 6 years ago

  • Description updated (diff)
Actions #17

Updated by nicksinger over 6 years ago

Manual investigation by @okurz and me revealed the kernel parameter/console redirection as part of the reason why this fails. Removing the additional "console=tty" and leaving only "console=ttyS1,115200" in there results in the expected output on the serial console.
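
To make the change concrete, here is a small Python sketch of the filtering described above. `drop_console_tty` is a hypothetical helper, not code from the test distribution, and the boot parameters in the usage note are made up for illustration. It removes only the bare "console=tty" entry and leaves the serial console untouched.

```python
def drop_console_tty(cmdline):
    """Remove bare 'console=tty' arguments, keep e.g. 'console=ttyS1,115200'."""
    kept = [arg for arg in cmdline.split() if arg != "console=tty"]
    return " ".join(kept)
```

For a hypothetical command line such as `"install=http://example.invalid/repo console=tty console=ttyS1,115200"` the helper keeps the install source and the serial console and drops the extra `console=tty`.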

Actions #18

Updated by okurz over 6 years ago

as discussed, please make sure the ticket is closed today

Actions #19

Updated by okurz over 6 years ago

  • Due date set to 2017-10-25

due to changes in a related task

Actions #20

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][ipmi]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?
  • Status changed from In Progress to Feedback

I think the comment you wanted to add is "https://github.com/nicksinger/os-autoinst-distri-opensuse/commit/c7d775f042e346d00cd084bc9bf7b4df30ba7768 provides a first fix for this. Test can now continue and finds the right needle: http://openqa.glados.qa.suse.de/tests/517 . The test can still not succeed since the VPNd binds to the second interface and is therefore not reachable on the expected address." from #26038#note-14

So we failed to close this ticket today … but at least I tried to improve by creating subtickets

Actions #21

Updated by nicksinger over 6 years ago

  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3770 addresses the main problem here (the test checks for a running SSH server while we really want to connect to VNC). The test will still fail, though, and more improvement is needed, which I'll track in another ticket.

Actions #22

Updated by okurz over 6 years ago

  • Status changed from Resolved to In Progress
Actions #23

Updated by nicksinger over 6 years ago

  • Status changed from In Progress to Feedback
Actions #24

Updated by okurz over 6 years ago

  • Due date changed from 2017-10-25 to 2017-11-08

due to changes in a related task

Actions #25

Updated by okurz over 6 years ago

  • Target version changed from Milestone 11 to Milestone 12
Actions #26

Updated by okurz over 6 years ago

  • Due date changed from 2017-11-08 to 2018-01-16

due to changes in a related task

Actions #27

Updated by nicksinger over 6 years ago

  • Assignee deleted (nicksinger)
Actions #28

Updated by okurz over 6 years ago

  • Due date changed from 2018-01-16 to 2018-02-27

due to changes in a related task

Actions #29

Updated by okurz about 6 years ago

  • Target version changed from Milestone 12 to Milestone 14
Actions #30

Updated by okurz about 6 years ago

  • Assignee set to SLindoMansilla
Actions #31

Updated by SLindoMansilla about 6 years ago

  • Status changed from Feedback to In Progress

Working on sub-task

Actions #32

Updated by riafarov about 6 years ago

  • Due date changed from 2018-02-27 to 2018-03-13

due to changes in a related task

Actions #33

Updated by SLindoMansilla about 6 years ago

  • Status changed from In Progress to Resolved

All subtasks resolved, all related tasks resolved.

Actions #34

Updated by okurz about 6 years ago

  • Status changed from Resolved to Workable

But expected result is not met: See https://openqa.suse.de/tests/1514178 failing in first_boot

Actions #35

Updated by SLindoMansilla about 6 years ago

Trying with DESKTOP=textmode: http://copland.arch.suse.de/tests/942

Actions #37

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-03-13 to 2018-03-27

due to changes in a related task

Actions #38

Updated by mgriessmeier about 6 years ago

  • Target version changed from Milestone 14 to Milestone 15
Actions #39

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-03-27 to 2018-04-10

due to changes in a related task

Actions #40

Updated by okurz about 6 years ago

  • Subject changed from [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?
Actions #41

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

due to changes in a related task

Actions #42

Updated by SLindoMansilla about 6 years ago

  • Subject changed from [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)
Actions #43

Updated by SLindoMansilla about 6 years ago

  • Related to action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install) added
Actions #44

Updated by mgriessmeier almost 6 years ago

  • Due date changed from 2018-04-24 to 2018-05-08

due to changes in a related task

Actions #45

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 15 to Milestone 16

correcting milestone

Actions #46

Updated by okurz almost 6 years ago

  • Related to action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added
Actions #47

Updated by okurz almost 6 years ago

  • Description updated (diff)
  • Target version changed from Milestone 16 to Milestone 17

https://openqa.suse.de/tests/1749830 is the latest job, not exactly "working" so we are not done here.

Actions #48

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 17 to Milestone 21+
Actions #49

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 21+ to Milestone 21+
Actions #50

Updated by okurz over 5 years ago

  • Status changed from Workable to Blocked
Actions #51

Updated by SLindoMansilla over 5 years ago

  • Related to deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)
Actions #52

Updated by SLindoMansilla over 5 years ago

  • Blocked by action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added
Actions #53

Updated by SLindoMansilla over 5 years ago

  • Blocked by deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)
Actions #54

Updated by SLindoMansilla over 5 years ago

Sorry, I confused the subject line with the PXE. Added as subtask: #36027

Actions #55

Updated by SLindoMansilla over 5 years ago

  • Related to deleted (action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install))
Actions #56

Updated by SLindoMansilla over 5 years ago

Maybe related, this job restarts IPMI machines: http://jenkins.qa.suse.de/job/restart-ipmi-mainboard/

Actions #57

Updated by okurz over 5 years ago

  • Blocks action #41207: [qe-core][functional][ipmi] test fails in reboot_gnome - seems we call some code which we are not allowed to do, need to "reset_consoles" or something? nearly there to a complete run again :) added
Actions #58

Updated by okurz over 5 years ago

  • Target version changed from Milestone 21+ to Milestone 24
Actions #59

Updated by mgriessmeier almost 5 years ago

  • Target version changed from Milestone 24 to Milestone 25

Idk what's the state here - can someone explain?

Actions #60

Updated by SLindoMansilla almost 5 years ago

  • Description updated (diff)
  • Status changed from Blocked to Workable

Blocker resolved: #19350

Actions #61

Updated by SLindoMansilla almost 5 years ago

  • Description updated (diff)
  • Assignee deleted (SLindoMansilla)
Actions #62

Updated by SLindoMansilla almost 5 years ago

  • Description updated (diff)
Actions #63

Updated by zluo almost 5 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over. This is a 2-year-old ticket!

Actions #64

Updated by zluo almost 5 years ago

https://openqa.suse.de/tests/2931505 shows boot_from_pxe works fine.

re-trigger it on osd because hostname_inst got wrongly scheduled for ipmi:
https://openqa.suse.de/tests/2956663#settings

Actions #65

Updated by zluo almost 5 years ago

  • Status changed from In Progress to Rejected

grub_test failed, but this is another issue. So I don't see any problem for gnome test on ipmi.

set as rejected for now

Actions #66

Updated by okurz almost 5 years ago

  • Status changed from Rejected to In Progress

It is true that "boot_from_pxe" is now more stable. However the ACs are not fulfilled, please see the description for that. It mentions four scenarios. Currently https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&groupid=129&groupid=110&groupid=132&build=228.2&arch=x86_64 shows only btrfs@ipmi so default@ipmi, gnome@ipmi and the corresponding two for SLE15 are missing.

Actions #67

Updated by zluo almost 5 years ago

  • Target version changed from Milestone 25 to Milestone 26

Sergio has added gnome@64bit-ipmi now, need to check the test results for next build.

Actions #68

Updated by SLindoMansilla almost 5 years ago

We have not had any build since I scheduled the gnome@ipmi job.
If necessary, we could post a job manually, typing in the settings.

Actions #70

Updated by SLindoMansilla almost 5 years ago

  • Description updated (diff)
Actions #71

Updated by zluo almost 5 years ago

@sergio this is working for boot_from_pxe, can you put the job then into Job groups "SLES 12 functional"? Thanks!

Actions #72

Updated by okurz almost 5 years ago

the test still fails in first_boot. Nothing changed because no one changed code: https://openqa.suse.de/tests/3038117#step/first_boot/7 so as long as this doesn't work you should not bring it into the validation job group.

Actions #73

Updated by zluo almost 5 years ago

I thought boot_from_pxe was not working. It works now; first_boot failed, but I think this is another issue.

Actions #74

Updated by zluo almost 5 years ago

to check https://openqa.suse.de/tests/3043347 (without reconnect_mgmt_console, grub_test)

to check https://openqa.suse.de/tests/3043448 (without grub_test) as well

Actions #76

Updated by zluo almost 5 years ago

  • Status changed from In Progress to Blocked

https://openqa.suse.de/tests/3043347 shows that first_boot works fine if grub_test is not started before.

So we need to handle the issue reported: #53249

Actions #77

Updated by zluo almost 5 years ago

  • Blocked by coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Actions #78

Updated by zluo almost 5 years ago

  • Target version changed from Milestone 26 to Milestone 30+
Actions #79

Updated by zluo almost 5 years ago

  • Status changed from Blocked to Workable
Actions #80

Updated by zluo almost 5 years ago

  • Status changed from Workable to Blocked
Actions #81

Updated by SLindoMansilla almost 5 years ago

  • Due date changed from 2018-12-31 to 2018-09-27

due to changes in a related task

Actions #82

Updated by SLindoMansilla almost 5 years ago

  • Due date changed from 2018-05-08 to 2018-09-27

due to changes in a related task

Actions #83

Updated by SLindoMansilla almost 5 years ago

  • Due date changed from 2018-03-13 to 2018-09-27

due to changes in a related task

Actions #84

Updated by SLindoMansilla almost 5 years ago

  • Due date changed from 2017-11-08 to 2018-09-27

due to changes in a related task

Actions #85

Updated by SLindoMansilla almost 5 years ago

  • Due date changed from 2017-11-08 to 2018-09-27

due to changes in a related task

Actions #86

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3247183

Actions #87

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3329449

Actions #88

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3381530

Actions #89

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3598216

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #90

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3649158

Actions #91

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3700456

Actions #92

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3727073

Actions #93

Updated by mgriessmeier over 4 years ago

  • Target version changed from Milestone 30+ to Milestone 30

needs to be discussed offline

Actions #94

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3770256

Actions #95

Updated by szarate almost 4 years ago

  • Assignee changed from zluo to SLindoMansilla

I guess this ticket can be rejected

Actions #96

Updated by SLindoMansilla almost 4 years ago

  • Status changed from Blocked to In Progress

True, #60500 should be enough.

Actions #97

Updated by SLindoMansilla almost 4 years ago

  • Related to action #60500: [sle][functional][u] - default@64bit-ipmi scheduled for SLES 15 SP2 added
Actions #98

Updated by SLindoMansilla almost 4 years ago

  • Blocked by deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Actions #99

Updated by SLindoMansilla almost 4 years ago

  • Status changed from In Progress to Resolved
Actions #100

Updated by szarate over 3 years ago

  • Tracker changed from action to coordination