action #23650

[sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)

Added by okurz about 3 years ago. Updated 3 months ago.

Status: Resolved
Priority: High
Category: Bugs in existing tests
Target version: SUSE QA tests - Milestone 30
Start date: 2017-10-20
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Difficulty:

Description

Observation

openQA test in scenario sle-15-Leanos-DVD-x86_64-gnome@64bit-ipmi fails in boot_from_pxe as incomplete with the message "Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused" in the autoinst-log.txt.

Expected result

The test never worked for SLE15.

Acceptance criteria

  • AC1: Test suite default is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI
  • AC2: Test suite gnome is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI

Problem

autoinst-log.txt

10:08:07.4045 Debug: /var/lib/openqa/cache/tests/sle/tests/boot/boot_from_pxe.pm:100 called testapi::select_console
10:08:07.4046 5804 <<< testapi::select_console(testapi_console='installation')
/usr/lib/os-autoinst/consoles/vnc_base.pm:64:{
  'password' => 'nots3cr3t',
  'port' => 5901,
  'hostname' => '10.162.2.87'
}
10:08:09.4100 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:10.4115 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:11.4129 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:12.4139 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:13.4150 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:14.4162 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:15.4174 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:16.4186 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
DIE socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.

 at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
    backend::baseclass::die_handler('socket does not exist. Probably your backend instance could n...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 801
    consoles::VNC::catch {...} ('socket does not exist. Probably your backend instance could n...') called at /usr/lib/perl5/vendor_perl/5.18.2/Try/Tiny.pm line 115
    Try::Tiny::try('CODE(0x843f028)', 'Try::Tiny::Catch=REF(0x843f310)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 803
    consoles::VNC::update_framebuffer('consoles::VNC=HASH(0x8440268)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 74

That's annoying because incomplete jobs are harder to understand and carry-over can't work. (Improving this message is handled separately.)

Further details

Always latest result in this scenario:


Subtasks

action #26926: [sle][functional]ipmi VNC reconnect failures cause jobs to end up incomplete -> turn into fail | Resolved | szarate

action #26928: [sle][functional]gnome@64bit-ipmi using VNC installation can not reach VNC server anymore -> regression on a29497af? | Resolved | okurz

action #26948: [sle][functional][ipmi][hard] Adjust boot_from_pxe to sanely handle multiple network interfaces | Resolved | SLindoMansilla

action #32089: [sle][functional][u][ipmi][easy] test fails in first_boot - abort the test early so that we at least test the installation | Resolved | SLindoMansilla

action #37387: [sle][functional][ipmi][u] Fix test suite gnome to work on ipmi SLE 12 and 15 | Rejected | okurz

action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all | Resolved | xlai

action #41693: [sle][functional][u][ipmi][sporadic] test fails in boot_from_pxe - needs to increase ssh_vnc_wait_time | Rejected | SLindoMansilla


Related issues

Related to openQA Tests - action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died | Resolved | 2017-06-23 | 2017-10-25

Related to openQA Tests - action #60500: [sle][functional][u] - default@64bit-ipmi scheduled for SLES 15 SP2 | Rejected | 2019-12-02

Blocked by openQA Tests - action #19350: [sle][functional][s390x][zkvm][hard] make unavailable ssh based zkvm consoles more obvious in the backend (was: [consistent] unable to switch to text terminal in consoletest_setup -> bsc#1040606) | Resolved | 2017-05-24 | 2018-01-17

Blocks openQA Tests - action #41207: [functional][u][ipmi] test fails in reboot_gnome - seems we call some code which we are not allowed to do, need to "reset_consoles" or something? nearly there to a complete run again :) | Blocked | 2018-09-18

History

#2 Updated by okurz about 3 years ago

  • Blocked by action #19350: [sle][functional][s390x][zkvm][hard] make unavailable ssh based zkvm consoles more obvious in the backend (was: [consistent] unable to switch to text terminal in consoletest_setup -> bsc#1040606) added

#3 Updated by okurz about 3 years ago

  • Assignee set to nicksinger

as discussed during standup 2017-09-20

#4 Updated by okurz about 3 years ago

  • Target version set to Milestone 11

#6 Updated by okurz almost 3 years ago

  • Due date set to 2017-10-11

#7 Updated by okurz almost 3 years ago

The latest job does not end up incomplete but fails without a failed module being stated. To my understanding the VNC stall detection is just a symptom, not the problem. It could be that the VNC process or an ssh terminal process died in the background and isotovideo does not check for that, so I suggest one of the following:

  1. (preferred) the VNC process terminating should not go unnoticed
  2. catch the "die" and handle it gracefully after the connection is terminated
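A minimal sketch of the two options above, in Python rather than the Perl used by os-autoinst, so none of the names below exist in the real backend. `ensure_alive` illustrates option 1 (noticing that a helper process has exited), and `run_step` illustrates option 2 (catching the error and reporting a failed step instead of aborting the run). The child process here is only a stand-in for a VNC or ssh helper.

```python
import subprocess
import sys


class ConsoleProcessDied(RuntimeError):
    """Raised when a helper process (e.g. a VNC or ssh session) has exited."""


def ensure_alive(proc: subprocess.Popen, name: str) -> None:
    # Option 1: poll() returns None while the child runs, else its exit code,
    # so an exited helper is noticed instead of being silently ignored.
    code = proc.poll()
    if code is not None:
        raise ConsoleProcessDied(f"{name} exited with code {code}")


def run_step(proc: subprocess.Popen, name: str) -> str:
    # Option 2: catch the error and report a failed step
    # instead of letting the whole run die.
    try:
        ensure_alive(proc, name)
    except ConsoleProcessDied as err:
        return f"fail: {err}"
    return "ok"


if __name__ == "__main__":
    # Hypothetical stand-in for a helper that dies with exit code 3.
    child = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    child.wait()
    print(run_step(child, "vnc-helper"))
```

With such a check the job would end as a failure with a readable reason rather than as an incomplete.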

#8 Updated by mgriessmeier almost 3 years ago

  • Status changed from New to In Progress

Work-in-progress PR created; unfortunately it has not progressed as far as we wanted because more important issues popped up.
okurz, nsinger: hopefully you can take this as a base to continue further in this sprint.

we turned the die into an Exception, but failed to add a record_info box - though we found a nice way to reproduce the "Socket does not exist" issue consistently (nsinger knows more about that)

https://github.com/os-autoinst/os-autoinst/pull/862

#9 Updated by mgriessmeier almost 3 years ago

  • Related to action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died added

#10 Updated by riafarov almost 3 years ago

PR with fix of review comment to be able to merge it: https://github.com/os-autoinst/os-autoinst/pull/864

#11 Updated by okurz almost 3 years ago

did not complete in sprint 1. main reason: spontaneous packaging training which we were not aware of beforehand. we have the PR which should improve user feedback a lot and this is definitely possible in the next sprint 2.

#12 Updated by okurz almost 3 years ago

  • Due date changed from 2017-10-11 to 2017-10-25

#14 Updated by nicksinger almost 3 years ago

okurz helped a lot in forming a hypothesis with me about what happens here: our current IPMI code in the test (boot/boot_from_pxe.pm) checks for a running SSHD on the SUT and continues to execute. The next step activates a console named "install", which basically means for the worker: "open a connection (whatever protocol) to the SUT and expect the yast installer there". Right now the test implicitly expects a running VNC server by checking for a running SSHD. This worked for a long time but obviously does not apply anymore, so the test tries to connect too early and receives a "Connection timed out".

The current approach now is to adjust the needle which checks for a running SSH/VNC dynamically based on the variable "VIDEOMODE": http://openqa.glados.qa.suse.de/tests/484#step/boot_from_pxe/24
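The underlying idea (probe the service you actually want to use, not a different one) can be sketched as a plain TCP port check. This is not the needle-based approach from the comment above and not code from the test distribution; `wait_for_port` is a hypothetical helper, and the host and port in the usage note are simply the values from the log in the description.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Return True once a TCP connect to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

A caller would use something like `wait_for_port("10.162.2.87", 5901)` before selecting the VNC-based console, so that an open SSH port is no longer taken as proof that VNC is ready.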

#15 Updated by okurz almost 3 years ago

screenshot does not look like there is a responsive VNC server. I suggest to crosscheck manually.

#16 Updated by nicksinger almost 3 years ago

  • Description updated (diff)

#17 Updated by nicksinger almost 3 years ago

Manual investigation by okurz and me revealed the kernel parameter/console redirection as part of the reason why this fails. Removing the additional "console=tty" and leaving just "console=ttyS1,115200" in there results in the expected output on the serial console.
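As an illustration of that change, here is a small sketch that drops every `console=` parameter except the serial one from a kernel command line. The function name and the surrounding parameters in the example are assumptions made for illustration; only the two `console=` values come from the comment above.

```python
def drop_extra_consoles(cmdline: str, keep: str = "console=ttyS1,115200") -> str:
    """Remove all console= parameters from cmdline except the one to keep."""
    params = cmdline.split()
    return " ".join(
        p for p in params if not p.startswith("console=") or p == keep
    )


if __name__ == "__main__":
    # "linux" and "quiet" are placeholder parameters, not taken from the ticket.
    print(drop_extra_consoles("linux console=tty console=ttyS1,115200 quiet"))
```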

#18 Updated by okurz almost 3 years ago

as discussed, please make sure the ticket is closed today

#19 Updated by okurz almost 3 years ago

  • Due date set to 2017-10-25

due to changes in a related task

#20 Updated by okurz almost 3 years ago

  • Subject changed from [sle][functional][ipmi]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?
  • Status changed from In Progress to Feedback

I think the comment you wanted to add is "https://github.com/nicksinger/os-autoinst-distri-opensuse/commit/c7d775f042e346d00cd084bc9bf7b4df30ba7768 provides a first fix for this. Test can now continue and finds the right needle: http://openqa.glados.qa.suse.de/tests/517 . The test can still not succeed since the VPNd binds to the second interface and is therefore not reachable on the expected address." from #26038#note-14

So we failed to close this ticket today … but at least I tried to improve by creating subtickets

#21 Updated by nicksinger almost 3 years ago

  • Status changed from Feedback to Resolved

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3770 addresses the major problem we want to address here (test checks for SSH running while we really want to connect to VNC). Anyway, it will still fail and there is more improvement needed which I'll track in another ticket.

#22 Updated by okurz almost 3 years ago

  • Status changed from Resolved to In Progress

#23 Updated by nicksinger almost 3 years ago

  • Status changed from In Progress to Feedback

#24 Updated by okurz almost 3 years ago

  • Due date changed from 2017-10-25 to 2017-11-08

due to changes in a related task

#25 Updated by okurz almost 3 years ago

  • Target version changed from Milestone 11 to Milestone 12

#26 Updated by okurz almost 3 years ago

  • Due date changed from 2017-11-08 to 2018-01-16

due to changes in a related task

#27 Updated by nicksinger almost 3 years ago

  • Assignee deleted (nicksinger)

#28 Updated by okurz over 2 years ago

  • Due date changed from 2018-01-16 to 2018-02-27

due to changes in a related task

#29 Updated by okurz over 2 years ago

  • Target version changed from Milestone 12 to Milestone 14

#30 Updated by okurz over 2 years ago

  • Assignee set to SLindoMansilla

#31 Updated by SLindoMansilla over 2 years ago

  • Status changed from Feedback to In Progress

Working on sub-task

#32 Updated by riafarov over 2 years ago

  • Due date changed from 2018-02-27 to 2018-03-13

due to changes in a related task

#33 Updated by SLindoMansilla over 2 years ago

  • Status changed from In Progress to Resolved

All subtasks resolved, all related tasks resolved.

#34 Updated by okurz over 2 years ago

  • Status changed from Resolved to Workable

But expected result is not met: See https://openqa.suse.de/tests/1514178 failing in first_boot

#35 Updated by SLindoMansilla over 2 years ago

Trying with DESKTOP=textmode: http://copland.arch.suse.de/tests/942

#37 Updated by mgriessmeier over 2 years ago

  • Due date changed from 2018-03-13 to 2018-03-27

due to changes in a related task

#38 Updated by mgriessmeier over 2 years ago

  • Target version changed from Milestone 14 to Milestone 15

#39 Updated by mgriessmeier over 2 years ago

  • Due date changed from 2018-03-27 to 2018-04-10

due to changes in a related task

#40 Updated by okurz over 2 years ago

  • Subject changed from [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?

#41 Updated by mgriessmeier over 2 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

due to changes in a related task

#42 Updated by SLindoMansilla over 2 years ago

  • Subject changed from [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)

#43 Updated by SLindoMansilla over 2 years ago

  • Related to action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install) added

#44 Updated by mgriessmeier over 2 years ago

  • Due date changed from 2018-04-24 to 2018-05-08

due to changes in a related task

#45 Updated by okurz over 2 years ago

  • Target version changed from Milestone 15 to Milestone 16

correcting milestone

#46 Updated by okurz over 2 years ago

  • Related to action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added

#47 Updated by okurz over 2 years ago

  • Description updated (diff)
  • Target version changed from Milestone 16 to Milestone 17

https://openqa.suse.de/tests/1749830 is the latest job, not exactly "working" so we are not done here.

#48 Updated by okurz over 2 years ago

  • Target version changed from Milestone 17 to Milestone 21+

#49 Updated by okurz over 2 years ago

  • Target version changed from Milestone 21+ to Milestone 21+

#50 Updated by okurz almost 2 years ago

  • Status changed from Workable to Blocked

#51 Updated by SLindoMansilla almost 2 years ago

  • Related to deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)

#52 Updated by SLindoMansilla almost 2 years ago

  • Blocked by action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added

#53 Updated by SLindoMansilla almost 2 years ago

  • Blocked by deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)

#54 Updated by SLindoMansilla almost 2 years ago

Sorry, I confused the subject line with the PXE. Added as subtask: #36027

#55 Updated by SLindoMansilla almost 2 years ago

  • Related to deleted (action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install))

#56 Updated by SLindoMansilla almost 2 years ago

Maybe related, this job restarts IPMI machines: http://jenkins.qa.suse.de/job/restart-ipmi-mainboard/

#57 Updated by okurz almost 2 years ago

  • Blocks action #41207: [functional][u][ipmi] test fails in reboot_gnome - seems we call some code which we are not allowed to do, need to "reset_consoles" or something? nearly there to a complete run again :) added

#58 Updated by okurz over 1 year ago

  • Target version changed from Milestone 21+ to Milestone 24

#59 Updated by mgriessmeier over 1 year ago

  • Target version changed from Milestone 24 to Milestone 25

Idk what's the state here - can someone explain?

#60 Updated by SLindoMansilla over 1 year ago

  • Description updated (diff)
  • Status changed from Blocked to Workable

Blocker resolved: #19350

#61 Updated by SLindoMansilla over 1 year ago

  • Description updated (diff)
  • Assignee deleted (SLindoMansilla)

#62 Updated by SLindoMansilla over 1 year ago

  • Description updated (diff)

#63 Updated by zluo over 1 year ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Taking over. This is a 2-year-old ticket!

#64 Updated by zluo over 1 year ago

https://openqa.suse.de/tests/2931505 shows boot_from_pxe works fine.

re-trigger it on osd because hostname_inst got wrongly scheduled for ipmi:
https://openqa.suse.de/tests/2956663#settings

#65 Updated by zluo over 1 year ago

  • Status changed from In Progress to Rejected

grub_test failed, but this is another issue. So I don't see any problem for gnome test on ipmi.

set as rejected for now

#66 Updated by okurz over 1 year ago

  • Status changed from Rejected to In Progress

It is true that "boot_from_pxe" is now more stable. However the ACs are not fulfilled, please see the description for that. It mentions four scenarios. Currently https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&groupid=129&groupid=110&groupid=132&build=228.2&arch=x86_64 shows only btrfs@ipmi so default@ipmi, gnome@ipmi and the corresponding two for SLE15 are missing.

#67 Updated by zluo over 1 year ago

  • Target version changed from Milestone 25 to Milestone 26

Sergio has added gnome@64bit-ipmi now, need to check the test results for next build.

#68 Updated by SLindoMansilla over 1 year ago

We still haven't had any build since I scheduled the gnome@ipmi job.
If necessary we could perform a job POST, typing the settings manually.

#70 Updated by SLindoMansilla about 1 year ago

  • Description updated (diff)

#71 Updated by zluo about 1 year ago

@sergio this is working for boot_from_pxe, can you put the job then into Job groups "SLES 12 functional"? Thanks!

#72 Updated by okurz about 1 year ago

the test still fails in first_boot. Nothing changed because no one changed code: https://openqa.suse.de/tests/3038117#step/first_boot/7 so as long as this doesn't work you should not bring it into the validation job group.

#73 Updated by zluo about 1 year ago

I thought boot_from_pxe was not working. It works now; first_boot failed, but I think this is another issue.

#74 Updated by zluo about 1 year ago

to check https://openqa.suse.de/tests/3043347 (without reconnect_mgmt_console, grub_test)

to check https://openqa.suse.de/tests/3043448 (without grub_test) as well

#76 Updated by zluo about 1 year ago

  • Status changed from In Progress to Blocked

https://openqa.suse.de/tests/3043347 shows that first_boot works fine if grub_test is not started before.

So we need to handle the issue reported: #53249

#77 Updated by zluo about 1 year ago

  • Blocked by action #53249: [epic][functional][u] ensure that grub_test gets a booting system added

#78 Updated by zluo about 1 year ago

  • Target version changed from Milestone 26 to Milestone 30+

#79 Updated by zluo about 1 year ago

  • Status changed from Blocked to Workable

#80 Updated by zluo about 1 year ago

  • Status changed from Workable to Blocked

#81 Updated by SLindoMansilla about 1 year ago

  • Due date changed from 2018-12-31 to 2018-09-27

due to changes in a related task

#82 Updated by SLindoMansilla about 1 year ago

  • Due date changed from 2018-05-08 to 2018-09-27

due to changes in a related task

#83 Updated by SLindoMansilla about 1 year ago

  • Due date changed from 2018-03-13 to 2018-09-27

due to changes in a related task

#84 Updated by SLindoMansilla about 1 year ago

  • Due date changed from 2017-11-08 to 2018-09-27

due to changes in a related task

#85 Updated by SLindoMansilla about 1 year ago

  • Due date changed from 2017-11-08 to 2018-09-27

due to changes in a related task

#86 Updated by okurz about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3247183

#87 Updated by okurz about 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3329449

#88 Updated by okurz almost 1 year ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3381530

#89 Updated by okurz 10 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3598216

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#90 Updated by okurz 10 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3649158

#91 Updated by okurz 9 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3700456

#92 Updated by okurz 9 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3727073

#93 Updated by mgriessmeier 9 months ago

  • Target version changed from Milestone 30+ to Milestone 30

needs to be discussed offline

#94 Updated by okurz 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3770256

#95 Updated by szarate 4 months ago

  • Assignee changed from zluo to SLindoMansilla

I guess this ticket can be rejected

#96 Updated by SLindoMansilla 3 months ago

  • Status changed from Blocked to In Progress

True, #60500 should be enough.

#97 Updated by SLindoMansilla 3 months ago

  • Related to action #60500: [sle][functional][u] - default@64bit-ipmi scheduled for SLES 15 SP2 added

#98 Updated by SLindoMansilla 3 months ago

  • Blocked by deleted (action #53249: [epic][functional][u] ensure that grub_test gets a booting system)

#99 Updated by SLindoMansilla 3 months ago

  • Status changed from In Progress to Resolved
