coordination #23650
closed[sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)
100%
Description
Observation
openQA test in scenario sle-15-Leanos-DVD-x86_64-gnome@64bit-ipmi fails in
boot_from_pxe as incomplete with the message "Error connecting to host : IO::Socket::INET: connect: Connection refused" in the autoinst-log.txt.
Expected result
The test never worked for SLE15.
Acceptance criteria
- AC1: Test suite default is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI
- AC2: Test suite gnome is able to complete as install only for SLE12-SP5+ and SLE15-SP1+ over IPMI
Problem
autoinst-log.txt
10:08:07.4045 Debug: /var/lib/openqa/cache/tests/sle/tests/boot/boot_from_pxe.pm:100 called testapi::select_console
10:08:07.4046 5804 <<< testapi::select_console(testapi_console='installation')
/usr/lib/os-autoinst/consoles/vnc_base.pm:64:{
'password' => 'nots3cr3t',
'port' => 5901,
'hostname' => '10.162.2.87'
}
10:08:09.4100 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:10.4115 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:11.4129 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:12.4139 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:13.4150 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:14.4162 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:15.4174 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
10:08:16.4186 5808 Error connecting to host <10.162.2.87>: IO::Socket::INET: connect: Connection refused
DIE socket does not exist. Probably your backend instance could not start or died. at /usr/lib/os-autoinst/consoles/VNC.pm line 881.
at /usr/lib/os-autoinst/backend/baseclass.pm line 80.
backend::baseclass::die_handler('socket does not exist. Probably your backend instance could n...') called at /usr/lib/os-autoinst/consoles/VNC.pm line 801
consoles::VNC::catch {...} ('socket does not exist. Probably your backend instance could n...') called at /usr/lib/perl5/vendor_perl/5.18.2/Try/Tiny.pm line 115
Try::Tiny::try('CODE(0x843f028)', 'Try::Tiny::Catch=REF(0x843f310)') called at /usr/lib/os-autoinst/consoles/VNC.pm line 803
consoles::VNC::update_framebuffer('consoles::VNC=HASH(0x8440268)') called at /usr/lib/os-autoinst/consoles/vnc_base.pm line 74
That's annoying because incompletes are harder to understand and carry over can't work. (Improvement of this message is handled separately)
Further details
Always latest result in this scenario:
- latest SLE15-SP2 default(server role) (not yet available)
- latest SLE15-SP5 default
- latest SLE15-SP5 gnome
- latest SLE15-SP1 default (server role)
- latest SLE15-SP1 gnome
- latest SLE15
- former latest, Leanos-DVD
Updated by okurz about 7 years ago
Latest example: https://openqa.suse.de/tests/1176623/file/autoinst-log.txt
Updated by okurz about 7 years ago
- Blocked by action #19350: [sle][functional][s390x][zkvm][hard] make unavailable ssh based zkvm consoles more obvious in the backend (was: [consistent] unable to switch to text terminal in consoletest_setup -> bsc#1040606) added
Updated by okurz about 7 years ago
- Assignee set to nicksinger
as discussed during standup 2017-09-20
Updated by mgriessmeier about 7 years ago
recent example:
https://openqa.suse.de/tests/1180236
Updated by okurz about 7 years ago
The latest job does not end as incomplete but fails without a failed module stated. To my understanding the VNC stall detection is just a symptom, not the problem. It could be that the VNC process or an SSH terminal process died in the background and isotovideo does not check for that, so I suggest one of the following:
- (preferred) the VNC process terminating should not go unnoticed
- catch the "die" and handle it gracefully after the connection is terminated
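A minimal sketch of both suggestions, written in Python for illustration only (os-autoinst itself is Perl, and none of these names exist there): before the console is used, the helper process is polled so that its termination is noticed, and the failure is raised as a dedicated exception that the caller can catch and report instead of dying. `ConsoleProcessDied` and `ensure_alive` are hypothetical names.

```python
import subprocess
import sys


class ConsoleProcessDied(Exception):
    """Raised when a background console helper (e.g. VNC or SSH) has exited."""


def ensure_alive(proc, name):
    """Raise ConsoleProcessDied if the given subprocess is no longer running."""
    code = proc.poll()  # None while the process is still running
    if code is not None:
        raise ConsoleProcessDied(f"{name} exited with code {code}")


if __name__ == "__main__":
    # Hypothetical stand-in for a console helper that dies right away.
    helper = subprocess.Popen([sys.executable, "-c", "import sys; sys.exit(3)"])
    helper.wait()
    try:
        ensure_alive(helper, "vnc helper")
    except ConsoleProcessDied as err:
        # Handle gracefully: report the cause instead of an unexplained abort.
        print(f"console unavailable: {err}")
```

Raising a specific exception type keeps the "process died" case distinguishable from other connection errors, so the caller can decide how to report it.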
Updated by mgriessmeier about 7 years ago
- Status changed from New to In Progress
Work in Progress PR created, unfortunately not as far progressed as we wanted to have it due to more important issues popping up
@okurz, nsinger: hopefully you can take this as a base to continue further in this sprint.
we turned the die into an Exception, but failed to add a record_info box - though we found a nice way to reproduce the "Socket does not exist" issue consistently (nsinger knows more about that)
Updated by mgriessmeier about 7 years ago
- Related to action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died added
Updated by riafarov about 7 years ago
PR with fix of review comment to be able to merge it: https://github.com/os-autoinst/os-autoinst/pull/864
Updated by okurz about 7 years ago
Did not complete in sprint 1. Main reason: a spontaneous packaging training which we were not aware of beforehand. We have the PR, which should improve user feedback a lot, and this is definitely possible in the next sprint 2.
Updated by okurz about 7 years ago
- Due date changed from 2017-10-11 to 2017-10-25
Updated by okurz about 7 years ago
latest example: https://openqa.suse.de/tests/1206641#
Updated by nicksinger about 7 years ago
@okurz helped a hell of a lot to form a hypothesis together with me about what happens here: our current IPMI code in the test (boot/boot_from_pxe.pm) checks for a running SSHD on the SUT and continues to execute. The next step activates a console named "install", which basically means for the worker: "open a connection (whatever protocol) to the SUT and expect the yast installer there". Right now the test implicitly expects a running VNC server by checking for a running SSHD. This worked for a long time but obviously does not apply anymore, so the test tries to connect too early and receives a "Connection timed out".
The current approach now is to adjust the needle which checks for a running SSH/VNC dynamically based on the variable "VIDEOMODE": http://openqa.glados.qa.suse.de/tests/484#step/boot_from_pxe/24
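The underlying idea, checking the service the test actually connects to rather than inferring it from SSHD, can be sketched as a plain TCP readiness check. This is an illustrative Python sketch, not the needle-based approach described above and not code from os-autoinst; `wait_for_port` is a hypothetical helper, and the address and port in the usage note are taken from the log in the description.

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "connection refused" as well as connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test could call `wait_for_port("10.162.2.87", 5901)` before selecting the installation console and fail with a clear message when it returns False.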
Updated by okurz about 7 years ago
The screenshot does not look like there is a responsive VNC server. I suggest cross-checking manually.
Updated by nicksinger about 7 years ago
Manual investigation by @okurz and me revealed the kernel parameter/console redirection as part of the reason why this fails. Removing the additional "console=tty" and leaving only "console=ttyS1,115200" in there results in the expected output on the serial console.
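As an illustration of that change, the following Python sketch drops one exact argument from a kernel command line string. `drop_kernel_arg` is a hypothetical helper, and the `install=...` argument in the example is made up; only the two `console=` values come from the comment above.

```python
def drop_kernel_arg(cmdline, unwanted="console=tty"):
    """Return cmdline without arguments that exactly equal `unwanted`."""
    # Exact comparison keeps "console=ttyS1,115200" untouched.
    return " ".join(arg for arg in cmdline.split() if arg != unwanted)


if __name__ == "__main__":
    before = "install=http://example.invalid/repo console=tty console=ttyS1,115200"
    print(drop_kernel_arg(before))
```

Matching the whole argument rather than a prefix matters here, because the serial console setting also starts with "console=tty".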
Updated by okurz about 7 years ago
as discussed, please make sure the ticket is closed today
Updated by okurz about 7 years ago
- Subject changed from [sle][functional][ipmi]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?
- Status changed from In Progress to Feedback
I think the comment you wanted to add is "https://github.com/nicksinger/os-autoinst-distri-opensuse/commit/c7d775f042e346d00cd084bc9bf7b4df30ba7768 provides a first fix for this. Test can now continue and finds the right needle: http://openqa.glados.qa.suse.de/tests/517 . The test can still not succeed since the VPNd binds to the second interface and is therefore not reachable on the expected address." from #26038#note-14
So we failed to close this ticket today … but at least I tried to improve by creating subtickets
Updated by nicksinger about 7 years ago
- Status changed from Feedback to Resolved
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3770 addresses the major problem we want to address here (test checks for SSH running while we really want to connect to VNC). Anyway, it will still fail and there is more improvement needed which I'll track in another ticket.
Updated by okurz about 7 years ago
- Status changed from Resolved to In Progress
Updated by nicksinger about 7 years ago
- Status changed from In Progress to Feedback
Updated by okurz about 7 years ago
- Due date changed from 2017-10-25 to 2017-11-08
due to changes in a related task
Updated by okurz almost 7 years ago
- Target version changed from Milestone 11 to Milestone 12
Updated by okurz almost 7 years ago
- Due date changed from 2017-11-08 to 2018-01-16
due to changes in a related task
Updated by okurz almost 7 years ago
- Due date changed from 2018-01-16 to 2018-02-27
due to changes in a related task
Updated by okurz almost 7 years ago
- Target version changed from Milestone 12 to Milestone 14
Updated by SLindoMansilla over 6 years ago
- Status changed from Feedback to In Progress
Working on sub-task
Updated by riafarov over 6 years ago
- Due date changed from 2018-02-27 to 2018-03-13
due to changes in a related task
Updated by SLindoMansilla over 6 years ago
- Status changed from In Progress to Resolved
All sub task resolved, all related tasks resolved.
Updated by okurz over 6 years ago
- Status changed from Resolved to Workable
But expected result is not met: See https://openqa.suse.de/tests/1514178 failing in first_boot
Updated by SLindoMansilla over 6 years ago
Trying with DESKTOP=textmode: http://copland.arch.suse.de/tests/942
Updated by SLindoMansilla over 6 years ago
Continuing in sub task: https://progress.opensuse.org/issues/32089
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-03-13 to 2018-03-27
due to changes in a related task
Updated by mgriessmeier over 6 years ago
- Target version changed from Milestone 14 to Milestone 15
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-03-27 to 2018-04-10
due to changes in a related task
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][ipmi][epic]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-10 to 2018-04-24
due to changes in a related task
Updated by SLindoMansilla over 6 years ago
- Subject changed from [sle][functional][ipmi][epic][u]test fails in boot_from_pxe - connection refused trying to ipmi host over ssh? to [sle][functional][ipmi][epic][u] Fix test suite gnome to work on ipmi 12-SP3 and 15 (WAS: test fails in boot_from_pxe - connection refused trying to ipmi host over ssh?)
Updated by SLindoMansilla over 6 years ago
- Related to action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install) added
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-24 to 2018-05-08
due to changes in a related task
Updated by okurz over 6 years ago
- Target version changed from Milestone 15 to Milestone 16
correcting milestone
Updated by okurz over 6 years ago
- Related to action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added
Updated by okurz over 6 years ago
- Description updated (diff)
- Target version changed from Milestone 16 to Milestone 17
https://openqa.suse.de/tests/1749830 is the latest job, not exactly "working" so we are not done here.
Updated by okurz over 6 years ago
- Target version changed from Milestone 17 to Milestone 21+
Updated by okurz over 6 years ago
- Target version changed from Milestone 21+ to Milestone 21+
Updated by okurz about 6 years ago
- Status changed from Workable to Blocked
Updated by SLindoMansilla about 6 years ago
- Related to deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)
Updated by SLindoMansilla about 6 years ago
- Blocked by action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all added
Updated by SLindoMansilla about 6 years ago
- Blocked by deleted (action #36027: [sle][functional][u][ipmi] test fails in boot_from_pxe - pxe boot menu doesn't show up at all)
Updated by SLindoMansilla about 6 years ago
Sorry, I confused the subject line with the PXE. Added as subtask: #36027
Updated by SLindoMansilla about 6 years ago
- Related to deleted (action #31375: [sle][functional][ipmi][u][hard] test fails in first_boot - VNC installation on SLE 15 failed because of various issues (ipmi worker, first_boot, boot_from_pxe, await_install))
Updated by SLindoMansilla about 6 years ago
Maybe related, this job restarts IPMI machines: http://jenkins.qa.suse.de/job/restart-ipmi-mainboard/
Updated by okurz almost 6 years ago
- Blocks action #41207: [qe-core][functional][ipmi] test fails in reboot_gnome - seems we call some code which we are not allowed to do, need to "reset_consoles" or something? nearly there to a complete run again :) added
Updated by okurz almost 6 years ago
- Target version changed from Milestone 21+ to Milestone 24
Updated by mgriessmeier over 5 years ago
- Target version changed from Milestone 24 to Milestone 25
Idk what's the state here - can someone explain?
Updated by SLindoMansilla over 5 years ago
- Description updated (diff)
- Status changed from Blocked to Workable
Blocker resolved: #19350
Updated by SLindoMansilla over 5 years ago
- Description updated (diff)
- Assignee deleted (SLindoMansilla)
Updated by zluo over 5 years ago
- Status changed from Workable to In Progress
- Assignee set to zluo
Taking over. This is a 2-year-old ticket!
Updated by zluo over 5 years ago
https://openqa.suse.de/tests/2931505 shows boot_from_pxe works fine.
re-trigger it on osd because hostname_inst got wrongly scheduled for ipmi:
https://openqa.suse.de/tests/2956663#settings
Updated by zluo over 5 years ago
- Status changed from In Progress to Rejected
grub_test failed, but this is another issue. So I don't see any problem for gnome test on ipmi.
set as rejected for now
Updated by okurz over 5 years ago
- Status changed from Rejected to In Progress
It is true that "boot_from_pxe" is now more stable. However the ACs are not fulfilled, please see the description for that. It mentions four scenarios. Currently https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&groupid=129&groupid=110&groupid=132&build=228.2&arch=x86_64 shows only btrfs@ipmi so default@ipmi, gnome@ipmi and the corresponding two for SLE15 are missing.
Updated by zluo over 5 years ago
- Target version changed from Milestone 25 to Milestone 26
Sergio has added gnome@64bit-ipmi now, need to check the test results for next build.
Updated by SLindoMansilla over 5 years ago
We still haven't had any build since I scheduled the gnome@ipmi job.
If necessary we could perform a job POST, typing the settings manually.
Updated by SLindoMansilla over 5 years ago
Development group SLE12-SP4: https://openqa.suse.de/group_overview/132
sle-12-SP5-Server-DVD-x86_64-Build0214-gnome@64bit-ipmi
Updated by zluo over 5 years ago
@sergio this is working for boot_from_pxe, can you then put the job into the job group "SLES 12 functional"? Thanks!
Updated by okurz over 5 years ago
the test still fails in first_boot. Nothing changed because no one changed code: https://openqa.suse.de/tests/3038117#step/first_boot/7 so as long as this doesn't work you should not bring it into the validation job group.
Updated by zluo over 5 years ago
I thought boot_from_pxe was not working. It works now; first_boot failed, but I think this is another issue.
Updated by zluo over 5 years ago
to check https://openqa.suse.de/tests/3043347 (without reconnect_mgmt_console, grub_test)
to check https://openqa.suse.de/tests/3043448 (without grub_test) as well
Updated by zluo over 5 years ago
to check https://openqa.suse.de/tests/3043539#settings with my WIP PR
or clone https://openqa.suse.de/tests/3043542#settings
Updated by zluo over 5 years ago
- Status changed from In Progress to Blocked
https://openqa.suse.de/tests/3043347 shows that first_boot works fine if grub_test is not started before.
So we need to handle the issue reported: #53249
Updated by zluo over 5 years ago
- Blocked by coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system added
Updated by zluo over 5 years ago
- Target version changed from Milestone 26 to Milestone 30+
Updated by SLindoMansilla over 5 years ago
- Due date changed from 2018-12-31 to 2018-09-27
due to changes in a related task
Updated by SLindoMansilla over 5 years ago
- Due date changed from 2018-05-08 to 2018-09-27
due to changes in a related task
Updated by SLindoMansilla over 5 years ago
- Due date changed from 2018-03-13 to 2018-09-27
due to changes in a related task
Updated by SLindoMansilla over 5 years ago
- Due date changed from 2017-11-08 to 2018-09-27
due to changes in a related task
Updated by SLindoMansilla over 5 years ago
- Due date changed from 2017-11-08 to 2018-09-27
due to changes in a related task
Updated by okurz about 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3247183
Updated by okurz about 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3329449
Updated by okurz about 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3381530
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3598216
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3649158
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3700456
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3727073
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by mgriessmeier almost 5 years ago
- Target version changed from Milestone 30+ to Milestone 30
needs to be discussed offline
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: btrfs_libstorage-ng@64bit-ipmi
https://openqa.suse.de/tests/3770256
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by szarate over 4 years ago
- Assignee changed from zluo to SLindoMansilla
I guess this ticket can be rejected
Updated by SLindoMansilla over 4 years ago
- Status changed from Blocked to In Progress
True, #60500 should be enough.
Updated by SLindoMansilla over 4 years ago
- Related to action #60500: [sle][functional][u] - default@64bit-ipmi scheduled for SLES 15 SP2 added
Updated by SLindoMansilla over 4 years ago
- Blocked by deleted (coordination #53249: [epic][qe-core][functional] ensure that grub_test gets a booting system)
Updated by SLindoMansilla over 4 years ago
- Status changed from In Progress to Resolved
Updated by szarate about 4 years ago
- Tracker changed from action to coordination
Updated by szarate about 4 years ago
See for the reason of tracker change: http://mailman.suse.de/mailman/private/qa-sle/2020-October/002722.html