action #33202

[sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it)

Added by okurz over 3 years ago. Updated about 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 18
Start date: 2018-03-13
Due date: 2018-08-14
% Done: 0%
Estimated time: 13.00 h
Difficulty: hard

Description

Observation

openQA test in scenario sle-12-SP4-Server-DVD-s390x-sched_stress@zkvm fails in boot_to_desktop

Acceptance criteria

  • AC1: No black screen without a pop-up hinting at what went wrong, what is running, and what we actually see

Suggestions

Show a wallpaper or dialog with the hint mentioned above

Reproducible

Fails since (at least) Build 0234 (current job)

Expected result

Last good: 0164 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #33199: [sle][functional][s390x][zkvm][u][hard] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? | Resolved | 2018-03-13 | 2018-04-10

Related to openQA Tests - action #34609: [sle][functional][u][medium] Improve Implementation of workaround for bsc#1083646 and debug output in reconnect_s390 on S390-KVM | Rejected | 2018-04-10 | 2018-04-24

Related to openQA Tests - action #36745: [openqa][sle][functional][u][s390x][zkvm][kernel] Broken boot due "Test died: no candidate needle with tag(s) 'password-prompt' matched" | Resolved | 2018-06-04

Related to openQA Tests - action #48260: [sle][functional][u][s390x][kvm] test fails in reboot_after_installation - "The console isn't responding correctly. Maybe half-open socket?" | Resolved | 2019-02-21

Blocks openQA Tests - action #32683: [sle][functional][u][medium] Implement proper post_fail_hook for boot_to_desktop | Resolved | 2018-03-02

Blocked by openQA Project - action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh | Resolved | 2018-03-29

Blocks openQA Tests - action #33865: [sles][functional][s390x][easy][y] Enable yast2_ncurses testsuite for s390x | Resolved | 2018-03-27

Blocks openQA Tests - action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation | Resolved | 2018-06-04

Copied to openQA Tests - action #39809: [functional][u][s390x] ssh connection check shows red border misleading that something is wrong when there is not -> should be no red border | Resolved | 2018-03-13

History

#1 Updated by okurz over 3 years ago

  • Due date changed from 2018-04-24 to 2018-04-10

Seems we have a bit more capacity in the current sprint S13 as well as the upcoming one. Let's see how you are going to handle that! ;)

#2 Updated by riafarov over 3 years ago

  • Subject changed from [sle][functional][s390x][zkvm][u]test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - I hate it to [sle][functional][s390x][zkvm][u]test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it)
  • Description updated (diff)
  • Status changed from New to Workable

#3 Updated by riafarov over 3 years ago

  • Subject changed from [sle][functional][s390x][zkvm][u]test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it) to [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it)

#4 Updated by cwh over 3 years ago

  • Difficulty set to hard

#5 Updated by okurz over 3 years ago

  • Related to action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh added

#6 Updated by okurz over 3 years ago

  • Related to action #33199: [sle][functional][s390x][zkvm][u][hard] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? added

#7 Updated by okurz over 3 years ago

  • Related to action #34609: [sle][functional][u][medium] Improve Implementation of workaround for bsc#1083646 and debug output in reconnect_s390 on S390-KVM added

#8 Updated by mgriessmeier over 3 years ago

  • Due date changed from 2018-04-10 to 2018-04-24

#9 Updated by okurz over 3 years ago

  • Due date changed from 2018-04-24 to 2018-05-08
  • Target version changed from Milestone 15 to Milestone 16

We hate it and we will continue to hate it, gosh it's hard

#10 Updated by okurz over 3 years ago

  • Blocks action #32683: [sle][functional][u][medium] Implement proper post_fail_hook for boot_to_desktop added

#11 Updated by okurz over 3 years ago

  • Related to deleted (action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh)

#12 Updated by okurz over 3 years ago

  • Blocked by action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh added

#13 Updated by okurz over 3 years ago

  • Due date changed from 2018-05-08 to 2018-06-05
  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Target version changed from Milestone 16 to Milestone 17

Blocked by #34003, which we would like to make the tools team aware of.

#14 Updated by mgriessmeier over 3 years ago

  • Status changed from Blocked to In Progress
  • Assignee changed from okurz to mgriessmeier

Not blocked anymore, since we now have the debug output and a hint about what's going on.

submitted https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5061 to check if the SUT is able to ping the worker before selecting the first console

#15 Updated by okurz over 3 years ago

… what if ping is not installed? -> https://openqa.suse.de/tests/1707606#step/boot_to_desktop/20 , read: all userspace tests now fail to pass "boot_to_desktop"

#16 Updated by pvorel over 3 years ago

okurz wrote:

… what if ping is not installed? -> https://openqa.suse.de/tests/1707606#step/boot_to_desktop/20 , read: all userspace tests now fail to pass "boot_to_desktop"

I reported this particular problem to #36745

#17 Updated by pvorel over 3 years ago

  • Related to action #36745: [openqa][sle][functional][u][s390x][zkvm][kernel] Broken boot due "Test died: no candidate needle with tag(s) 'password-prompt' matched" added

#18 Updated by pvorel over 3 years ago

pvorel wrote:

okurz wrote:

… what if ping is not installed? -> https://openqa.suse.de/tests/1707606#step/boot_to_desktop/20 , read: all userspace tests now fail to pass "boot_to_desktop"

I reported this particular problem to #36745

Actually, this is caused by something other than the ping problem, so poo#36745 might or might not be related to this.

#19 Updated by mgriessmeier over 3 years ago

  • Due date changed from 2018-06-05 to 2018-06-19

working on this in upcoming sprint

#20 Updated by mgriessmeier over 3 years ago

  • Status changed from In Progress to Blocked

resolve #26044 first... (even if it's the same solution)

#21 Updated by mgriessmeier over 3 years ago

  • Status changed from Blocked to Feedback

https://progress.opensuse.org/issues/26044 was resolved.
It also included a potential fix for the boot_to_desktop failure, which hopefully will not appear again,
so I am setting this to Feedback now to track some future jobs before resolving.

#22 Updated by okurz over 3 years ago

that's good. But keep in mind that this ticket is about "error reporting" not about fixing the underlying issue, e.g. simulate the old problem again and make it super-obvious to test reviewers what the problem is, not fix it :)

#23 Updated by mgriessmeier over 3 years ago

okurz wrote:

that's good. But keep in mind that this ticket is about "error reporting" not about fixing the underlying issue, e.g. simulate the old problem again and make it super-obvious to test reviewers what the problem is, not fix it :)

http://opeth.suse.de/tests/2412#step/boot_to_desktop/15 that's enough error reporting imho... don't know if it makes sense to add a "Worker cannot connect to SUT" message...

#24 Updated by okurz over 3 years ago

See for example https://openqa.suse.de/tests/1752808#step/boot_to_desktop/24 which shows a message "ssh: connect to host 10.161.145.16 port 22: No route to host". This is already helpful but the next thumbnail says "No candidate with tag 'password-prompt' matched" and I think we can still enhance the debugging here a tiny bit, e.g. a post_fail_hook that can provide more hints on what might have gone wrong.

#25 Updated by okurz over 3 years ago

  • Blocks action #33865: [sles][functional][s390x][easy][y] Enable yast2_ncurses testsuite for s390x added

#26 Updated by okurz over 3 years ago

I blocked #33865 by this now. See https://openqa.suse.de/tests/1759484#step/boot_to_desktop/17 as an example. We see an error message about "No route to host" but not much more.

#27 Updated by SLindoMansilla over 3 years ago

  • Blocks action #36754: [qe-core][functional][systemd][medium] test fails in systemd_testsuite - needs further investigation added

#28 Updated by mgriessmeier over 3 years ago

  • Status changed from Feedback to In Progress

apparently still some corner cases around

#29 Updated by okurz over 3 years ago

  • Target version changed from Milestone 17 to Milestone 17

#30 Updated by mgriessmeier over 3 years ago

So, there are still some occurrences of this issue around, mainly in userspace regression tests, where the corresponding qcow images seem to take longer until one is able to connect to them, e.g. https://openqa.suse.de/tests/1764688

next suggestions are:

  • implement the retry loop in perl to get better feedback
  • increase the amount of retries to 10

Will keep working on this in the next sprint to hopefully get it finally solved.
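
As an illustration of the two suggestions in this comment (a retry loop in the test code itself, with up to 10 attempts), here is a minimal sketch. It is written in Python for illustration only; the actual tests are Perl, and `retry_with_feedback`, its parameters and the log format are hypothetical names, not part of os-autoinst.

```python
import time
from typing import Callable, List, Tuple


def retry_with_feedback(
    check: Callable[[], Tuple[bool, str]],
    retries: int = 10,
    delay: float = 0.0,
) -> Tuple[bool, List[str]]:
    """Run `check` up to `retries` times and keep one log line per attempt.

    `check` is any callable returning (ok, detail); it is an assumption of
    this sketch, e.g. a function that tries to reach the SUT.
    """
    log: List[str] = []
    for attempt in range(1, retries + 1):
        ok, detail = check()
        status = "ok" if ok else "failed"
        log.append(f"attempt {attempt}/{retries}: {status} ({detail})")
        if ok:
            return True, log
        if attempt < retries:
            # Wait before the next attempt; slow images may just need time.
            time.sleep(delay)
    return False, log
```

Doing the loop in the test code rather than in a shell one-liner means every attempt leaves a line that can be shown to the reviewer, instead of only the final failure.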

#31 Updated by mgriessmeier over 3 years ago

  • Due date changed from 2018-06-19 to 2018-07-03

#32 Updated by nicksinger about 3 years ago

Is this failure related in any way? The system seems to boot fine and X started. However, only a black screen is visible, without any hint of an error.

#33 Updated by okurz about 3 years ago

Not exactly the same but related. It does not show a "black screen with mouse cursor" but, on top of it, a box with a text message about "Failed" and "Error connecting to host : IO::Socket::INET: connect: Connection timed out at /usr/lib/os-autoinst/testapi.pm line 1385." That's already a bit better than the original issue, but it is still far from easy to understand what the test is trying to achieve, what was expected, what is seen instead, and what the potential error sources could be.
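
To make the four missing pieces of information concrete (goal, expectation, observation, potential error sources), a failure message could be assembled roughly as follows. This is only a sketch in Python with a hypothetical helper name, `format_failure_report`; it is not an existing os-autoinst or testapi function.

```python
from typing import Iterable


def format_failure_report(
    goal: str,
    expected: str,
    observed: str,
    possible_causes: Iterable[str],
) -> str:
    """Build a reviewer-facing message from the four pieces of information."""
    lines = [
        f"Goal: {goal}",
        f"Expected: {expected}",
        f"Observed: {observed}",
        "Possible causes:",
    ]
    # One indented bullet per suspected error source.
    lines.extend(f"  - {cause}" for cause in possible_causes)
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_failure_report(
        goal="connect to the SUT over ssh",
        expected="ssh server answers on port 22",
        observed="Connection timed out",
        possible_causes=["SUT still booting", "network misconfigured"],
    ))
```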

#34 Updated by mgriessmeier about 3 years ago

Provided another PR which implements a more robust way of checking whether the ssh server in the SUT is (already) available.
It also gives the reviewer better feedback about what is actually going wrong.

#35 Updated by mgriessmeier about 3 years ago

  • Due date changed from 2018-07-03 to 2018-07-17

PR is still in discussion. Will track it in the next sprint and solve it there.

#36 Updated by riafarov about 3 years ago

  • Estimated time set to 13.00 h

#37 Updated by mgriessmeier about 3 years ago

  • Due date changed from 2018-07-17 to 2018-07-31

Moved due to hackweek.

#38 Updated by okurz about 3 years ago

  • Target version changed from Milestone 17 to Milestone 18

#39 Updated by mgriessmeier about 3 years ago

  • Due date changed from 2018-07-31 to 2018-08-14

I've rewritten my PR following Santi's proposal of using the already fulfilled dependency IO::Socket::INET. Unfortunately, I have not been able to verify it so far due to some issues with os-autoinst that ettore and Santi are working on right now.
So hopefully I will solve this today.
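
For readers unfamiliar with the approach: a plain TCP connect to the ssh port is one way to check whether the server is reachable without relying on extra tools. The sketch below uses Python's `socket` module as a stand-in for Perl's IO::Socket::INET. It is not the code from the PR, and `ssh_port_reachable` and its default values are hypothetical.

```python
import socket
from typing import Tuple


def ssh_port_reachable(
    host: str, port: int = 22, timeout: float = 5.0
) -> Tuple[bool, str]:
    """Try a plain TCP connect and return (ok, human-readable detail)."""
    try:
        # create_connection resolves the host and connects with a timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True, f"TCP connect to {host}:{port} succeeded"
    except OSError as err:
        # Covers refused connections, timeouts, unreachable hosts and
        # name resolution errors, all of which are OSError subclasses.
        return False, f"TCP connect to {host}:{port} failed: {err}"
```

Returning the error text alongside the boolean lets the caller show the reviewer why the connection failed, e.g. a timeout versus a refused connection.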

#40 Updated by mgriessmeier about 3 years ago

  • Status changed from In Progress to Feedback

PR is merged - let's see how much it breaks =)

#41 Updated by mgriessmeier about 3 years ago

  • Status changed from Feedback to Resolved

nothing broke, PR is in place and working as expected

#42 Updated by okurz about 3 years ago

  • Copied to action #39809: [functional][u][s390x] ssh connection check shows red border misleading that something is wrong when there is not -> should be no red border added

#43 Updated by okurz over 2 years ago

  • Related to action #48260: [sle][functional][u][s390x][kvm] test fails in reboot_after_installation - "The console isn't responding correctly. Maybe half-open socket?" added
