action #34003
closed[tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh
0%
Description
Motivation¶
See for example in gh#os-autoinst/os-autoinst-distri-opensuse#4629 we struggle a lot to find out why a select_console 'root-console'
call fails for the s390x-kvm-sle12 machine using the svirt backend relying on an ssh-connection within an xterm-console within a local Xvnc server. If it fails we are mainly left with a mouse cursor on a black screen and nobody knows what is going on
Acceptance criteria¶
- AC1: logs tell when a remote connection, e.g. over SSH or VNC is made
- AC2: At least a hint is shown in logs or as thumbnail in openQA details view what could have gone wrong
Tasks¶
- Research how the svirt backend conducts ssh connections using the according console
- Add some logging with the according parameters, e.g. from where to where a connection is made
- Add error handling and reporting in case something goes wrong
- Take a look into other backends/consoles if maybe the same approach could be applied there
Updated by okurz over 6 years ago
- Related to action #33202: [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it) added
Updated by okurz over 6 years ago
- Related to action #33199: [sle][functional][s390x][zkvm][u][hard] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? added
Updated by mgriessmeier over 6 years ago
- Subject changed from [functional][u]Better logging and error handling in case of remote console connections in backends, e.g. ssh to [functional][u][medium] Better logging and error handling in case of remote console connections in backends, e.g. ssh
- Status changed from New to Workable
Updated by okurz over 6 years ago
- Related to action #34837: [sle12][functional][u][s390x-kvm] test fails in consoletest_finish while switching from console to GUI desktop, stuck on black screen added
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-24 to 2018-05-08
- Target version changed from Milestone 15 to Milestone 16
Updated by okurz over 6 years ago
- Related to deleted (action #33202: [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it))
Updated by okurz over 6 years ago
- Blocks action #33202: [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it) added
Updated by okurz over 6 years ago
- Subject changed from [functional][u][medium] Better logging and error handling in case of remote console connections in backends, e.g. ssh to [tools] Better logging and error handling in case of remote console connections in backends, e.g. ssh
- Due date deleted (
2018-05-08) - Target version deleted (
Milestone 16)
Dear tools-team, this feature would help QSF and hopefully other teams very much. Would you like to take a look into this please?
Updated by okurz over 6 years ago
- Related to action #35641: [sle][functional][u][sporadic][mistyping] test fails in svirt_upload_assets - unchanged PS1 added
Updated by okurz over 6 years ago
- Blocks action #33865: [sles][functional][s390x][easy][y] Enable yast2_ncurses testsuite for s390x added
Updated by coolo over 6 years ago
I don't really understand what you expect here. Agressive debug output from ssh?
Updated by coolo over 6 years ago
So, I did look into the issue and there doesn't seem to be a problem with ssh actually. The failure looks like this:
https://openqa.suse.de/tests/1656222#step/boot_to_desktop/21
While if the xterm started, it looks like this:
https://openqa.suse.de/tests/1563415#step/boot_to_desktop/19
What you notice in the failed log:
[2018-05-01T08:50:09.0524 CEST] [debug] <<< testapi::select_console(testapi_console='root-console')
/usr/lib/os-autoinst/consoles/vnc_base.pm:71:{
'hostname' => 'localhost',
'port' => 35180,
'ikvm' => 0
}
IceWM: using /var/lib/empty/.icewm for private configuration files
11841
[2018-05-01T08:50:10.0777 CEST] [debug] activate_console, console: root-console, type: console
[2018-05-01T08:50:10.0777 CEST] [debug] backend s390x || zkvm
[2018-05-01T08:50:10.0778 CEST] [debug] /var/lib/openqa/cache/tests/sle/tests/boot/boot_to_desktop.pm:27 called opensusebasetest::wait_boot
[2018-05-01T08:50:10.0778 CEST] [debug] <<< testapi::assert_screen(mustmatch='password-prompt', timeout=30)
[2018-05-01T08:50:10.0833 CEST] [debug] MATCH(password-prompt-20141126:0.00)
[2018-05-01T08:50:10.0838 CEST] [info] Collected unknown process with pid 11841 and exit status: 0
Note that 11841 is the xterm/Xvnc pid - I don't know why it crashes/stops, but it has nothing to do with ssh
Updated by okurz over 6 years ago
coolo wrote:
[…]
Note that 11841 is the xterm/Xvnc pid - I don't know why it crashes/stops, but it has nothing to do with ssh
Well, then I consider the "Xvnc has vanished" as part of the story.
Do the Acceptance criteria make sense to you?
Take a look into a standard test with "remote backend", e.g. default@s390x-kvm. Looking for select_console
I see there blocks like
[2018-05-03T23:28:36.0547 CEST] [debug] <<< testapi::select_console(testapi_console='svirt', await_console=0)
[2018-05-03T23:28:36.0628 CEST] [debug] Connection to root@s390p8.suse.de established
/usr/lib/os-autoinst/consoles/vnc_base.pm:71:{
'port' => 56100,
which is already helpful because we know that an ssh-connection is established. However we the part about VNC is not really clear to test reviewers. So I think one debug line could be helpful. And in case the ssh connection is not working it might be hard to understand who tried to connect to where (from where) and why so I suggest a debug line there as well that say something like "[debug] select_console 'svirt': Connecting from '' to ''". The related tickets show some examples, please see them as well. For example https://openqa.suse.de/tests/1536522#step/boot_to_desktop/20 is a case of a black screen and it is not obvious from the logs what is going wrong.
Updated by coolo over 6 years ago
the ssh connections don't happen in backends - they happen in consoles
Updated by okurz over 6 years ago
- Subject changed from [tools] Better logging and error handling in case of remote console connections in backends, e.g. ssh to [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh
Mainly yes. But at least the backend/baseclass.pm has the message bmwqemu::diag "Connection to $args{username}\@$args{hostname} established" if $ssh->auth_ok;
already and at least "ssh" is mentioned a lot in backends. I thought of "backends" as the generic term for backends and consoles but sorry if this was misunderstood.
Updated by coolo over 6 years ago
Well, but the ssh you're having problems with is running in an xterm and is assert_screened. The failed assert_screen is the feedback
Updated by coolo over 6 years ago
From https://openqa.suse.de/tests/1677493/file/autoinst-log.txt
20564
[2018-05-09T11:59:16.0547 CEST] [debug] activate_console, console: root-console, type: console
[2018-05-09T11:59:16.0547 CEST] [debug] backend s390x || zkvm
[2018-05-09T11:59:16.0547 CEST] [debug] /var/lib/openqa/cache/tests/sle/tests/boot/boot_to_desktop.pm:27 called opensusebasetest::wait_boot
[2018-05-09T11:59:16.0548 CEST] [debug] <<< testapi::assert_screen(mustmatch='password-prompt', timeout=30)
[2018-05-09T11:59:16.0603 CEST] [debug] MATCH(password-prompt-20141126:0.00)
[2018-05-09T11:59:16.0616 CEST] [debug] MATCH(password-prompt-20150730:0.00)
[2018-05-09T11:59:16.0628 CEST] [info] Collected unknown process with pid 20564 and exit status: 0
But there is more in the journal:
May 09 11:58:24 openqaw2 worker[20496]: [info] Collected unknown process with pid 20318 and exit status: 1
May 09 11:58:24 openqaw2 worker[20496]: [info] Collected unknown process with pid 20320 and exit status: 84
May 09 11:58:24 openqaw2 worker[20496]: [info] Collected unknown process with pid 20322 and exit status: 13
May 09 11:58:24 openqaw2 worker[20496]: [info] Collected unknown process with pid 20325 and exit status: 0
May 09 11:58:24 openqaw2 worker[20496]: [info] Collected unknown process with pid 20326 and exit status: 255
May 09 11:59:16 openqaw2 worker[20496]: [info] Collected unknown process with pid 20564 and exit status: 0
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20562 and exit status: 1
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20292 and exit status: 0
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20441 and exit status: 1
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20535 and exit status: 0
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20446 and exit status: 84
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20455 and exit status: 0
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20443 and exit status: 84
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20456 and exit status: 255
May 09 11:59:48 openqaw2 worker[20496]: [info] Collected unknown process with pid 20401 and exit status: 0
Updated by coolo over 6 years ago
the 84 are xterm's exit code for IO error - so they are easy to spot :)
This happens whenever we close the xvnc console. I guess we should take care of more processes within the consoles, so we know when they die before we want them to.
Updated by okurz over 6 years ago
coolo wrote:
the 84 are xterm's exit code for IO error - so they are easy to spot :)
good to know
I guess we should take care of more processes within the consoles, so we know when they die before we want them to.
Yes, that would be great!
Updated by oorlov over 6 years ago
- Related to deleted (action #34837: [sle12][functional][u][s390x-kvm] test fails in consoletest_finish while switching from console to GUI desktop, stuck on black screen)
Updated by oorlov over 6 years ago
- Blocks action #36268: [sle][functional][u][hard][s390x] test fails in consoletest_finish - screen does not show desktop session added
Updated by mgriessmeier over 6 years ago
- Status changed from Workable to In Progress
AC's are fulfilled: https://openqa.suse.de/tests/1745198#step/boot_to_desktop/24
Updated by mgriessmeier over 6 years ago
- Status changed from In Progress to Resolved
- Assignee set to coolo
Updated by coolo over 6 years ago
- Status changed from Resolved to Workable
this is just a hotpatch after suggestion from okurz. as it seems to work well, we should upstream that
Updated by coolo over 6 years ago
- Status changed from Workable to Resolved