Project

General

Profile

Actions

action #30616

closed

[sle][functional][svirt-hyperv][u] test fails in rescuesystem - empty X window or stuck in installer loading

Added by okurz over 6 years ago. Updated almost 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
michalnowak
Category:
Bugs in existing tests
Start date:
2018-01-22
Due date:
2018-04-24
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-Installer-DVD-x86_64-installcheck@svirt-hyperv fails in
rescuesystem
either showing just an empty X window or stuck in installer loading.

Reproducible

Fails often but not every time

Expected result

Last good: https://openqa.suse.de/tests/1399511#step/rescuesystem/9

Further details

Always latest result in this scenario: latest


Related issues 3 (0 open3 closed)

Related to openQA Tests - action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logsResolvedokurz2018-03-08

Actions
Related to openQA Tests - action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful)Resolvedokurz2018-03-082018-05-22

Actions
Related to openQA Tests - action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not foundRejectedokurz2018-03-082018-07-03

Actions
Actions #1

Updated by okurz over 6 years ago

  • Priority changed from Normal to High

@michalnowak can you help us understand the symptoms?

Actions #2

Updated by michalnowak over 6 years ago

The black screen with a cross in the middle is Xvnc with nothing running in it. Most likely the xfreerdp quit when it lost connection to the Hyper-V VM. According to my experience FreeRDP is not particularly reliable piece of software. In snapshot Create/Revert code of os-autoinst we have mitigation to a similar problem, but that opens different wounds in FreeRDP (namely https://github.com/FreeRDP/FreeRDP/issues/3876).

Actions #3

Updated by okurz over 6 years ago

How can a process that the tests rely on just quit unnoticed in the background? Can you point me to a code section where this happens? I could not find it in neither os-autoinst nor os-autoinst-distri-opensuse.

If it's about perl I assume that there should be something to track subprocesses, e.g. http://search.cpan.org/~jrockway/AnyEvent-Subprocess-1.102912/lib/AnyEvent/Subprocess.pm . At least the error handling in case of error should be better, e.g. in a backend specific post_fail_hook handling checks for necessary processes and if they are gone raise the feedback to the test reviewer that this is the most likely cause. WDYT?

Actions #4

Updated by okurz about 6 years ago

  • Due date set to 2018-03-27
  • Target version set to Milestone 15
Actions #5

Updated by michalnowak about 6 years ago

This is how xfreerdp is executed on the intermediate server: https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/133f951a1955ff49c2833bb1ba9a2233e5074273/tests/installation/bootloader_hyperv.pm#L171

And this is why it's done in such a complicated manner: https://github.com/os-autoinst/os-autoinst/blob/1c7bb3f56f09b4ee902cd4632c7dc134cb315499/backend/svirt.pm#L169

The story being that the RDP connection is closed when snapshot is loaded, so we have to reinitiate it on our own, and that code does it "automatically" (by a loop). It works well for snapshots not so much for random connection termination. Better approaches certainly welcome.

Actions #6

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@svirt-hyperv
https://openqa.suse.de/tests/1408109

Actions #7

Updated by okurz about 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: mediacheck@svirt-hyperv
https://openqa.suse.de/tests/1514043

Actions #8

Updated by michalnowak about 6 years ago

https://openqa.suse.de/tests/1519013#step/firstrun/2

Nothing interesting in openQA logs, but this is on the intermediary:

openQA-hyperv-intermediary:~ # ll xfreerdp_openQA-SUT-1*
-rw-r--r-- 1 root root 558 Mar  5 10:01 xfreerdp_openQA-SUT-1.log
-rw-r--r-- 1 root root  37 Mar  5 10:01 xfreerdp_openQA-SUT-1_stop

openQA-hyperv-intermediary:~ # cat xfreerdp_openQA-SUT-1*
[10:00:34:642] [9220:9221] [INFO][com.freerdp.client.common.cmdline] - loading channelEx cliprdr
[10:00:34:643] [9220:9221] [INFO][com.freerdp.client.x11] - Property 240 does not exist
[10:00:35:136] [9220:9221] [INFO][com.freerdp.gdi] - Local framebuffer format  PIXEL_FORMAT_BGRX32
[10:00:35:136] [9220:9221] [INFO][com.freerdp.gdi] - Remote framebuffer format PIXEL_FORMAT_RGB16
[10:00:35:139] [9220:9221] [INFO][com.winpr.clipboard] - initialized POSIX local file subsystem
[10:01:28:259] [9220:9221] [INFO][com.freerdp.client.x11] - Network disconnect!
c9c29eb8-efe8-4a18-bf6b-a544ea2c6484

This is what we should see instead of the last line:

[16:21:38:946] [28619:28620] [INFO][com.freerdp.core] - ERRINFO_LOGOFF_BY_USER (0x0000000C):The disconnection was initiated by the user logging off his or her session on the server.

There are couple things we can do: (1) Update FreeRDP to HEAD, (2) execute xfreerdp with /log-level:DEBUG and save it, (3) implement xfreerdp restart on disconnect. (1) & (2) are easy, (3) is hard.

Actions #9

Updated by michalnowak about 6 years ago

  • Checklist item changed from to [x] Update FreeRDP to HEAD, [ ] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect

Updated to freerdp-2.0.0~git.1463131968.4e66df7-6.1.x86_64.

Actions #10

Updated by michalnowak about 6 years ago

  • Status changed from New to Feedback
Actions #11

Updated by michalnowak about 6 years ago

  • Checklist item changed from [x] Update FreeRDP to HEAD, [ ] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect

DEBUG is now logged. Lets gather some data if we need to implement xfreerdp reconnect feature.

Actions #13

Updated by okurz about 6 years ago

  • Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added
Actions #14

Updated by okurz about 6 years ago

  • Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added
Actions #15

Updated by okurz about 6 years ago

  • Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
Actions #16

Updated by michalnowak about 6 years ago

okurz wrote:

latest failure: https://openqa.suse.de/tests/1536078#step/bootloader/2

Log says:

[22:16:26:139] [5083:5085] [INFO][com.freerdp.client.x11] - Property 240 does not exist
[22:16:26:139] [5083:5085] [DEBUG][com.freerdp.client.x11] - Searching for XInput pointer device
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.client.x11] - Pointer device: 6
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling security layer negotiation: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling restricted admin mode: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling RDP security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling TLS security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling NLA security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling NLA extended security: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Security Layer Negotiation is disabled
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Sending preconnection PDU
[22:16:26:155] [5083:5085] [DEBUG][com.freerdp.core.nego] - Negotiated NLA security
[22:16:26:155] [5083:5085] [DEBUG][com.freerdp.core.nego] - nego_security_connect with PROTOCOL_NLA
[22:16:26:172] [5083:5085] [DEBUG][com.freerdp.core.nego] - Failed to connect with NLA security

Looks like a different problem, FreeRDP was unable to connect to Hyper-V at all.

Actions #17

Updated by michalnowak about 6 years ago

Two more examples in the last build:

https://openqa.suse.de/tests/1540126#step/await_install/21
https://openqa.suse.de/tests/1539988#step/select_patterns_and_packages/79

That's bad enough. I think we should revert FreeRDP to 2.0.0 RC1 :/.

Actions #18

Updated by michalnowak about 6 years ago

  • Status changed from Feedback to Workable

FreeRDP reverted to 2.0.0 RC1.

Upstream bug https://github.com/FreeRDP/FreeRDP/issues/2802 is of some interest.

+heartbeat option might help.

Actions #19

Updated by michalnowak about 6 years ago

  • Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [ ] Investigate if `+heartbeat` option helps
Actions #20

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-03-27 to 2018-04-24
Actions #21

Updated by mgriessmeier about 6 years ago

  • Subject changed from [sle][functional][svirt-hyperv]test fails in rescuesystem - empty X window or stuck in installer loading to [sle][functional][svirt-hyperv][u] test fails in rescuesystem - empty X window or stuck in installer loading
Actions #22

Updated by michalnowak about 6 years ago

  • Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [ ] Investigate if `+heartbeat` option helps to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [x] Investigate if `+heartbeat` option helps
  • Status changed from Workable to Feedback

The situation with RDP disconnects got better right? In last couple of week I haven't seen new instances of this problem. Have anyone else?

Actions #23

Updated by okurz about 6 years ago

  • Status changed from Feedback to Resolved

I asked QSF (QA SLE functional) and we do not recall seeing these symptoms recently. I guess we are done here.

You might also be interested in the plan of QSF to improve the logging and error investigation on remote connections, e.g. in tickets #33202 about better logging output in os-autoinst and #33202 with the acceptance criterion "No black screen without pop-up with a hint what went wrong, what is running and what we actually see".

So thanks so far.

Actions #24

Updated by okurz almost 6 years ago

  • Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [x] Investigate if `+heartbeat` option helps to

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: textmode@svirt-hyperv
https://openqa.suse.de/tests/1683141

Actions

Also available in: Atom PDF