action #30616
closed[sle][functional][svirt-hyperv][u] test fails in rescuesystem - empty X window or stuck in installer loading
Added by okurz about 7 years ago. Updated over 6 years ago.
0%
Description
Observation¶
openQA test in scenario sle-15-Installer-DVD-x86_64-installcheck@svirt-hyperv fails in
rescuesystem
either showing just an empty X window or stuck in installer loading.
Reproducible¶
Fails often but not every time
Expected result¶
Last good: https://openqa.suse.de/tests/1399511#step/rescuesystem/9
Further details¶
Always latest result in this scenario: latest
Updated by okurz about 7 years ago
- Priority changed from Normal to High
@michalnowak can you help us understand the symptoms?
Updated by michalnowak about 7 years ago
The black screen with a cross in the middle is Xvnc
with nothing running in it. Most likely the xfreerdp
quit when it lost connection to the Hyper-V VM. According to my experience FreeRDP is not particularly reliable piece of software. In snapshot Create/Revert code of os-autoinst we have mitigation to a similar problem, but that opens different wounds in FreeRDP (namely https://github.com/FreeRDP/FreeRDP/issues/3876).
Updated by okurz about 7 years ago
How can a process that the tests rely on just quit unnoticed in the background? Can you point me to a code section where this happens? I could not find it in neither os-autoinst nor os-autoinst-distri-opensuse.
If it's about perl I assume that there should be something to track subprocesses, e.g. http://search.cpan.org/~jrockway/AnyEvent-Subprocess-1.102912/lib/AnyEvent/Subprocess.pm . At least the error handling in case of error should be better, e.g. in a backend specific post_fail_hook handling checks for necessary processes and if they are gone raise the feedback to the test reviewer that this is the most likely cause. WDYT?
Updated by okurz about 7 years ago
- Due date set to 2018-03-27
- Target version set to Milestone 15
Updated by michalnowak about 7 years ago
This is how xfreerdp
is executed on the intermediate server: https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/133f951a1955ff49c2833bb1ba9a2233e5074273/tests/installation/bootloader_hyperv.pm#L171
And this is why it's done in such a complicated manner: https://github.com/os-autoinst/os-autoinst/blob/1c7bb3f56f09b4ee902cd4632c7dc134cb315499/backend/svirt.pm#L169
The story being that the RDP connection is closed when snapshot is loaded, so we have to reinitiate it on our own, and that code does it "automatically" (by a loop). It works well for snapshots not so much for random connection termination. Better approaches certainly welcome.
Updated by okurz almost 7 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@svirt-hyperv
https://openqa.suse.de/tests/1408109
Updated by okurz almost 7 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: mediacheck@svirt-hyperv
https://openqa.suse.de/tests/1514043
Updated by michalnowak almost 7 years ago
https://openqa.suse.de/tests/1519013#step/firstrun/2
Nothing interesting in openQA logs, but this is on the intermediary:
openQA-hyperv-intermediary:~ # ll xfreerdp_openQA-SUT-1*
-rw-r--r-- 1 root root 558 Mar 5 10:01 xfreerdp_openQA-SUT-1.log
-rw-r--r-- 1 root root 37 Mar 5 10:01 xfreerdp_openQA-SUT-1_stop
openQA-hyperv-intermediary:~ # cat xfreerdp_openQA-SUT-1*
[10:00:34:642] [9220:9221] [INFO][com.freerdp.client.common.cmdline] - loading channelEx cliprdr
[10:00:34:643] [9220:9221] [INFO][com.freerdp.client.x11] - Property 240 does not exist
[10:00:35:136] [9220:9221] [INFO][com.freerdp.gdi] - Local framebuffer format PIXEL_FORMAT_BGRX32
[10:00:35:136] [9220:9221] [INFO][com.freerdp.gdi] - Remote framebuffer format PIXEL_FORMAT_RGB16
[10:00:35:139] [9220:9221] [INFO][com.winpr.clipboard] - initialized POSIX local file subsystem
[10:01:28:259] [9220:9221] [INFO][com.freerdp.client.x11] - Network disconnect!
c9c29eb8-efe8-4a18-bf6b-a544ea2c6484
This is what we should see instead of the last line:
[16:21:38:946] [28619:28620] [INFO][com.freerdp.core] - ERRINFO_LOGOFF_BY_USER (0x0000000C):The disconnection was initiated by the user logging off his or her session on the server.
There are couple things we can do: (1) Update FreeRDP to HEAD
, (2) execute xfreerdp
with /log-level:DEBUG
and save it, (3) implement xfreerdp
restart on disconnect. (1) & (2) are easy, (3) is hard.
Updated by michalnowak almost 7 years ago
- Checklist item changed from to [x] Update FreeRDP to HEAD, [ ] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect
Updated to freerdp-2.0.0~git.1463131968.4e66df7-6.1.x86_64
.
Updated by michalnowak almost 7 years ago
- Status changed from New to Feedback
Updated by michalnowak almost 7 years ago
- Checklist item changed from [x] Update FreeRDP to HEAD, [ ] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect
DEBUG is now logged. Lets gather some data if we need to implement xfreerdp
reconnect feature.
Updated by okurz almost 7 years ago
latest failure: https://openqa.suse.de/tests/1536078#step/bootloader/2
Updated by okurz almost 7 years ago
- Related to action #32932: [sle][functional][u][hyperv] test fails in logs_from_installation_system - Increase timeout for uploading logs added
Updated by okurz almost 7 years ago
- Related to action #32926: [sle][functional][y][hyperv][medium] avoid typing username before switched tty (was: test fails in yast2_i - (mising needles?, rather too low timeout for hyperv) for Installation Report succesful) added
Updated by okurz almost 7 years ago
- Related to action #32929: [sle][functional][u][hyperv] test fails in postgresql_server - SubState=running not found added
Updated by michalnowak almost 7 years ago
okurz wrote:
latest failure: https://openqa.suse.de/tests/1536078#step/bootloader/2
Log says:
[22:16:26:139] [5083:5085] [INFO][com.freerdp.client.x11] - Property 240 does not exist
[22:16:26:139] [5083:5085] [DEBUG][com.freerdp.client.x11] - Searching for XInput pointer device
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.client.x11] - Pointer device: 6
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling security layer negotiation: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling restricted admin mode: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling RDP security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling TLS security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling NLA security: TRUE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Enabling NLA extended security: FALSE
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Security Layer Negotiation is disabled
[22:16:26:140] [5083:5085] [DEBUG][com.freerdp.core.nego] - Sending preconnection PDU
[22:16:26:155] [5083:5085] [DEBUG][com.freerdp.core.nego] - Negotiated NLA security
[22:16:26:155] [5083:5085] [DEBUG][com.freerdp.core.nego] - nego_security_connect with PROTOCOL_NLA
[22:16:26:172] [5083:5085] [DEBUG][com.freerdp.core.nego] - Failed to connect with NLA security
Looks like a different problem, FreeRDP was unable to connect to Hyper-V at all.
Updated by michalnowak almost 7 years ago
Two more examples in the last build:
https://openqa.suse.de/tests/1540126#step/await_install/21
https://openqa.suse.de/tests/1539988#step/select_patterns_and_packages/79
That's bad enough. I think we should revert FreeRDP to 2.0.0 RC1 :/.
Updated by michalnowak almost 7 years ago
- Status changed from Feedback to Workable
FreeRDP reverted to 2.0.0 RC1.
Upstream bug https://github.com/FreeRDP/FreeRDP/issues/2802 is of some interest.
+heartbeat
option might help.
Updated by michalnowak almost 7 years ago
- Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [ ] Investigate if `+heartbeat` option helps
Updated by mgriessmeier almost 7 years ago
- Due date changed from 2018-03-27 to 2018-04-24
Updated by mgriessmeier almost 7 years ago
- Subject changed from [sle][functional][svirt-hyperv]test fails in rescuesystem - empty X window or stuck in installer loading to [sle][functional][svirt-hyperv][u] test fails in rescuesystem - empty X window or stuck in installer loading
Updated by michalnowak almost 7 years ago
- Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [ ] Investigate if `+heartbeat` option helps to [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [x] Investigate if `+heartbeat` option helps
- Status changed from Workable to Feedback
The situation with RDP disconnects got better right? In last couple of week I haven't seen new instances of this problem. Have anyone else?
Updated by okurz almost 7 years ago
- Status changed from Feedback to Resolved
I asked QSF (QA SLE functional) and we do not recall seeing these symptoms recently. I guess we are done here.
You might also be interested in the plan of QSF to improve the logging and error investigation on remote connections, e.g. in tickets #33202 about better logging output in os-autoinst and #33202 with the acceptance criterion "No black screen without pop-up with a hint what went wrong, what is running and what we actually see".
So thanks so far.
Updated by okurz over 6 years ago
- Checklist item changed from [x] Update FreeRDP to HEAD, [x] execute xfreerdp with /log-level:DEBUG and save it, [ ] implement xfreerdp restart on disconnect, [x] Investigate if `+heartbeat` option helps to
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: textmode@svirt-hyperv
https://openqa.suse.de/tests/1683141