action #60539
closedopenQA remote-back-ends crashes with Xvnc >= 1.9.80
Description
While implementing the Power-HMC-back-end I realized that my os-autoinst crashes 100% reproducible at exactly one point in the test.
"bisecting" revealed that not os-autoinst per se is to blame but rather one of its dependencies. Digging further showed that this issue also happens on Leap15.1 (our current OSD worker distro) but way less severe. Therefore I setup a Leap15.1 docker container as worker and tried it. Turns out; the worker can continue: http://openqa.glados.qa.suse.de/tests/95 (Fail is due to a missing needle, unrelated). However, if I upgrade Xvnc inside the container from the Tumbleweed repos, I can again perfectly reproduce the issue: http://openqa.glados.qa.suse.de/tests/94.
Xvnc version from Leap15.1 according to Xvnc -version
: Xvnc TigerVNC 1.9.0 - built ??? ?? ???? ??:??:??
Xvnc version from Leap15.1 according to zypper info xorg-x11-Xvnc
: Version : 1.9.0-lp151.3.2
Xvnc version from TW according to Xvnc -version
: Xvnc TigerVNC 1.9.80 - built ??? ?? ???? ??:??:??
Xvnc version from TW according to zypper info xorg-x11-Xvnc
: Version : 1.9.0-8.2
Updated by okurz about 5 years ago
as discussed in weekly:
- use known workaround to be able to continue with HMC backend work
- would be good to have integration tests covering Xvnc in travis CI as well or OBS
- better feedback for user if Xvnc crashes or does not react as expected
Updated by mkittler about 5 years ago
- Assignee set to mkittler
I'm assigning because I'd like to investigate this a little bit more as it affects developing backends relying on Xvnc under Tumbleweed.
Updated by mkittler about 5 years ago
- Related to action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988 added
Updated by mkittler about 5 years ago
This appears to be a problem within our VNC client code (VNC.pm
). When connecting to Xnvc with vncviewer
the connection is not interrupted. (To be able to connect via vncviewer
one has to remove the socket/pipe/-inetd code in localXvnc.pm
.)
I added #39503 as related ticket because @coolo mentioned it and it seems similar, indeed.
The particular problem seems to be the first $socket->read(...)
in the foreach (1 .. $number_of_rectangles) {
-loop in VNC.pm
. When making it reconnect at this point like
my $read = $socket->read(my $data, 12);
if (!defined $read) {
bmwqemu::diag 'unexpected end of data, trying to reconnect VNC';
$self->socket->close;
$self->socket(undef);
return $self->login;
}
the test can continue. Then unexpected end of data, trying to reconnect VNC
is logged quite frequently.
Note that I also observed the following message (likely from Xvnc
):
VNCSConnST: closing 127.0.0.1::56126: SMsgWriter::writeFramebufferUpdateEnd:
nRects out of sync
EncodeManager: Framebuffer updates: 4
EncodeManager: ZRLE:
EncodeManager: Solid: 8 rects, 393.408 kpixels
EncodeManager: 196 B (1:4014.86 ratio)
EncodeManager: Bitmap RLE: 18 rects, 855.2 kpixels
EncodeManager: 6.94922 KiB (1:240.39 ratio)
EncodeManager: Total: 26 rects, 1.24861 Mpixels
EncodeManager: 7.14062 KiB (1:341.566 ratio)
Connections: closed: 127.0.0.1::56126
ComparingUpdateTracker: 443.44 kpixels in / 118.832 kpixels out
ComparingUpdateTracker: (1:3.73165 ratio)
(To be able to see the Xvnc
log one has to remove the socket/pipe/-inetd code in localXvnc.pm
.)
So there's something wrong with transferring the number of rectangles. Since vncviewer
has not problems I assume our VNC.pm
client has a bug.
Updated by mkittler about 5 years ago
The log
VNCSConnST: closing 127.0.0.1::56126: SMsgWriter::writeFramebufferUpdateEnd:
nRects out of sync
...
only appears when connecting with vncviewer
at the same time and therefore seems unrelated. The message is only about a TightVNC-specific pseudo encoding (-224). If that was the problem we would see unsupported encoding -224
in the log.
Updated by mkittler about 5 years ago
Apparently the log message from Xvnc
appears also without an additional vncviewer
client. Not sure why the log message is sometimes visible and sometimes not. But it is logged before the disconnect of VNC.pm
. Here's how the disconnect (and reconnect with my workaround) looks like in the log:
Tue Dec 3 18:24:19 2019
VNCSConnST: closing 127.0.0.1::35822: SMsgWriter::writeFramebufferUpdateEnd:
nRects out of sync
EncodeManager: Framebuffer updates: 169
EncodeManager: ZRLE:
EncodeManager: Solid: 123 rects, 962.868 kpixels
EncodeManager: 2.9043 KiB (1:648.02 ratio)
EncodeManager: Bitmap RLE: 226 rects, 544.684 kpixels
EncodeManager: 14.7705 KiB (1:72.2036 ratio)
EncodeManager: Indexed RLE: 7 rects, 200.878 kpixels
EncodeManager: 3.78516 KiB (1:103.674 ratio)
EncodeManager: Full Colour: 2 rects, 900 pixels
EncodeManager: 268 B (1:6.80597 ratio)
EncodeManager: Total: 358 rects, 1.70933 Mpixels
EncodeManager: 21.7217 KiB (1:153.889 ratio)
Connections: closed: 127.0.0.1::35822
ComparingUpdateTracker: 1.24823 Mpixels in / 400.356 kpixels out
ComparingUpdateTracker: (1:3.11781 ratio)
[2019-12-03T18:24:19.823 CET] [debug] NOR 46
[2019-12-03T18:24:19.836 CET] [debug] unexpected end of data when reading rectangle 28 (of 46), trying to reconnect VNC
Connections: accepted: 127.0.0.1::35930
SConnection: Client needs protocol version 3.8
SConnection: Client requests security type None(1)
VNCSConnST: Server default pixel format depth 16 (16bpp) little-endian rgb565
VNCSConnST: Client pixel format depth 16 (16bpp) little-endian rgb565
That this message is logged seems like a logic bug in the Xvnc
code. So maybe it is Xvnc
after all and vncviewer
is just more error tolerant.
Updated by coolo about 5 years ago
I can reproduce this also with 1.10
[2019-12-04T09:23:19.608 CET] [debug] NOR 9
unexpected end of data at consoles/VNC.pm line 934.
and on the other side
XserverDesktop: client gone, sock 10
VNCSConnST: closing 127.0.0.1::44010: SMsgWriter::writeFramebufferUpdateEnd:
nRects out of sync
Updated by mkittler about 5 years ago
By the way, how did you reproduce this? I'm just restarting an spvm job (still using grenache-1:6
's LPAR) over and over again.
I guess in my runs when the Xvnc log wasn't visible I just had a leftover Xvnc still running in the background which prevented a new one to start (hard-coded the port for my testing) and the leftover Xvnc was used.
Maybe I can produce a debug build of Xvnc to see why it is ending up throwing that exception.
Updated by coolo about 5 years ago
So the server disconnects us because it has an internal exception - how we trigger this is pure guesswork. Reconnecting seems to be the only workaround possible - unless we want to reverse engineer Xvnc bugs :(
Updated by coolo about 5 years ago
One rather easy work around is supporting last-rect pseudo encoding - avoiding the bug in tighervnc. Reported upstream as https://github.com/TigerVNC/tigervnc/issues/914
Updated by mkittler about 5 years ago
Updated by mkittler about 5 years ago
- Status changed from New to In Progress
- Target version set to Current Sprint
Updated by okurz about 5 years ago
- Related to action #28186: VNC: Use of uninitialized value $encoding_type in pack added
Updated by mkittler about 5 years ago
- Status changed from In Progress to Resolved
- Target version changed from Current Sprint to Done
Reference to TigerVNC issue: https://github.com/TigerVNC/tigervnc/issues/914
The PR has been merged. The issue wasn't very apparent in production so I'm marking the issue directly as resolved.
Updated by okurz about 5 years ago
- Copied to action #60758: better feedback for Xvnc disconnecting clients, integration tests covering Xvnc in travis CI as well or OBS added