action #60539

openQA remote-back-ends crashes with Xvnc >= 1.9.80

Added by nicksinger 3 months ago. Updated 3 months ago.

Status:ResolvedStart date:03/12/2019
Priority:NormalDue date:
Assignee:mkittler% Done:

0%

Category:Concrete Bugs
Target version:Done
Difficulty:
Duration:

Description

While implementing the Power-HMC-back-end I realized that my os-autoinst crashes 100% reproducible at exactly one point in the test.
"bisecting" revealed that not os-autoinst per se is to blame but rather one of its dependencies. Digging further showed that this issue also happens on Leap15.1 (our current OSD worker distro) but way less severe. Therefore I setup a Leap15.1 docker container as worker and tried it. Turns out; the worker can continue: http://openqa.glados.qa.suse.de/tests/95 (Fail is due to a missing needle, unrelated). However, if I upgrade Xvnc inside the container from the Tumbleweed repos, I can again perfectly reproduce the issue: http://openqa.glados.qa.suse.de/tests/94.

Xvnc version from Leap15.1 according to Xvnc -version: Xvnc TigerVNC 1.9.0 - built ??? ?? ???? ??:??:??
Xvnc version from Leap15.1 according to zypper info xorg-x11-Xvnc: Version : 1.9.0-lp151.3.2

Xvnc version from TW according to Xvnc -version: Xvnc TigerVNC 1.9.80 - built ??? ?? ???? ??:??:??
Xvnc version from TW according to zypper info xorg-x11-Xvnc: Version : 1.9.0-8.2


Related issues

Related to openQA Project - action #39503: svirt tests fail with unsupported update encoding -173319... Resolved 09/08/2018
Related to openQA Project - action #28186: VNC: Use of uninitialized value $encoding_type in pack New 22/11/2017
Copied to openQA Project - action #60758: better feedback for Xvnc disconnecting clients, integrati... New 03/12/2019

History

#1 Updated by okurz 3 months ago

as discussed in weekly:

  1. use known workaround to be able to continue with HMC backend work
  2. would be good to have integration tests covering Xvnc in travis CI as well or OBS
  3. better feedback for user if Xvnc crashes or does not react as expected

#2 Updated by mkittler 3 months ago

  • Assignee set to mkittler

I'm assigning because I'd like to investigate this a little bit more as it affects developing backends relying on Xvnc under Tumbleweed.

#3 Updated by mkittler 3 months ago

The problem persists with Xvnc 1.10.0.

#4 Updated by mkittler 3 months ago

  • Related to action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988 added

#5 Updated by mkittler 3 months ago

This appears to be a problem within our VNC client code (VNC.pm). When connecting to Xnvc with vncviewer the connection is not interrupted. (To be able to connect via vncviewer one has to remove the socket/pipe/-inetd code in localXvnc.pm.)

I added #39503 as related ticket because @coolo mentioned it and it seems similar, indeed.

The particular problem seems to be the first $socket->read(...) in the foreach (1 .. $number_of_rectangles) {-loop in VNC.pm. When making it reconnect at this point like

        my $read = $socket->read(my $data, 12);
        if (!defined $read) {
            bmwqemu::diag 'unexpected end of data, trying to reconnect VNC';
            $self->socket->close;
            $self->socket(undef);
            return $self->login;
        }

the test can continue. Then unexpected end of data, trying to reconnect VNC is logged quite frequently.

Note that I also observed the following message (likely from Xvnc):

 VNCSConnST:  closing 127.0.0.1::56126: SMsgWriter::writeFramebufferUpdateEnd:
              nRects out of sync
 EncodeManager: Framebuffer updates: 4
 EncodeManager:   ZRLE:
 EncodeManager:     Solid: 8 rects, 393.408 kpixels
 EncodeManager:            196 B (1:4014.86 ratio)
 EncodeManager:     Bitmap RLE: 18 rects, 855.2 kpixels
 EncodeManager:                 6.94922 KiB (1:240.39 ratio)
 EncodeManager:   Total: 26 rects, 1.24861 Mpixels
 EncodeManager:          7.14062 KiB (1:341.566 ratio)
 Connections: closed: 127.0.0.1::56126
 ComparingUpdateTracker: 443.44 kpixels in / 118.832 kpixels out
 ComparingUpdateTracker: (1:3.73165 ratio)

(To be able to see the Xvnc log one has to remove the socket/pipe/-inetd code in localXvnc.pm.)

So there's something wrong with transferring the number of rectangles. Since vncviewer has not problems I assume our VNC.pm client has a bug.

#6 Updated by mkittler 3 months ago

The log

 VNCSConnST:  closing 127.0.0.1::56126: SMsgWriter::writeFramebufferUpdateEnd:
              nRects out of sync
...

only appears when connecting with vncviewer at the same time and therefore seems unrelated. The message is only about a TightVNC-specific pseudo encoding (-224). If that was the problem we would see unsupported encoding -224 in the log.

#7 Updated by mkittler 3 months ago

Apparently the log message from Xvnc appears also without an additional vncviewer client. Not sure why the log message is sometimes visible and sometimes not. But it is logged before the disconnect of VNC.pm. Here's how the disconnect (and reconnect with my workaround) looks like in the log:

Tue Dec  3 18:24:19 2019
 VNCSConnST:  closing 127.0.0.1::35822: SMsgWriter::writeFramebufferUpdateEnd:
              nRects out of sync
 EncodeManager: Framebuffer updates: 169
 EncodeManager:   ZRLE:
 EncodeManager:     Solid: 123 rects, 962.868 kpixels
 EncodeManager:            2.9043 KiB (1:648.02 ratio)
 EncodeManager:     Bitmap RLE: 226 rects, 544.684 kpixels
 EncodeManager:                 14.7705 KiB (1:72.2036 ratio)
 EncodeManager:     Indexed RLE: 7 rects, 200.878 kpixels
 EncodeManager:                  3.78516 KiB (1:103.674 ratio)
 EncodeManager:     Full Colour: 2 rects, 900 pixels
 EncodeManager:                  268 B (1:6.80597 ratio)
 EncodeManager:   Total: 358 rects, 1.70933 Mpixels
 EncodeManager:          21.7217 KiB (1:153.889 ratio)
 Connections: closed: 127.0.0.1::35822
 ComparingUpdateTracker: 1.24823 Mpixels in / 400.356 kpixels out
 ComparingUpdateTracker: (1:3.11781 ratio)
[2019-12-03T18:24:19.823 CET] [debug] NOR 46
[2019-12-03T18:24:19.836 CET] [debug] unexpected end of data when reading rectangle 28 (of 46), trying to reconnect VNC
 Connections: accepted: 127.0.0.1::35930
 SConnection: Client needs protocol version 3.8
 SConnection: Client requests security type None(1)
 VNCSConnST:  Server default pixel format depth 16 (16bpp) little-endian rgb565
 VNCSConnST:  Client pixel format depth 16 (16bpp) little-endian rgb565

That this message is logged seems like a logic bug in the Xvnc code. So maybe it is Xvnc after all and vncviewer is just more error tolerant.

#8 Updated by coolo 3 months ago

I can reproduce this also with 1.10

[2019-12-04T09:23:19.608 CET] [debug] NOR 9
unexpected end of data at consoles/VNC.pm line 934.

and on the other side

XserverDesktop: client gone, sock 10
VNCSConnST: closing 127.0.0.1::44010: SMsgWriter::writeFramebufferUpdateEnd:
nRects out of sync

#9 Updated by mkittler 3 months ago

By the way, how did you reproduce this? I'm just restarting an spvm job (still using grenache-1:6's LPAR) over and over again.

I guess in my runs when the Xvnc log wasn't visible I just had a leftover Xvnc still running in the background which prevented a new one to start (hard-coded the port for my testing) and the leftover Xvnc was used.

Maybe I can produce a debug build of Xvnc to see why it is ending up throwing that exception.

#10 Updated by coolo 3 months ago

So the server disconnects us because it has an internal exception - how we trigger this is pure guesswork. Reconnecting seems to be the only workaround possible - unless we want to reverse engineer Xvnc bugs :(

#11 Updated by coolo 3 months ago

One rather easy work around is supporting last-rect pseudo encoding - avoiding the bug in tighervnc. Reported upstream as https://github.com/TigerVNC/tigervnc/issues/914

#13 Updated by mkittler 3 months ago

  • Status changed from New to In Progress
  • Target version set to Current Sprint

#14 Updated by okurz 3 months ago

  • Related to action #28186: VNC: Use of uninitialized value $encoding_type in pack added

#15 Updated by mkittler 3 months ago

  • Status changed from In Progress to Resolved
  • Target version changed from Current Sprint to Done

Reference to TigerVNC issue: https://github.com/TigerVNC/tigervnc/issues/914


The PR has been merged. The issue wasn't very apparent in production so I'm marking the issue directly as resolved.

#16 Updated by okurz 3 months ago

  • Copied to action #60758: better feedback for Xvnc disconnecting clients, integration tests covering Xvnc in travis CI as well or OBS added

Also available in: Atom PDF