action #42362

[qam][virtio][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

Added by zcjia 12 months ago. Updated 3 months ago.

Status: In Progress
Start date: 12/10/2018
Priority: Normal
Due date:
Assignee: zcjia
% Done: 0%
Category: Bugs in existing tests
Target version: -
Difficulty:
Duration:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-x86_64-desktopapps-remote-client1@64bit-virtio-vga fails in
window_system

There are many failures in "window_system" here: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=66.2&groupid=118

These failures can go away if you restart the jobs a few times, but the test should get a proper fix.

The cause is that "script_output" calls "type_string", which types many characters into gnome too fast, so some characters go missing or are repeated.

Similar issues happen on the PPC64 platform, which is why the ppc64 worker sets "VNC_TYPING_LIMIT=10".

I tested this setting on OSD, and it works fine for gnome.

(This setting limits how many keys are typed into VNC per second; the default value is 50.)
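As a rough illustration of what the limit means for timing, the sketch below converts a keys-per-second limit into a per-key delay and a lower bound on the time needed to type a string. The function names are hypothetical and this is not how os-autoinst implements the setting; it only does the arithmetic implied by the description above.

```python
def per_key_delay(typing_limit: int) -> float:
    """Seconds between key presses for a limit given in keys per second."""
    if typing_limit <= 0:
        raise ValueError("typing limit must be positive")
    return 1.0 / typing_limit


def min_typing_time(text: str, typing_limit: int) -> float:
    """Lower bound on the time needed to type `text` at the given limit."""
    return len(text) * per_key_delay(typing_limit)


# Compare the default value (50) with the ppc64 worker setting (10)
# for a stand-in 100-character command line.
command = "x" * 100
for limit in (50, 10):
    print(limit, per_key_delay(limit), min_typing_time(command, limit))
```

In other words, lowering the limit from 50 to 10 makes every typed string take five times as long, which is the cost of this workaround.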

In other tests, there are also typing failures from time to time.

@Oliver:

So I wonder if this setting should be added to the sle-15-desktop medium type?

Reproducible

Fails since (at least) Build 66.2 (current job)

Expected result

Last good: 63.1 (or more recent)

Further details

Always latest result in this scenario: latest

possible mistype ninjakeys

gnome_wayland_qxl2.png (87.8 KB) zgao, 04/12/2018 08:50 am

gnome_wayland_qxl.png (156 KB) zgao, 04/12/2018 08:50 am

gnome_wayland_virtio.png (143 KB) zgao, 04/12/2018 08:50 am

gnome_wayland_virtio2.png (74.5 KB) zgao, 04/12/2018 08:50 am

gnome_x11_qxl.png (178 KB) zgao, 04/12/2018 08:50 am

gnome_x11_virtio.png (189 KB) zgao, 04/12/2018 08:50 am

kde_wayland_virtio.png (97.7 KB) zgao, 04/12/2018 08:50 am

kde_wayland_virtio2.png (72.2 KB) zgao, 04/12/2018 08:50 am

kde_x11_qxl.png (137 KB) zgao, 04/12/2018 08:50 am

kde_wayland_qxl.png (156 KB) zgao, 04/12/2018 08:50 am

qemu_vnc_keypress.jpg (147 KB) zgao, 14/12/2018 11:38 am


Related issues

Related to openQA Tests - action #41681: [desktop][sporadic][virtio] test fails in window_system t... New 27/09/2018
Related to openQA Tests - action #53339: [functional][u] test fails in swing due to incorrect rend... Resolved 19/06/2019
Duplicated by openQA Tests - action #51332: [sle][functional][u] test fails in firefox - wrong html p... Rejected 09/05/2019
Blocks openQA Tests - action #43889: [functional][u] test fails in ooffice - openQA makes spel... Blocked 16/11/2018

History

#1 Updated by okurz 11 months ago

zcjia wrote:

@Oliver:

So I wonder if this setting should be added to the sle-15-desktop medium type?

I doubt this is a good idea. Also I do not think that "we type too fast". I guess the system is just loaded and a bit unresponsive just after startup. One can consider this a product bug.

I would suggest an approach similar to
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5365
which ensures with the desktop_runner module that there is a certain "cool down time" directly after login. I suggest to involve jbaier as the test module maintainer of window_system to handle this.

#2 Updated by osukup 11 months ago

  • Subject changed from [sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome" to [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome"

Same on qam jobs: from time to time this fails with typing issues.

#3 Updated by zcjia 11 months ago

@Oliver, I can confirm that the system load is not high when this error happens. Also, this error can happen elsewhere every now and then, which is very annoying.

I talked to another QA engineer; in his opinion and experience, openQA is typing too fast: 50 keystrokes a second is indeed too fast, especially for a desktop environment.

@jbaier, what's your opinion on this matter?

#4 Updated by jbaier_cz 11 months ago

For me it is a little bit weird. It was working quite reliably at the time of development. Now I see those errors more and more often. Of course, I can rewrite the test (at the cost of losing the record_info feature for pretty presentation); however, this issue could come back in other tests, as there are a few tests which type in the terminal.

It would be nice to know the root of the issue (load on the worker / openQA backend issue / unreliable typing shortly after boot).

#5 Updated by zcjia 11 months ago

  • Status changed from New to In Progress

There are failures in places other than "window_system", for example in "firefox_smoke":

https://openqa.suse.de/tests/2216384#next_previous

And I have made an important discovery: all these "typing too fast" failures happen in wayland, not in X11.

I'll try to dig deeper.

#6 Updated by zcjia 11 months ago

  • Assignee set to zcjia

#7 Updated by dzedro 11 months ago

I think the problem is broken key presses with QEMUVGA=std or QEMUVGA=virtio

#8 Updated by zcjia 11 months ago

The reason to use QEMUVGA=virtio for wayland is that wayland doesn't support the default "cirrus", see https://progress.opensuse.org/issues/21786

#9 Updated by zcjia 11 months ago

  • Subject changed from [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome" to [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

I tested X11+virtio, there's no problem; so the problem happens in wayland+virtio.

I'll try to reproduce this problem outside of openQA.

#10 Updated by yfjiang 10 months ago

Hi @zcjia, @okurz, I saw that an upstream qemu bug was reported: https://bugs.launchpad.net/qemu/+bug/1802465 . It looks like X is also affected, but wayland is more easily impacted, with a shorter trigger string. The problem is that we are not sure yet how the issue will be escalated.

Given that quite a few SLE-15-SP1 desktop tests are blocked by this issue, and Alpha-3 is coming:

http://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=88.6&groupid=118

Is there a way to add an acceptable workaround to make the testing run more reliably? Thank you!

#11 Updated by zcjia 10 months ago

Since SLE-15-SP1-Alpha3 is coming, I will apply the workaround on OSD:

by adding "VNC_TYPING_LIMIT=10" to machine type "64bit-virtio-vga".

Let's see how it goes.

#12 Updated by okurz 10 months ago

Yes, I think this is a good idea. As long as you only call the ticket done once you have removed that workaround again, I am fine ;)

#13 Updated by okurz 10 months ago

  • Related to action #41681: [desktop][sporadic][virtio] test fails in window_system to open windows or type correctly - related to wayland/virtio? added

#14 Updated by szarate 10 months ago

  • Description updated (diff)

So, looking at this... it again looks to me like missing keys/ninjakeys: https://openqa.suse.de/tests/2167333#step/window_system/4 if you look closely... the N is never let go, and looking at the latest job: https://openqa.suse.de/tests/2277128 there's a poem missing there... and the same goes for other jobs in the build...

#15 Updated by szarate 10 months ago

  • Related to action #43889: [functional][u] test fails in ooffice - openQA makes spelling mistakes added

#16 Updated by okurz 10 months ago

  • Subject changed from [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland" to [qam][virtio][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

yes, but in this example again it's specific to virtio, see #41681

#17 Updated by zgao 10 months ago

I ran several tests with respect to
- qxl or virtio
- x11 or wayland
- gnome or kde

It seems to me that:

  • qxl seems to mitigate the problem; compare gnome_wayland_qxl.png with gnome_wayland_virtio.png in the attachments

  • x11 has far fewer typing misses, but they still happen once in a while

  • A special case, ABCDE, causes a high miss rate on the shift key (around 50%) for both x11 and wayland; please refer to the *2.png files in the attachments

The above is my opinion, and it is based merely on my observation. I typed around 1000 characters for each test. The qemu version is based on stable-2.12, with queue_count enlarged to 102400.

What do you think? What other things can we tweak?

#18 Updated by okurz 10 months ago

Good evaluation. I wonder why you selected libreoffice though. IMHO libreoffice is especially prone to cause mistyping, see #43889, maybe because of "spell correction" or something. The original ticket observation was about "window_system" so maybe the results would actually be different if you would conduct the experiment in a more "simple" environment, e.g. gedit/kwrite

#19 Updated by zgao 9 months ago

Thank you!
I actually tested on both gedit and libreoffice. I will proceed with gedit from here on.

I found a special case when sending key streams that end with an uppercase letter, e.g. ABCDE, AbC, aaaC

If you refer to gnome_wayland_virtio2.png, gnome_wayland_qxl2.png or kde_wayland_virtio2.png in the attachments above, you can see that there is an obvious miss on lowercase or uppercase every second time.

The reason is as follows: when we send a single character A, the key codes actually sent are
1. capslock pressed
2. a pressed
4. enter pressed
So capslock is always used instead of shift, and capslock is now toggled.

If a lowercase letter comes after a capital letter, capslock is pressed again, so capslock gets back to normal.
But if the key stream ends with an uppercase letter, it is observed that capslock is not pressed again and therefore stays toggled.

Now I'm wondering which part of the code actually deals with capslock: is it the ps2 driver or vncdotool?
What do you think?
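The hypothesis above can be expressed as a small toy model. This is only a sketch of the behaviour described in this comment, not code from qemu, VNC or os-autoinst: the function names are hypothetical, and it assumes the sender believes capslock is off at the start of every string while the guest keeps its own capslock state between strings.

```python
def sender_events(text):
    """Key events for one string under the described hypothesis: capslock is
    pressed whenever the letter case changes and is NOT pressed again at the
    end of the string. The sender assumes capslock is off at the start."""
    events, caps = [], False
    for ch in text:
        if ch.isalpha() and ch.isupper() != caps:
            events.append("capslock")
            caps = not caps
        events.append(ch.lower())
    return events


def guest_output(events, caps_on=False):
    """Text a guest would show for `events`, given its own capslock state.
    Returns (text, caps_on_after)."""
    out = []
    for ev in events:
        if ev == "capslock":
            caps_on = not caps_on
        else:
            out.append(ev.upper() if caps_on else ev)
    return "".join(out), caps_on


# Send the same string three times; the guest keeps its capslock state.
state = False
for _ in range(3):
    shown, state = guest_output(sender_events("ABCDE"), state)
    print(shown, state)
```

Under these assumptions a string ending in an uppercase letter comes out with inverted case every second time it is sent, which would match the pattern in the *2.png attachments, while a string such as "Abc" leaves capslock in its original state.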

#20 Updated by okurz 9 months ago

Are you sure it's "caps lock" or just "shift"? I guess that it is either qemu or VNC, I doubt that vncdotool is involved here at all. https://build.opensuse.org/package/view_file/Virtualization/qemu/0025-Fix-tigervnc-long-press-issue.patch?expand=1 might be related? Just browsed the web quickly myself, maybe https://www.berrange.com/posts/2010/07/04/more-than-you-or-i-ever-wanted-to-know-about-virtual-keyboard-handling/ helps. Sorry, can't help more.

#21 Updated by zgao 9 months ago

Thank you for the comment!!

An update about this issue,

I checked on the guest OS what values the kernel receives, and they turn out to be wrong.

  • First, on the guest OS, I found the keyboard device at /dev/input/eventX, where X can be any number.
  • Then I monitored what this device (eventX) receives with evtest /dev/input/eventX

According to my observation, there are key misses at /dev/input/eventX, but I have not yet found any keys in the wrong order. A key miss might cause a lot more errors on x11 or wayland.
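The same check could also be done programmatically instead of reading evtest output by eye. The following is a minimal sketch, assuming the 64-bit Linux layout of struct input_event (a struct timeval followed by a 16-bit type, a 16-bit code and a 32-bit value); the helper names are hypothetical, and this is not the procedure that was actually used for the observation above.

```python
import struct

# struct input_event on 64-bit Linux: struct timeval (two longs),
# __u16 type, __u16 code, __s32 value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)
EV_KEY = 0x01
KEY_STATES = {0: "up", 1: "down", 2: "repeat"}


def parse_key_events(raw: bytes):
    """Yield (code, state) for each EV_KEY record in `raw`."""
    for offset in range(0, len(raw) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, ev_type, code, value = struct.unpack_from(
            EVENT_FORMAT, raw, offset
        )
        if ev_type == EV_KEY:
            yield code, KEY_STATES.get(value, str(value))


def unbalanced_keys(events):
    """Return the key codes that were pressed but never released."""
    held = set()
    for code, state in events:
        if state == "down":
            held.add(code)
        elif state == "up":
            held.discard(code)
    return held
```

To use it, one would read bytes from the keyboard device (which normally requires root) while a known string is typed, then compare the parsed key codes against the expected sequence; a non-empty result from unbalanced_keys would point at a key that was never released.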

Additionally, qemu has a vnc layer, an input event queue layer and a ps2 key queue layer. I printed the values in the queues of these layers and found that they are all accurate.

So I presume it is a problem in how qemu interfaces with the guest kernel.
I attached a graph of how qemu deals with key inputs; I made it myself and hope it helps.

#22 Updated by okurz 9 months ago

wow, I appreciate your fine detailed investigation notes, especially the picture can help understanding, thx.

#23 Updated by zgao 9 months ago

I found that the bug resides both in qemu internals and in wayland.

In qemu, it is a concurrency problem: the capslock LED is handled by another thread, so keycode streams get messed up when the capslock LED event is emitted.

I opened a bug on qemu upstream

The wayland bug is observed when the kernel event device receives correct keycode streams but gedit still shows wrong values. It seems to be related to bug 1117833: Wayland: briefffffffffff unresponsivvvvvvvvvvvveness and repeated keys. This bug is not observed on x11.

To check whether the kernel is to blame, I used ftrace and dynamic_debug to look at what the kernel receives. It seems the kernel receives what qemu sends and passes all recognized key streams to /dev/input/event.

#24 Updated by zgao 9 months ago

Found a related bug report for Fedora.

Bug 1658676 - Typing '<' more than once (with the specific keypress sequence used by os-autoinst) breaks with usb-kbd and VNC

Also,
'Less than' (<), 'more than' (>), and 'pipe' (|) can't be typed via VNC

Additionally, I will pay some attention to the wayland bug, since it happens much more often by comparison.

#25 Updated by zgao 8 months ago

Issue raised on gnome-shell gitlab
As pointed out by zcjia, the wayland typing issue only happens after gnome-shell 3.2.6.

Note that this wayland issue is to blame for the severe typing flaws on wayland under gnome.

But typing issues on openQA also involve qemu issues, and possibly other parts. We found that create_hdd_* cases might have the user name typed wrong, but I failed to reproduce this problem in an installed x11 instance, where I tested around 250,000 characters through vnc at 10ms per char.

You could refer to gnome-gdm, where typing the user name went wrong 3 out of 10 times from 20181214 to 20190103. They fail at first_boot
Also, in this case and this case

#26 Updated by okurz 8 months ago

Regarding wrong user name typing during installation: That should fail the test earlier in https://openqa.opensuse.org/tests/829175#step/user_settings/2 with the right needle. It looks like the last matched needle has vanished. Maybe you are onto this already?

#27 Updated by SLindoMansilla 6 months ago

  • Related to deleted (action #43889: [functional][u] test fails in ooffice - openQA makes spelling mistakes)

#28 Updated by SLindoMansilla 6 months ago

  • Blocks action #43889: [functional][u] test fails in ooffice - openQA makes spelling mistakes added

#29 Updated by okurz 4 months ago

  • Duplicated by action #51332: [sle][functional][u] test fails in firefox - wrong html page loaded added

#30 Updated by okurz 3 months ago

  • Related to action #53339: [functional][u] test fails in swing due to incorrect rendering on 16bpp framebuffers added

#31 Updated by AdamWill 3 months ago

This ticket is kinda more broad than my specific issue, but I did notice one very specific issue with the Firefox Wayland native backend (specifically, it does not happen with the X11 backend): typing modified characters fails very often. See https://bugzilla.redhat.com/show_bug.cgi?id=1727388 . I found that reversing the order of the 'key up' events when doing modified characters - so e.g. to type a colon we do:

  1. shift down
  2. ; down
  3. ; up
  4. shift up

instead of this:

  1. shift down
  2. ; down
  3. shift up
  4. ; up

works around that problem; https://github.com/os-autoinst/os-autoinst/pull/1174 is my PR for that.
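The two orderings can be written down as a small helper to make the difference explicit. This is only an illustrative sketch with a hypothetical function name; it is not the code from the PR.

```python
def modified_key_events(modifier, key, release_key_first=True):
    """Return the (key, action) sequence for typing `key` with `modifier` held.

    release_key_first=True gives the workaround ordering (key up before
    modifier up); False gives the ordering that fails often with the
    Firefox Wayland backend.
    """
    press = [(modifier, "down"), (key, "down")]
    if release_key_first:
        release = [(key, "up"), (modifier, "up")]
    else:
        release = [(modifier, "up"), (key, "up")]
    return press + release


# Typing a colon: shift + ";"
print(modified_key_events("shift", ";"))
print(modified_key_events("shift", ";", release_key_first=False))
```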

#32 Updated by AdamWill 3 months ago

For the record, in the Fedora tests we commonly use our type_slowly and type_very_slowly macros for typing in GNOME (and KDE), exactly because we've seen this kind of problem before. type_slowly just slows down the typing speed. type_very_slowly slows it down even more, and does a wait_screen_change between each key press.
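The idea behind the two macros can be sketched as a pacing loop. This is not the Fedora implementation: send_key, wait_for_change and the delay values are hypothetical placeholders that a caller would supply, and the sketch only shows a fixed delay per key with an optional wait for a screen change after each key press.

```python
import time


def type_paced(text, send_key, delay, wait_for_change=None, sleep=time.sleep):
    """Type `text` one key at a time with `delay` seconds between keys.

    If `wait_for_change` is given, it is called after every key press, in the
    spirit of doing a wait_screen_change between key presses.
    """
    for ch in text:
        send_key(ch)
        if wait_for_change is not None:
            wait_for_change()
        sleep(delay)


# Example with stand-in callbacks; a real test would send keys to the SUT.
typed = []
type_paced("ls -l", typed.append, delay=0.0)
print("".join(typed))
```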
