action #42362

[qam][virtio][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

Added by zcjia about 2 years ago. Updated 7 months ago.

Status:
In Progress
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
-
Start date:
2018-10-12
Due date:
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-x86_64-desktopapps-remote-client1@64bit-virtio-vga fails in
window_system

There are many failures in "window_system" here: https://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=66.2&groupid=118

These failures can go away if you restart the jobs a few times, but the issue should get a proper fix.

The cause is that "script_output" calls "type_string", which types many characters into gnome too fast, causing some characters to be dropped or repeated.

Similar issues happen on the PPC64 platform, which is why the ppc64 worker sets "VNC_TYPING_LIMIT=10".

I tested this setting on OSD, and it works fine for gnome.

(This setting is the number of keys typed into VNC per second; the default value is 50.)
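To make these numbers concrete, here is a minimal sketch of what such a limit means as a per-key delay. It assumes the limit maps directly to a fixed interval between key events and that each character is a single key event; the helper names are hypothetical and are not os-autoinst functions.

```python
def inter_key_delay(typing_limit: int) -> float:
    """Seconds between key events for a limit given in keys per second."""
    if typing_limit <= 0:
        raise ValueError("typing limit must be positive")
    return 1.0 / typing_limit


def typing_duration(text: str, typing_limit: int) -> float:
    """Rough time needed to type `text`, assuming one key event per character."""
    return len(text) * inter_key_delay(typing_limit)
```

Under these assumptions, the default of 50 gives 20 ms between keys, while a limit of 10 gives 100 ms, so a 100-character command would take roughly 10 seconds instead of 2.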

In other tests, there are also typing failures from time to time.

@Oliver:

So I wonder if this setting should be added to the sle-15-desktop medium type?

Reproducible

Fails since (at least) Build 66.2 (current job)

Expected result

Last good: 63.1 (or more recent)

Further details

Always latest result in this scenario: latest

possible misstype ninjakeys

gnome_wayland_qxl2.png (87.8 KB) zgao, 2018-12-04 08:50
gnome_wayland_qxl.png (156 KB) zgao, 2018-12-04 08:50
gnome_wayland_virtio.png (143 KB) zgao, 2018-12-04 08:50
gnome_wayland_virtio2.png (74.5 KB) zgao, 2018-12-04 08:50
gnome_x11_qxl.png (178 KB) zgao, 2018-12-04 08:50
gnome_x11_virtio.png (189 KB) zgao, 2018-12-04 08:50
kde_wayland_virtio.png (97.7 KB) zgao, 2018-12-04 08:50
kde_wayland_virtio2.png (72.2 KB) zgao, 2018-12-04 08:50
kde_x11_qxl.png (137 KB) zgao, 2018-12-04 08:50
kde_wayland_qxl.png (156 KB) zgao, 2018-12-04 08:50
qemu_vnc_keypress.jpg (147 KB) zgao, 2018-12-14 11:38

Related issues

Related to openQA Tests - action #41681: [desktop][sporadic][virtio] test fails in window_system to open windows or type correctly - related to wayland/virtio? (New, 2018-09-27)

Related to openQA Tests - action #53339: [opensuse] test fails in swing due to incorrect rendering on 16bpp framebuffers (Resolved, 2019-06-19)

Related to openQA Tests - action #60257: [functional][u][mistyping] random typing issues on wayland (Rejected, 2019-11-25)

Has duplicate openQA Tests - action #51332: [sle][functional][u] test fails in firefox - wrong html page loaded (Rejected, 2019-05-09)

Blocks openQA Tests - action #43889: [functional][u][virtio][wayland] test fails in ooffice - openQA makes spelling mistakes (Blocked, 2018-11-16)

History

#1 Updated by okurz about 2 years ago

zcjia wrote:

@Oliver:
So I wonder if this setting should be added to the sle-15-desktop medium type?

I doubt this is a good idea. Also I do not think that "we type too fast". I guess the system is just loaded and a bit unresponsive just after startup. One can consider this a product bug.

I would suggest an approach similar to
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5365
which ensures with the desktop_runner module that there is a certain "cool down time" directly after login. I suggest to involve jbaier as the test module maintainer of window_system to handle this.

#2 Updated by osukup about 2 years ago

  • Subject changed from [sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome" to [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome"

same on qam jobs .. time to time this fails with typing issues

#3 Updated by zcjia about 2 years ago

@Oliver, I can confirm that the system load is not high when this error happens. Also, this error can happen elsewhere every now and then, which is very annoying.

I talked to another QA engineer; in his opinion and experience, openQA is typing too fast: 50 keystrokes a second is indeed too fast, especially for a desktop environment.

@jbaier, what's your opinion on this matter?

#4 Updated by jbaier_cz about 2 years ago

For me it is a little bit weird. It was working quite reliably at the time of development. Now I see those errors more and more often. Of course, I can rewrite the test (at the cost of losing the record_info feature for pretty presentation); however, this issue could come back in other tests, as there are a few tests which type in the terminal.

It would be nice to know the root of the issue (load on the worker / openQA backend issue / unreliable typing shortly after boot).

#5 Updated by zcjia almost 2 years ago

  • Status changed from New to In Progress

There are failures in places other than "window_system", for example in "firefox_smoke":

https://openqa.suse.de/tests/2216384#next_previous

And I have an important discovery: all these "typing too fast" failures happen in wayland instead of X11.

I'll try to dig deeper.

#6 Updated by zcjia almost 2 years ago

  • Assignee set to zcjia

#7 Updated by dzedro almost 2 years ago

I think the problem is broken key presses with QEMUVGA=std or QEMUVGA=virtio

#8 Updated by zcjia almost 2 years ago

The reason to use QEMUVGA=virtio for wayland is that wayland doesn't support the default "cirrus", see https://progress.opensuse.org/issues/21786

#9 Updated by zcjia almost 2 years ago

  • Subject changed from [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in gnome" to [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

I tested X11+virtio, there's no problem; so the problem happens in wayland+virtio.

I'll try to reproduce this problem outside of openQA.

#10 Updated by yfjiang almost 2 years ago

Hi zcjia, okurz, I saw that an upstream qemu bug was reported: https://bugs.launchpad.net/qemu/+bug/1802465 . It looks like X is also affected, but wayland is more easily impacted, with a shorter trigger string. The problem is that we are not yet sure how the issue will be escalated.

Given that quite a lot of SLE-15-SP1 desktop testing is blocked by this issue, and Alpha-3 is coming:

http://openqa.suse.de/tests/overview?distri=sle&version=15-SP1&build=88.6&groupid=118

Is there a way to add an acceptable workaround to make the testing run more reliably? Thank you!

#11 Updated by zcjia almost 2 years ago

Since SLE-15-SP1-Alpha3 is coming, I will apply the workaround on OSD:

by adding "VNC_TYPING_LIMIT=10" to machine type "64bit-virtio-vga".

Let's see how it goes.

#12 Updated by okurz almost 2 years ago

Yes, I think this is a good idea. As long as you call the ticket only done when you removed that workaround again I am fine ;)

#13 Updated by okurz almost 2 years ago

  • Related to action #41681: [desktop][sporadic][virtio] test fails in window_system to open windows or type correctly - related to wayland/virtio? added

#14 Updated by szarate almost 2 years ago

  • Description updated (diff)

So, looking at this... it looks to me again like missing keys/ninjakeys: https://openqa.suse.de/tests/2167333#step/window_system/4 if you look closely... the N is never let go, and looking at the latest job: https://openqa.suse.de/tests/2277128 there's a poem missing there... and the same for other jobs in the build...

#15 Updated by szarate almost 2 years ago

  • Related to action #43889: [functional][u][virtio][wayland] test fails in ooffice - openQA makes spelling mistakes added

#16 Updated by okurz almost 2 years ago

  • Subject changed from [qam][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland" to [qam][virtio][sle15sp0][sle15sp1][desktop] test fails in window_system because "typing string is too fast in wayland"

yes, but in this example again it's specific to virtio, see #41681

#17 Updated by zgao almost 2 years ago

I ran several tests with respect to

  • qxl or virtio
  • x11 or wayland
  • gnome or kde

It seems to me that:

  • qxl seems to mitigate the problem; compare gnome_wayland_qxl.png with gnome_wayland_virtio.png in the attachments

  • x11 has far fewer typing misses, but they still happen once in a while

  • A special case, ABCDE, causes a high miss rate on the shift key (around 50%) for both x11 and wayland; please refer to the *2.png files in the attachments

The above is my opinion, based merely on my own observation. I typed around 1000 characters for each test. The qemu version was based on stable-2.12, with queue_count enlarged to 102400.

What do you think? What other things can we tweak?

#18 Updated by okurz almost 2 years ago

Good evaluation. I wonder why you selected libreoffice though. IMHO libreoffice is especially prone to cause mistyping, see #43889, maybe because of "spell correction" or something. The original ticket observation was about "window_system" so maybe the results would actually be different if you would conduct the experiment in a more "simple" environment, e.g. gedit/kwrite

#19 Updated by zgao almost 2 years ago

Thank you!
I actually tested on both gedit and libreoffice. I will proceed with gedit hereafter.

I found a special case when sending key streams ending with an uppercase letter, e.g. ABCDE, AbC, aaaC.

If you look at gnome_wayland_virtio2.png, gnome_wayland_qxl2.png or kde_wayland_virtio2.png in the attachments above, you will find an obvious miss on lowercase or uppercase every second time.

The reason is as follows: when we send a single character A, the key codes actually sent are

  1. capslock pressed
  2. a pressed
  3. enter pressed

So capslock is always used instead of shift, and now capslock gets toggled.

If a lowercase letter comes after a capital letter, capslock is pressed again, so capslock gets back to normal.
But if the key stream ends with an uppercase letter, it is observed that capslock is not pressed again and therefore stays toggled.
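The behaviour described above can be modelled with a small simulation. This is only a sketch of the hypothesis in this comment, not the actual qemu or VNC code: it assumes the sender starts every string believing capslock is off, toggles capslock whenever the required case changes, and never resets it at the end. The function name is hypothetical and the input is assumed to be plain ASCII.

```python
def type_string_model(text: str, guest_capslock: bool = False):
    """Return (text as seen by the guest, capslock state left on the guest)."""
    sender_capslock = False  # assumption: the sender always starts from "off"
    seen = []
    for ch in text:
        if ch.isalpha() and ch.isupper() != sender_capslock:
            # A capslock key press toggles both the sender's view and the guest
            sender_capslock = not sender_capslock
            guest_capslock = not guest_capslock
        # The guest renders letters according to its real capslock state
        seen.append(ch.upper() if guest_capslock else ch.lower())
    return "".join(seen), guest_capslock
```

In this model, typing ABCDE once yields ABCDE but leaves capslock on, so typing it a second time yields abcde; a string ending in a lowercase letter leaves the state untouched. That matches the "wrong every second time" pattern in the screenshots.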

Now I'm wondering which part of the code actually deals with capslock: is it the ps2 driver or vncdotool?
What do you think?

#20 Updated by okurz almost 2 years ago

Are you sure it's "caps lock" or just "shift"? I guess that it is either qemu or VNC, I doubt that vncdotool is involved here at all. https://build.opensuse.org/package/view_file/Virtualization/qemu/0025-Fix-tigervnc-long-press-issue.patch?expand=1 might be related? Just browsed the web quickly myself, maybe https://www.berrange.com/posts/2010/07/04/more-than-you-or-i-ever-wanted-to-know-about-virtual-keyboard-handling/ helps. Sorry, can't help more.

#21 Updated by zgao almost 2 years ago

Thank you for the comment!!

An update about this issue,

I checked on the guest OS what values the kernel receives, and they turn out to be wrong.

  • First, on the guest OS, I found the keyboard device at /dev/input/eventX, where X can be any number.
  • Then I monitored what this device, eventX, receives using evtest /dev/input/eventX

According to my observation, there are key misses at /dev/input/eventX, but I have not yet found any keys in the wrong order. A key miss might cause many more errors on x11 or wayland.
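For anyone who wants to script the same check instead of reading evtest output by eye, here is a minimal sketch that decodes raw records from an event device and reports keys that were pressed but never released. It assumes the 64-bit Linux layout of struct input_event (timeval, type, code, value); the function names are illustrative only.

```python
import struct

EVENT_FORMAT = "llHHi"  # tv_sec, tv_usec, type, code, value (64-bit Linux layout)
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)
EV_KEY = 0x01  # event type used for key presses and releases


def parse_key_events(raw: bytes):
    """Yield (code, value) per EV_KEY record; value 1=press, 0=release, 2=repeat."""
    for offset in range(0, len(raw) - EVENT_SIZE + 1, EVENT_SIZE):
        _sec, _usec, ev_type, code, value = struct.unpack_from(EVENT_FORMAT, raw, offset)
        if ev_type == EV_KEY:
            yield code, value


def missing_releases(events):
    """Return the set of key codes that were pressed but never released."""
    down = set()
    for code, value in events:
        if value == 1:
            down.add(code)
        elif value == 0:
            down.discard(code)
    return down
```

The raw bytes would come from reading /dev/input/eventX in binary mode (usually as root) while the test string is being typed; comparing the decoded key codes with the sent string shows where a press or release went missing.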

Additionally, qemu has a vnc layer, an input event queue layer and a ps2 key queue layer. I printed the values in the queues of these layers and found that they are all accurate.

So I presume it is a problem in how qemu interfaces with the guest kernel.
I attached a diagram of how qemu deals with key inputs; I made it myself and hope it helps.

#22 Updated by okurz almost 2 years ago

wow, I appreciate your fine detailed investigation notes, especially the picture can help understanding, thx.

#23 Updated by zgao almost 2 years ago

I found the bug resides both in qemu internals and in wayland.

In qemu, it is a concurrency problem: the capslock LED is handled by another thread, so keycode streams get messed up when the capslock LED event is emitted.

I opened up a bug on qemu upstream

The wayland bug is observed when the kernel event device receives correct keycode streams but gedit still shows wrong values. It seems to be related to bug 1117833: "Wayland: briefffffffffff unresponsivvvvvvvvvvvveness and repeated keys". This bug is not observed on x11.

To check whether the kernel is to blame, I used ftrace and dynamic_debug to look at what the kernel receives. It seems the kernel receives what qemu sends and passes all recognized key streams to /dev/input/event.

#24 Updated by zgao almost 2 years ago

Found a related bug report for Fedora.

Bug 1658676 - Typing '<' more than once (with the specific keypress sequence used by os-autoinst) breaks with usb-kbd and VNC

Also,
'Less than' (<), 'more than' (>), and 'pipe' (|) can't be typed via VNC

Additionally, I will pay some attention to the wayland bug, since it happens comparatively much more often.

#25 Updated by zgao almost 2 years ago

Issue raised on gnome-shell gitlab
As pointed out by zcjia, wayland typing issue only happens after gnome-shell 3.2.6.

Note that this wayland issue is to blame for the severe typing flaws on wayland under gnome.

But typing issues on openQA also involve qemu issues, and possibly other parts. We found that create_hdd_* cases might have the user name typed wrong, but I failed to reproduce this problem in an installed x11 instance, where I typed around 250,000 characters through vnc at 10ms per char.

You could refer to gnome-gdm, where typing the user name went wrong in 3 out of 10 runs from 20181214 to 20190103. They fail at first_boot.
Also, in this case and this case

#26 Updated by okurz almost 2 years ago

Regarding wrong user name typing during installation: That should fail the test earlier in https://openqa.opensuse.org/tests/829175#step/user_settings/2 with the right needle. It looks like the last matched needle has vanished. Maybe you are onto this already?

#27 Updated by SLindoMansilla over 1 year ago

  • Related to deleted (action #43889: [functional][u][virtio][wayland] test fails in ooffice - openQA makes spelling mistakes)

#28 Updated by SLindoMansilla over 1 year ago

  • Blocks action #43889: [functional][u][virtio][wayland] test fails in ooffice - openQA makes spelling mistakes added

#29 Updated by okurz over 1 year ago

  • Has duplicate action #51332: [sle][functional][u] test fails in firefox - wrong html page loaded added

#30 Updated by okurz over 1 year ago

  • Related to action #53339: [opensuse] test fails in swing due to incorrect rendering on 16bpp framebuffers added

#31 Updated by AdamWill over 1 year ago

This ticket is kinda more broad than my specific issue, but I did notice one very specific issue with the Firefox Wayland native backend (specifically, it does not happen with the X11 backend): typing modified characters fails very often. See https://bugzilla.redhat.com/show_bug.cgi?id=1727388 . I found that reversing the order of the 'key up' events when doing modified characters - so e.g. to type a colon we do:

  1. shift down
  2. ; down
  3. ; up
  4. shift up

instead of this:

  1. shift down
  2. ; down
  3. shift up
  4. ; up

works around that problem; https://github.com/os-autoinst/os-autoinst/pull/1174 is my PR for that.
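The two orderings can be written down as a small sketch. This only illustrates the sequences listed above; it is not the code from the PR, and the function name and event representation are made up for the example.

```python
def modified_key_events(modifier: str, key: str, release_key_first: bool = True):
    """Return the (key, action) sequence for typing one modified character."""
    events = [(modifier, "down"), (key, "down")]
    if release_key_first:
        # Workaround ordering: release the key before the modifier
        events += [(key, "up"), (modifier, "up")]
    else:
        # Previous ordering: release the modifier first
        events += [(modifier, "up"), (key, "up")]
    return events
```

Calling modified_key_events("shift", ";") produces the first sequence above, and passing release_key_first=False produces the second.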

#32 Updated by AdamWill over 1 year ago

For the record, in the Fedora tests we commonly use our type_slowly and type_very_slowly macros for typing in GNOME (and KDE), exactly because we've seen this kind of problem before. type_slowly just slows down the typing speed. type_very_slowly slows it down even more, and does a wait_screen_change between each key press.
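As a rough illustration of the idea behind such macros (not the Fedora implementation), the sketch below types one character at a time with a fixed delay and, optionally, waits for the screen to change after each key. The send_key and wait_for_change callbacks are hypothetical stand-ins for whatever the test backend provides.

```python
import time
from typing import Callable, Optional


def type_text(
    text: str,
    send_key: Callable[[str], None],
    delay: float,
    wait_for_change: Optional[Callable[[], None]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Send each character, optionally wait for a screen change, then pause."""
    for ch in text:
        send_key(ch)
        if wait_for_change is not None:
            wait_for_change()  # "very slow" mode: confirm the key had an effect
        sleep(delay)
```

Without wait_for_change this corresponds to simply typing more slowly; with it, each key press is confirmed on screen before the next one is sent.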

#33 Updated by okurz 12 months ago

Hi zcjia, zgao the issue of problems with typing characters persists as also the linked tickets describe, e.g. #43889 . What are your plans on working on this issue?

#34 Updated by zcjia 12 months ago

okurz wrote:

Hi zcjia, zgao the issue of problems with typing characters persists as also the linked tickets describe, e.g. #43889 . What are your plans on working on this issue?

Hi Oliver, (zgao was an intern and left the company), this problem is caused by multiple sources, one of them is the version of GNOME. I will give this problem another try as we are going to GNOME 3.34 in SLE15SP2.

#35 Updated by okurz 12 months ago

I saw the latest problem in Tumbleweed where there is already gnome 3.34 so no need to wait :)

#36 Updated by okurz 9 months ago

  • Related to action #60257: [functional][u][mistyping] random typing issues on wayland added

#37 Updated by bmwiedemann 7 months ago

For the record: the original rate was 6.66 keystrokes/s from
https://github.com/os-autoinst/os-autoinst/blob/v1/bmwqemu.pm#L398
That was still using the qemu monitor interface where capital letters were one "sendkey shift-n".

And when typing longer strings, it would wait for output to catch up after 13 chars:
https://github.com/os-autoinst/os-autoinst/blob/v1/bmwqemu.pm#L423
mostly meant for isolinux and grub where BIOS keystroke buffer was limited to 15 and graphics could take a while to redraw. But I also remember cases where KDE was slow to react and slower typing helped.
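A minimal sketch of that pacing scheme, assuming the string is simply cut into 13-character pieces with a pause between pieces (the helper name is hypothetical and this is not the original bmwqemu.pm code):

```python
def split_for_typing(text: str, chunk_size: int = 13):
    """Split text into chunks to be typed with a pause after each chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

At 6.66 keystrokes/s each key takes about 0.15 s, so one full 13-character chunk takes roughly 2 seconds before the pause, which stays below the 15-key BIOS buffer mentioned above.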
