action #12250
closed[sporadic]"QEMU: usb-kbd: warning: key event queue full" on ppc64le and aarch64
0%
Description
observation¶
missing character "-f" after the test
in https://openqa.suse.de/tests/412084/modules/snapper_undochange/steps/21, message in logfile: "QEMU: usb-kbd: warning: key event queue full"
problem¶
typing too fast for this backend?
suggestion¶
Probably we should detect this QEMU message and handle it explicitly.
further details¶
Related issue: https://progress.opensuse.org/issues/12018
workaround¶
Sporadic issue: Retrigger job
Updated by okurz over 8 years ago
- Assignee set to coolo
happened recently here: https://openqa.suse.de/tests/463809
who is our ppc64le qemu expert?
Updated by algraf over 8 years ago
I think we just need to use the new vnc completion timeout feature from QEMU to work around that. Same issue as aarch64.
Updated by okurz over 8 years ago
I knew the coolo-okurz back and forth would serve a purpose ;-)
What you mentioned sounds interesting. Can you go into a little bit more detail or provide a reference?
Updated by algraf over 8 years ago
okurz wrote:
I knew the coolo-okurz back and forth would serve a purpose ;-)
What you mentioned sounds interesting. Can you go into a little bit more detail or provide a reference?
Sure, I'm referring to this patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-05/msg03811.html
which was written exactly to fix this issue. The patch is also available in the Virtualization/qemu package:
as well as SLE12 SP2:
So all you need to do is make sure you're running a very recent QEMU version and pass -vnc ...,key-delay-ms=200 as parameter to the qemu command line.
Updated by mgriessmeier over 8 years ago
created PR for that: https://github.com/os-autoinst/os-autoinst/pull/568
but since lack of backend knowledge, I'll need some help with that
okurz suggested to put in a check for the qemu version
Updated by dirkmueller over 8 years ago
I'm currently out of office and your email has been received. I'll be back August 31st. For issues related
SUSE OpenStack Cloud that need attention before that, please reach out to Roman Arcea rarcea@suse.com.
Thanks,
Dirk
Updated by AdamWill about 8 years ago
I may be missing something here, but - we already have quite a bit of rate limiting code in os-autoinst , including an arg you can pass to type_string
to control the speed of typing. In SUSE's distri code you have type_slowly
and type_very_slowly
functions that limit the rate beyond the default by using this arg, for instance. Do we really move in the right direction by turning on some all-or-nothing rate limiting in qemu?
If it's really the case that a given arch can't keep up with the same rate of typing in 'ideal conditions' - i.e. at a console with the minimum amount of other activity going on in the SUT and its host - then couldn't we just adjust the default rate limiting in type_string
etc. to be different per arch?
And is that really the problem in this case at all? What I think I see on Fedora is that typing issues seem to be related to system load. For instance, we almost never get any errors when typing at a console on an otherwise-idle VM...but very occasionally we do get some characters swallowed, and if you happen to catch it live or get a fortunate video (the speedup on the videos can make them useless for analyzing this) and analyze it frame-by-frame, it looks like the VM mostly keeps up just fine with the typing, then suddenly for a second or two absolutely no keypresses show up, then a bunch of them appear at once; basically, it looks like something somewhere (the SUT, the worker host, or some other damn part of the chain) gets really busy and stops forwarding keypresses, then when it stops being busy, all the backlogged key presses arrive in a lump - only if more presses backed up than the SUT (I think that's where the buffer is) has buffer space for, you lose some of them.
Rate limiting could mitigate that, but universal rate limiting of all typing ever is a very large hammer for cracking that nut, and not even really foolproof, if the 'not forwarding keypresses' period lasts long enough.
Updated by okurz about 8 years ago
Yes, I think you are right. Basically as also discussed in IRC some times and also in another progress issue, e.g. see #13702: If this issue appears it is not really a problem in qemu but just the symptom that the SUT gets stalled. A human user with better OCR is waiting for the expected even. We mainly have the problem described here when we are typing long sequences without waiting for proper feedback of the system. IMHO there should be some waiting when typing long sequences but I am not sure what kind. Would would you say regarding #13702? I'd say as in #13702#note-3 a forced wait_still_screen(1)
after 30 characters by default in type_string
which can be overridden on demand is a good approach. WDYT?
Updated by agraf@suse.de about 8 years ago
Would it help if you had a separate qemu keyboard input command (over vnc?) that only returns if the key was successfully pulled from the fifo queue?
Alex
Am 19.09.2016 um 23:22 schrieb redmine@opensuse.org redmine@opensuse.org:
[openSUSE Tracker]
Issue #12250 has been updated by okurz.Yes, I think you are right. Basically as also discussed in IRC some times and also in another progress issue, e.g. see #13702: If this issue appears it is not really a problem in qemu but just the symptom that the SUT gets stalled. A human user with better OCR is waiting for the expected even. We mainly have the problem described here when we are typing long sequences without waiting for proper feedback of the system. IMHO there should be some waiting when typing long sequences but I am not sure what kind. Would would you say regarding #13702? I'd say as in #13702#note-3 a forced
wait_still_screen(1)
after 30 characters by default intype_string
which can be overridden on demand is a good approach. WDYT?
action #12250: "QEMU: usb-kbd: warning: key event queue full" on ppc64le
https://progress.opensuse.org/issues/12250#change-28946
- Author: okurz
- Status: New
- Priority: Normal
- Assignee:
- Category: Bugs in existing tests
* Target version: ¶
observation¶
missing character "-f" after the
test
in https://openqa.suse.de/tests/412084/modules/snapper_undochange/steps/21, message in logfile: "QEMU: usb-kbd: warning: key event queue full"problem¶
typing too fast for this backend?
suggestion¶
Probably we should detect this QEMU message and handle it explicitly.
further details¶
Related issue: https://progress.opensuse.org/issues/12018
--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here: http://progress.opensuse.org/my/account
Updated by okurz@suse.de about 8 years ago
Would it help if you had a separate qemu keyboard input command (over vnc?)
that only returns if the key was successfully pulled from the fifo queue?
sounds like a great idea. I don't if we the then still have problems with
other backends but the qemu machines are more prone to mutual stalling so I
might very well help.
Updated by maritawerner about 8 years ago
What is the status here? Can szarate do the fix suggested from agraf?
Updated by szarate about 8 years ago
- Assignee set to szarate
i'm taking this one... the patch mentioned by @agraf is already present (it's fun to use it with 1000ms...
Updated by szarate about 8 years ago
- Assignee deleted (
szarate) - Priority changed from Normal to Low
I could not reproduce the same issue on ppc64le, but i got similar problems with x86_64.
While i was using the -key-delay i could get rid of the type_slow and type_very_slow typing speeds... but this would led to all sort of different problems in following screens.
I'm changing the priority to low, since it has not been seen in a while (At least not a specific message)
Updated by mkravec about 8 years ago
QEMU: usb-kbd: warning: key event queue full
QEMU: main-loop: WARNING: I/O thread spun for 1000 iterations
Job: https://openqa.suse.de/tests/666816#step/zypper_lifecycle/15
Updated by zluo about 8 years ago
yes, it seems that openQA server is overloaded.
Updated by szarate about 8 years ago
To add a bit more on what @zluo wrote on #15118, there are stalls in this job that are up to 38 seconds, which basically means that the system was really slow at that point.
However, what mkravec is describing seems to be more related to: This bug... It seems that there are few solutions to this like this one.
Updated by szarate about 8 years ago
- Has duplicate action #15118: Text is mistyped on aarch64 added
Updated by szarate about 8 years ago
- Related to coordination #14972: [tools][epic] Improvements on backend to improve better handling of stalls added
Updated by asmorodskyi about 8 years ago
another example of failure https://openqa.suse.de/tests/668028
Updated by asmorodskyi almost 8 years ago
- Priority changed from Low to Normal
another example https://openqa.suse.de/tests/685272/file/autoinst-log.txt
also change priority back to normal , according latest comments issue start to appear more often
Updated by mkravec almost 8 years ago
Another example:
https://openqa.suse.de/tests/704536#step/hostname_inst/3
Updated by mkravec almost 8 years ago
- Subject changed from "QEMU: usb-kbd: warning: key event queue full" on ppc64le to "QEMU: usb-kbd: warning: key event queue full" on ppc64le and aarch64
Updated by szarate almost 8 years ago
@coolo, i think it would be wise to add some kind of monitoring now at least to capture the CPU usage... i'll play with a machine in orthos to see if i can catch it... but so far looks like we can say that this would be a known failure (And therefore a job to restart automatically by the webui)
Updated by mkravec almost 8 years ago
There are failures in every build (5 tests failed in build 0202)
Build0211: https://openqa.suse.de/tests/721648
Updated by szarate almost 8 years ago
- Status changed from New to In Progress
Now with Die early when usb-kbd: key event full is found, jobs that have this abnormal situation will be inmediately terminated. In the future, they should be rescheduled.
Updated by szarate almost 8 years ago
- Status changed from In Progress to Feedback
Will set the status to feedback for now...
Updated by asmorodskyi almost 8 years ago
- Status changed from Feedback to In Progress
@szarate behavior did not change https://openqa.suse.de/tests/739477#downloads
Updated by szarate almost 8 years ago
- Status changed from In Progress to Feedback
I came across this interesting one: https://openqa.suse.de/tests/733459#step/updates_packagekit_gpk/7 The message "QEMU: usb-kbd: warning: key event queue full" is not shown, but still the c from clear was already missing.
This might be unrelated but leaving this here for future reference.
Also, changing the status to feedback as it has been fixed but waiting for deploy of os-autoinst on workers for final verification.
Updated by szarate almost 8 years ago
szarate wrote:
I came across this interesting one: https://openqa.suse.de/tests/733459#step/updates_packagekit_gpk/7 The message "QEMU: usb-kbd: warning: key event queue full" is not shown, but still the c from clear was already missing.
This might be unrelated but leaving this here for future reference.
Also, changing the status to feedback as it has been fixed but waiting for deploy of os-autoinst on workers for final verification.
This is also another similar occurence of this bug: http://e13.suse.de/tests/1705#step/yast2_apparmor/10 previous screen sends a alt-n but there's no messages again by the QEMU backend...
Updated by mgriessmeier almost 8 years ago
should be related: https://openqa.suse.de/tests/748289#
Updated by okurz almost 8 years ago
- Status changed from Feedback to In Progress
@szarate: can you take this? https://openqa.suse.de/tests/754579 fails with this and it is not an "early die"
Updated by okurz almost 8 years ago
- Priority changed from Normal to High
grrr, and also this one: https://openqa.suse.de/tests/754580#step/zypper_lifecycle/53
Updated by okurz almost 8 years ago
you know that overdrive2 is not under control of salt and therefore fall in the domain of "but it is fixed in os-autoinst@master!!!1eins"
Updated by okurz almost 8 years ago
ppc64le: https://openqa.suse.de/tests/754640
it died immediately after the "queue full" was detected but the job is regarded as "incomplete" and the error is not obvious without diving into the logfile. So some time saved by not fooling ourselves. for "incomplete" it is pretty obvious we need to look into the logfile anyway but it would be better if we can call something like a "record_known_failure" and forward that to the webui.
Updated by szarate almost 8 years ago
- Status changed from In Progress to Resolved
I'm thinking about renaming this bug to the: Neverending bug!. Anywho... Changing my mind and will set this one to resolved... The missing/ninja keys is another issue. Guys feel free to reopen in case it happens again :D
Updated by pgeorgiadis almost 8 years ago
- Status changed from Resolved to In Progress
And yes, it happened again :P openQA crashed during qam-minimal for aarch64
Latest build can be found here. The test crashed (dark red color).
Here's the log from auto-inst:
19:37:18.1552 27443 <<< testapi::type_string(string='zypper -n in -l -t patch SUSE-SLE-SERVER-12-SP2-2017-4210 | tee /tmp/zypper.log ; echo 2VCGL-${PIPESTATUS}- > /dev/ttyAMA0', max_interval=250, wait_screen_changes=0)
19:37:18.7930 27445 QEMU: usb-kbd: warning: key event queue full
DIE QEMU: Shutting down the job at /usr/lib/os-autoinst/backend/qemu.pm line 939.
This error had not happened in the last 3 months.
Updated by szarate almost 8 years ago
- Status changed from In Progress to Resolved
pgeorgiadis wrote:
And yes, it happened again :P openQA crashed during qam-minimal for aarch64
Latest build can be found here. The test crashed (dark red color).
Here's the log from auto-inst:
19:37:18.1552 27443 <<< testapi::type_string(string='zypper -n in -l -t patch SUSE-SLE-SERVER-12-SP2-2017-4210 | tee /tmp/zypper.log ; echo 2VCGL-${PIPESTATUS}- > /dev/ttyAMA0', max_interval=250, wait_screen_changes=0)
19:37:18.7930 27445 QEMU: usb-kbd: warning: key event queue full
DIE QEMU: Shutting down the job at /usr/lib/os-autoinst/backend/qemu.pm line 939.
This error had not happened in the last 3 months.
The actual fix was to add specifically that die :). So far it's a problem we will have to live with, until we figure out how to deal with overbooking and qemu usb packet drop (https://bugzilla.redhat.com/show_bug.cgi?id=1293234)
szarate wrote:
Now with Die early when usb-kbd: key event full is found, jobs that have this abnormal situation will be inmediately terminated. In the future, they should be rescheduled.
Updated by mitiao over 7 years ago
The issue always reproduced on power for recent builds
https://openqa.suse.de/tests/829192
https://openqa.suse.de/tests/829193
Updated by SLindoMansilla over 7 years ago
- Status changed from Resolved to In Progress
- Priority changed from High to Normal
as discussed with okurz. Not happening too often but as long as it appears we need to handle it somehow and track it.
Updated by coolo over 7 years ago
So when typing with 50kps we can reproduce this problem like 100% in console tests. 15 was okayish, so I guess ppc (and possibly aarch64) would need some even lower.
As also ikvm and other platforms are affected (I had to lower the typing speed for idrac interfaces to 10kps), I guess we need to have this setting globally in the machine definition.
Updated by algraf over 7 years ago
This is the bugzilla to track "keyboard type queue is empty" events which OpenQA could then use to only press keys when they were successfully handled by the guest:
Updated by SLindoMansilla over 7 years ago
- Category changed from Bugs in existing tests to Infrastructure
as discussed with okurz
Updated by SLindoMansilla over 7 years ago
It also happens here https://openqa.suse.de/tests/855925
Updated by mgriessmeier over 7 years ago
another one:
https://openqa.suse.de/tests/858593
Updated by michel_mno over 7 years ago
Other occurrences for Leap 42.3 and ppc64le (1)
systematically when "curl_https" script has loop of validate_script_output. (2)
Do we need a specific new issue for this problem ?
(1) https://openqa.opensuse.org/tests/overview?distri=opensuse&version=42.3&build=0037&groupid=30
(2) https://openqa.opensuse.org/tests/381080/modules/curl_https/steps/1/src
Updated by michel_mno over 7 years ago
And another exemple (1) for sshd.pm
where the issued string is 86 characters long (2)
(1) https://openqa.opensuse.org/tests/382112/file/autoinst-log.txt
(2) 'ssh -v sshboy@localhost -t echo LOGIN_SUCCESSFUL; echo SSH-1491407073-$?- > /dev/hvc0'
Updated by nicksinger over 7 years ago
Here is another example: https://openqa.suse.de/tests/861360
Updated by dzedro over 7 years ago
ppc64le migration test failed on QEMU: usb-kbd: warning: key event queue full
https://openqa.suse.de/tests/918599
https://openqa.suse.de/tests/918604
Updated by nicksinger over 7 years ago
Here are some more recent examples on ppc64le:
https://openqa.suse.de/tests/931322 (Tried to type 134 chars before fail)
https://openqa.suse.de/tests/930630 (107 chars)
https://openqa.suse.de/tests/931356 (86 chars)
https://openqa.suse.de/tests/931937
https://openqa.suse.de/tests/930663
Also the log of this: https://openqa.suse.de/tests/931322/file/autoinst-log.txt
Shows another interesting error:
22:01:27.9912 88247 capture loop failed QEMU: Shutting down the job at /usr/lib/os-autoinst/backend/qemu.pm line 955.
but this is maybe just caused by the full event queue.
Updated by mitiao over 7 years ago
slesp2 image creating (for migration) failed on aarch64_maintenance worker
https://openqa.suse.de/tests/940559
Updated by okurz over 7 years ago
- Subject changed from "QEMU: usb-kbd: warning: key event queue full" on ppc64le and aarch64 to [sporadic]"QEMU: usb-kbd: warning: key event queue full" on ppc64le and aarch64
- Description updated (diff)
Updated by mitiao over 7 years ago
Re-triggered several times and always failed at the same step with key event queue full issue:
https://openqa.suse.de/tests/982202
Any idea to avoid it to publish a SP2 aarch64 image, then I can try migration from that.
Updated by RBrownSUSE over 7 years ago
My advice would be to apply the same fix to the tests that I suggested in #19540
I would recommend changing grub_set_bootargs.pm line 20 to use wait_screen_changes=1 to naturally type more slowly and therefore avoid filling up the key event queue.
Updated by SLindoMansilla over 7 years ago
Recent example:
sle-12-SP3-Server-DVD-ppc64le-Build0424-RAID0@ppc64le
osd#995630#downloads
Updated by pgeorgiadis over 7 years ago
Recent example: qam-minimal-full@aarch64_maintenance just died because of this:
QEMU: usb-kbd: warning: key event queue full
Log: https://openqa.suse.de/tests/1035782/file/autoinst-log.txt
https://openqa.suse.de/tests/1035782#
Updated by algraf over 7 years ago
pgeorgiadis wrote:
Recent example: qam-minimal-full@aarch64_maintenance just died because of this:
QEMU: usb-kbd: warning: key event queue full
Log: https://openqa.suse.de/tests/1035782/file/autoinst-log.txt
https://openqa.suse.de/tests/1035782#
Is this using the latest SP3 QEMU yet?
Updated by algraf over 7 years ago
algraf wrote:
Is this using the latest SP3 QEMU yet?
Oh I just saw this is maintenance, so it's running on the old SP2 OpenQA system?
Updated by szarate almost 7 years ago
- Status changed from In Progress to Resolved
I think this hasnt been seen for quite a while. Overdrive2 is not even doing any job fro a while.
Updated by okurz over 5 years ago
- Related to action #38777: [tools][ppc64le] QEMU: usb-kbd: warning: key event queue full added