action #35589

closed

coordination #35302: [qe-core][opensuse][functional][epic][sporadic] Various unstable tests on o3

[functional][u][opensuse][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed

Added by JERiveraMoya over 6 years ago. Updated almost 5 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA (private) - Milestone 28
Start date: 2018-04-26
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-kde-wayland@64bit_virtio fails in kontact

Akregator gets opened, apparently after running 'akonadictl start'. Before the test finishes, akregator needs to be closed so that the needles checking that the application is closed and the desktop is visible can match.

Reproducible

Fails since (at least) Build 20180420

Expected result

Last good: 20180419 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 5 (1 open, 4 closed)

Related to openQA Tests (public) - action #51944: [opensuse][functional][u] test fails in dolphin -- "kdialog --getopenfilename" fails to start (Rejected, zluo, 2019-05-23)

Related to openQA Tests (public) - action #53045: [opensuse][kde][sporadic] krunner suggestions check is racy (New, 2019-06-13)

Has duplicate openQA Tests (public) - action #42830: [functional][u] test fails in kontact - akregator was launched at the first place (Rejected, okurz, 2018-10-24)

Blocked by openQA Tests (public) - action #46223: [functional][u] test fails in user_gui_login - fails to re-login, password not typed or entry field not focussed? (Resolved, okurz, 2019-01-15)

Blocked by openQA Tests (public) - action #53339: [opensuse] test fails in swing due to incorrect rendering on 16bpp framebuffers (Resolved, okurz, 2019-06-19)
Actions #1

Updated by mloviska over 6 years ago

Similar issue
https://openqa.opensuse.org/tests/664692#step/kontact/21

Amarok was not closed by previous test.

Actions #2

Updated by okurz over 6 years ago

  • Subject changed from [opensuse][functional] test fails in kontact - akregator not closed to [opensuse][u][functional] test fails in kontact - akregator not closed
  • Due date set to 2018-06-05
  • Target version set to Milestone 16
Actions #3

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: kde-wayland@64bit_virtio-2G
https://openqa.opensuse.org/tests/676017

Actions #4

Updated by riafarov over 6 years ago

  • Status changed from New to Workable
Actions #5

Updated by SLindoMansilla over 6 years ago

  • Subject changed from [opensuse][u][functional] test fails in kontact - akregator not closed to [opensuse][u][functional][sporadic][medium] test fails in kontact - akregator not closed

We can see that the command is not typed entirely. Maybe the command runner is not ready when the test starts typing (missing keys?).

Actions #6

Updated by SLindoMansilla over 6 years ago

  • Due date changed from 2018-06-05 to 2018-06-19

Not enough capacity during sprint 18

Actions #7

Updated by mgriessmeier over 6 years ago

  • Due date deleted (2018-06-19)
Actions #8

Updated by okurz over 6 years ago

  • Due date set to 2018-08-14
  • Target version changed from Milestone 16 to Milestone 18
Actions #9

Updated by mloviska over 6 years ago

From the video it seems that the desktop runner is ready to accept the whole input. As the script types the first characters "ak", the runner gives suggestions pointing to amarok. Somehow the first characters get deleted, therefore we see incomplete input.

Actions #10

Updated by okurz over 6 years ago

  • Target version changed from Milestone 18 to Milestone 18
Actions #11

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: kde-wayland@64bit_virtio-2G
https://openqa.opensuse.org/tests/677523

Actions #12

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: kde-wayland@64bit_virtio
https://openqa.opensuse.org/tests/712629

Actions #13

Updated by zluo over 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

take over

Actions #14

Updated by zluo over 6 years ago

filed https://progress.opensuse.org/issues/38864 for compilation issue.

Actions #15

Updated by zluo over 6 years ago

Actions #16

Updated by zluo over 6 years ago

http://e13.suse.de/tests/7120#step/desktop_runner :(

desktop runner doesn't start up.

Actions #17

Updated by okurz over 6 years ago

Your customized test schedule cannot work this way because x11/kontact does not call select_console('x11'), so you are stuck on the terminal that consoletest_setup selected. Either schedule another module in between, e.g. consoletest_finish, don't schedule consoletest_setup, or temporarily put the call to select_console('x11') into the test module kontact.

Actions #18

Updated by zluo over 6 years ago

@okurz yes, thanks!

Actions #19

Updated by zluo over 6 years ago

At the moment I cannot reproduce this issue with akregator. It does not seem to be started at all in my test runs:

x11_start_program('akonadictl start', valid => 0);
# Workaround: sometimes the account assistant behind of mainwindow or tips window
# To disable it run at first time start
x11_start_program("echo \"[General]\" >> ~/.kde4/share/config/kmail2rc",         valid => 0);
x11_start_program("echo \"first-start=false\" >> ~/.kde4/share/config/kmail2rc", valid => 0);
x11_start_program("echo \"[General]\" >> ~/.config/kmail2rc",                    valid => 0);
x11_start_program("echo \"first-start=false\" >> ~/.config/kmail2rc",            valid => 0);

There are no checks for status, and the video doesn't show akregator at all.

Actions #20

Updated by okurz over 6 years ago

  • Status changed from In Progress to Blocked
  • Assignee changed from zluo to okurz

So it seems there is some confusion. "akregator" is the news reader that is visible in https://openqa.opensuse.org/tests/662767#step/kontact/2 . Where this comes from I do not really know. It might be related to akonadi starting, but akonadi != akregator; akonadi is a background server.

For now we are blocked by bsc#1102832 . Let's see how it looks afterwards.

Actions #21

Updated by okurz over 6 years ago

Thanks to zluo for investigating this.

Actions #22

Updated by okurz over 6 years ago

  • Due date changed from 2018-08-14 to 2018-08-28
  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

back to original problem and workable: https://openqa.opensuse.org/tests/723149#step/kontact/2

Actions #23

Updated by jorauch over 6 years ago

  • Assignee set to jorauch

taking a look

Actions #24

Updated by jorauch over 6 years ago

  • Assignee deleted (jorauch)

No real progress here, therefore unassigning.
Could not reproduce it and since it's a sporadic start of an unwanted program I think this is more of a product issue

Actions #25

Updated by oorlov over 6 years ago

  • Assignee set to oorlov

As far as I can see, keys go missing when the x11_start_program function is called. Sometimes it writes 'nadictl start', sometimes 'ona', etc. All of them are parts of 'akonadictl start'. The question is why KDE opens 'Akregator' for all of those keywords.

  1. I'll try to reproduce the issue locally with MAKETESTSNAPSHOTS=1 parameter to be able to connect to the failed module quickly;
  2. The system might be unresponsive for some time due to high load caused by some process. So, I'll try to gather logs on what is running at that moment and how high the CPU and memory usage are;
  3. It is possible that the previous module (amarok.pm) affects this one, as only the first command after Amarok is closed is affected. All further commands are typed without missing keys. So I'll try different combinations.
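
For step 2, a load snapshot could be captured roughly like this. This is only a sketch: load_snapshot is a made-up helper name, and it assumes the usual uptime, ps (procps syntax) and free tools are present on the system under test; missing tools are simply skipped.

```shell
#!/bin/sh
# Hypothetical helper (not part of openQA): print a one-shot load snapshot.
load_snapshot() {
    echo "== load snapshot =="
    # 1-, 5- and 15-minute load averages
    uptime 2>/dev/null || true
    # Header line plus the ten processes using the most CPU
    # (assumes procps "ps"; other implementations may reject --sort)
    ps -eo pid,comm,pcpu,pmem --sort=-pcpu 2>/dev/null | head -n 11 || true
    # Memory usage in MiB
    free -m 2>/dev/null || true
}

load_snapshot
```

The output would still have to be written somewhere the test framework collects it, which is not shown here.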
Actions #26

Updated by oorlov over 6 years ago

  • Status changed from Workable to In Progress
Actions #27

Updated by oorlov over 6 years ago

I've tried to reproduce the issue locally, but the test failed on 'start_wayland_plasma5' module.

The corresponding bsc ticket was marked as RESOLVED FIXED on 13.08.2018, but I've found that the issue was still happening on o3 on 15.08.2018, and I've also reproduced it locally with the latest build.

So I've asked fvogt whether the fix is already applied in the latest build, but he hasn't replied yet.

Actions #28

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-08-28 to 2018-09-11
Actions #29

Updated by oorlov over 6 years ago

A fix for the 'start_wayland_plasma5' module was applied in https://bugzilla.opensuse.org/show_bug.cgi?id=1105798.

So, once the fix appears on o3, it may be possible to reproduce the issue with the 'kontact' module.

Actions #30

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-09-11 to 2018-09-25
Actions #31

Updated by oorlov about 6 years ago

The fix for 'start_wayland_plasma5' is on o3, so I'm now investigating the 'kontact' issue.

Actions #32

Updated by SLindoMansilla about 6 years ago

  • Due date changed from 2018-09-25 to 2018-10-09

Moving to sprint 27. Not able to finish during 26.

Actions #33

Updated by okurz about 6 years ago

  • Target version changed from Milestone 18 to Milestone 19
Actions #34

Updated by oorlov about 6 years ago

  • Status changed from In Progress to Workable
  • Assignee deleted (oorlov)

I was not able to reproduce the issue locally, as modules that run before 'kontact' failed in the test runs on my machine, e.g. http://10.160.65.138/tests/421, http://10.160.65.138/tests/422, http://10.160.65.138/tests/423.

The issue is hard to catch, as a lot of modules have to pass before the 'kontact' module is executed. So, could you please consider this before re-estimating the ticket?

Actions #35

Updated by okurz about 6 years ago

  • Due date changed from 2018-10-09 to 2018-10-23
  • Target version changed from Milestone 19 to Milestone 20

I suggest to call just boot_to_desktop and then kontact to ensure that kontact itself is not the problem. And then schedule boot_to_desktop, amarok, kontact. It makes sense to work on this in the next sprint.

Actions #36

Updated by zluo about 6 years ago

  • Assignee set to zluo

take over

Actions #37

Updated by zluo about 6 years ago

  • Status changed from Workable to In Progress

to check latest job for this scenario and clone the job locally at first.

Actions #38

Updated by zluo about 6 years ago

use assert_and_click, create a new needle for the fix:

http://e13.suse.de/tests/9420#step/kontact/15

Actions #40

Updated by zluo about 6 years ago

  • Status changed from In Progress to Blocked

kontact failed on o3 because of a fatal error:
https://bugzilla.opensuse.org/show_bug.cgi?id=1111606

So this issue blocks verification of test runs on o3.
Set it as blocked for now.

Actions #41

Updated by okurz about 6 years ago

  • Due date deleted (2018-10-23)
  • Target version changed from Milestone 20 to Milestone 21

Yes, good point. No need to invest time on something that is blocked by a bug. When we are done with all other tasks we can still revisit here ;) I closed your bug as a duplicate of the same report I created already earlier on openSUSE Krypton: https://bugzilla.suse.com/show_bug.cgi?id=1105207

Actions #42

Updated by okurz about 6 years ago

  • Has duplicate action #42830: [functional][u] test fails in kontact - akregator was launched at the first place added
Actions #43

Updated by okurz about 6 years ago

  • Subject changed from [opensuse][u][functional][sporadic][medium] test fails in kontact - akregator not closed to [opensuse][u][functional][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed
  • Status changed from Blocked to Workable
  • Assignee deleted (zluo)

The bug does not seem to move forward, so we should invest in a workaround: one can click the OK button of the error message with a soft fail and continue.

Actions #44

Updated by zluo about 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

take over and check this for a workaround.

Actions #45

Updated by zluo about 6 years ago

need a test run against latest build and create a needle for close dialog with fatal error:

https://openqa.opensuse.org/tests/735133#step/kontact/15

Actions #47

Updated by mgriessmeier about 6 years ago

  • Status changed from In Progress to Feedback
Actions #48

Updated by zluo about 6 years ago

  • Status changed from Feedback to In Progress

http://e13.suse.de/tests/10470 shows fatal error, so trying now to use workaround and hope to get verified test run.

Actions #49

Updated by zluo about 6 years ago

http://e13.suse.de/tests/10533#step/kontact/16 shows the example where x11_start_program('killall kontact', valid => 0); doesn't work.

Actions #50

Updated by zluo about 6 years ago

http://e13.suse.de/tests/10533#step/kontact/16 shows the example: after killall kontact, akregator is still there

Now trying with:

record_soft_failure('akregator cannot be closed, related to issue of bsc#1105207') && return if (check_screen 'akregator-not-closed');

Actions #52

Updated by zluo about 6 years ago

http://e13.suse.de/tests/10638#step/kontact/11 shows a performance issue. This is really weird: while typing characters, it starts Amarok.

Actions #53

Updated by zluo about 6 years ago

QEMURAM 1536 is not so much to run the whole tests...

Actions #54

Updated by zluo about 6 years ago

https://openqa.opensuse.org/tests/805158#step/kontact/15 shows fatal error, so this problem is not sporadic.

Actions #55

Updated by zluo about 6 years ago

PR updated again.

Actions #56

Updated by zluo about 6 years ago

  • Status changed from In Progress to Feedback

waiting for merge.

Actions #57

Updated by zluo almost 6 years ago

  • Status changed from Feedback to In Progress

from okurz:

I don't see how akregator is related to https://bugzilla.suse.com/show_bug.cgi?id=1105207 . Can we please split the concerns and you just focus on the kontact error and not akregator?

For easier testing have you considered https://progress.opensuse.org/issues/35589#note-35 to just schedule booting from an image and call kontact? If you do not schedule "amarok" it should prevent the "akregator" problem

--

checking...

Actions #58

Updated by zluo almost 6 years ago

PR updated again.

Actions #59

Updated by zluo almost 6 years ago

  • Status changed from In Progress to Feedback
Actions #60

Updated by okurz almost 6 years ago

  • Priority changed from Normal to High
  • Target version changed from Milestone 21 to Milestone 22

median cycle time exceeded -> bumping prio and target version to current milestone

Actions #61

Updated by zluo almost 6 years ago

  • Status changed from Feedback to In Progress

working on this again and need to provide new verification run because the old results are gone after I re-installed my workstation.

Actions #62

Updated by zluo almost 6 years ago

needle PR got updated.

Actions #63

Updated by zluo almost 6 years ago

  • Status changed from In Progress to Feedback

set now as feedback

Actions #64

Updated by okurz almost 6 years ago

PR merged. Now we can focus again on the original issue of akregator showing up instead of kontact.

Actions #65

Updated by zluo almost 6 years ago

  • Status changed from Feedback to In Progress

Checking this again. I think akregator (news feed) needs to be closed separately because we can start it separately as /usr/bin/akregator.
The original issue shows that akregator stays open although kontact is already gone.

Actions #66

Updated by zluo almost 6 years ago

zluo@f40:/var/lib/openqa/tests/opensuse/tests> cnf akregator
The program 'akregator' can be found in the following package:
* akregator [ path: /usr/bin/akregator, repository: zypp (download.opensuse.org-oss) ]
Try installing with:
sudo zypper install akregator
zluo@f40:/var/lib/openqa/tests/opensuse/tests> cnf kontact
The program 'kontact' can be found in the following package:
* kontact [ path: /usr/bin/kontact, repository: zypp (download.opensuse.org-oss) ]
Try installing with:
sudo zypper install kontact
Actions #67

Updated by okurz almost 6 years ago

Not sure what you want to say with the latest two comments. However, the original problem was, as described in #35589#note-20, that akregator is started by a partially typed or mistyped command in the desktop runner. Immediately after the start of the test module "kontact" the test starts to type "a" for "akonadictl" in https://openqa.opensuse.org/tests/832330/file/video.ogv#t=306.21,306.25 , for which krunner suggests amarok, which then somehow consumes the next character "k". The typing continues with "o", see https://openqa.opensuse.org/tests/832330/file/video.ogv#t=306.41,306.45 , so it can only turn out wrong.

Actions #68

Updated by zluo almost 6 years ago

Well, this kind of typing issue happens sometimes, and we don't have a good idea how to fix it; it is more related to setup/performance. My test runs don't show this issue, so it is very hard to reproduce.

Actions #69

Updated by zluo almost 6 years ago

Of course we should try to make the test module more robust and provide a workaround if needed. In this case with the Akonadi service: if the akonadi service is not running, starting kontact should work anyway. If akregator got started wrongly by typing issue, then we cannot fix this, but we can try to check for this error and provide a softfail for it.
This is already provided by my PR.

Actions #70

Updated by okurz almost 6 years ago

zluo wrote:

If akregator got started wrongly by typing issue, then we cannot fix this

I am sure we can fix it. If this is only shown by our automated tests then we can either ensure that this is handled as a valid bug and fixed accordingly (I am not aware of any bug report) or at least provide a workaround if we can not find a fix.

If you can not reproduce it locally you can try out https://progress.opensuse.org/issues/44327 and trigger tests on production based on your local branch.

zluo wrote:

This is already provided by my PR.

Which PR? https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6226 ? It only aborts the test early on boo#1105207

If you do not have further plans for this ticket yourself it's ok to unassign.

https://openqa.opensuse.org/tests/835854#step/kontact/8 is one of the latest failures. The screenshot here directly shows the incorrectly typed string in krunner.

Actions #71

Updated by zluo almost 6 years ago

  • Assignee deleted (zluo)

well, typing issue is a general issue on osd. I won't be able to fix this. Please someone else can take over...

Actions #72

Updated by zluo almost 6 years ago

  • Status changed from In Progress to Workable
Actions #73

Updated by okurz almost 6 years ago

  • Blocks action #41540: [functional][u][sporadic] test fails in kontact as command "killall" is mistyped in x11_start_program (seems plasma specific problem) added
Actions #74

Updated by okurz almost 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz

Taking latest extratests_in_kde and triggering a custom schedule for gathering current fail rate:

env openqa-clone-set https://openqa.opensuse.org/tests/838225 poo35589_okurz_kde_wayland SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/amarok,tests/x11/kontact

https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland

and in parallel to try the "security mitigation off"-switches mentioned in https://bugzilla.opensuse.org/show_bug.cgi?id=1117833#c31

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 838225 TEST=poo35589_okurz_kde_wayland_mitigation_off_001 BUILD=poo35589_okurz_kde_wayland_mitigation_off _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/amarok,tests/x11/kontact EXTRABOOTPARAMS="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off"

-> Created job #839521: opensuse-Tumbleweed-DVD-x86_64-Build20190124-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t839521

and latest retrigger based on rebased code: https://openqa.opensuse.org/tests/843506#step/start_wayland_plasma5/52

Fails to login in a stable way. It looks like sddm crashes the session on login (seen in the video) but also #46223 is impacting us as the "focussed" password prompt is not correctly detected. Blocked by #46223 and waiting for the results from #39926 as this is also about wayland.

Actions #75

Updated by okurz almost 6 years ago

  • Blocked by action #46223: [functional][u] test fails in user_gui_login - fails to re-login, password not typed or entry field not focussed? added
Actions #76

Updated by okurz almost 6 years ago

Hm, maybe I did something wrong on cloning because of not configuring the VM for virtio.

First a verification for the helper commits in https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6709 with

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_mitigation_off_001 BUILD=poo35589_okurz_kde_wayland_mitigation_off _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1

Created job #846236: opensuse-Tumbleweed-DVD-x86_64-Build20190202-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t846236

Looks good. Now, onto loading the right modules again:

openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_mitigation_off_001 BUILD=poo35589_okurz_kde_wayland_mitigation_off _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/amarok,tests/x11/kontact EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1

Created job #846305: opensuse-Tumbleweed-DVD-x86_64-Build20190202-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t846305

failed in amarok. Maybe it's better to start tests first with kontact only?

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_mitigation_off_$i BUILD=poo35589_okurz_kde_wayland_mitigation_off _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 ; done

-> https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=poo35589_okurz_kde_wayland_mitigation_off

and for crosschecking with mitigation on:

$ for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_kontact_only_$i BUILD=poo35589_okurz_kde_wayland_kontact_only _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 ; done

-> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&distri=opensuse&build=poo35589_okurz_kde_wayland_kontact_only

Comparing failure rate in both and see if 1) mitigation off makes a difference 2) kontact is stable or fails even if it's the only module (might need krunner module for settle-down anyway).
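
The two loops above differ only in the TEST/BUILD names and the extra settings, so the comparison could also be driven by a small wrapper. The following is a sketch and not what was actually run: gen_clone_cmds is a hypothetical helper that only prints the openqa-clone-job command lines (dry run), using the options already shown above.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints one openqa-clone-job command per run.
# Usage: gen_clone_cmds JOB_ID BUILD COUNT [EXTRA_SETTING...]
gen_clone_cmds() (
    job=$1; build=$2; count=$3
    shift 3
    i=1
    while [ "$i" -le "$count" ]; do
        n=$(printf '%03d' "$i")     # zero-padded counter, like {001..100}
        echo "openqa-clone-job --skip-chained-deps --within-instance" \
            "https://openqa.opensuse.org $job" \
            "TEST=${build}_$n BUILD=$build _GROUP=0 $*"
        i=$((i + 1))
    done
)

common="SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1"
# Print the first three commands of the "mitigation on" build
gen_clone_cmds 845661 poo35589_okurz_kde_wayland_kontact_only 3 "$common"
```

Settings whose values contain spaces, such as EXTRABOOTPARAMS_BOOT_LOCAL, would need extra quoting before the printed lines are executed.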

Result

https://openqa.opensuse.org/tests/overview?distri=opensuse&version=Tumbleweed&build=poo35589_okurz_kde_wayland_mitigation_off failed in 17/100 jobs vs. https://openqa.opensuse.org/tests/overview?version=Tumbleweed&distri=opensuse&build=poo35589_okurz_kde_wayland_kontact_only failed in 25/100 jobs. Let's schedule 2x100 more for better statistics. My assumption, though, is that turning the mitigations off helps, but not in all cases.

EDIT: That's now 39/200 -> 19.5% failure rate for mitigation off, 44/200 -> 22% failure rate for mitigation on -> no significant difference
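
The "no significant difference" reading can be cross-checked with a rough two-proportion z-test. This is only a sketch: the helper names failure_rate and two_prop_z are made up for illustration, the counts are the ones quoted above, and the 1.96 threshold assumes a two-sided test at the 5% level.

```shell
#!/bin/sh
# Hypothetical helpers, not part of openQA; plain sh + awk.

# failure_rate FAILED TOTAL -> percentage with one decimal
failure_rate() {
    awk -v f="$1" -v n="$2" 'BEGIN { printf "%.1f\n", 100 * f / n }'
}

# two_prop_z FAILED_A TOTAL_A FAILED_B TOTAL_B -> z statistic (pooled variance)
two_prop_z() {
    awk -v f1="$1" -v n1="$2" -v f2="$3" -v n2="$4" 'BEGIN {
        p1 = f1 / n1; p2 = f2 / n2
        p  = (f1 + f2) / (n1 + n2)                # pooled failure proportion
        se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
        printf "%.2f\n", (p2 - p1) / se
    }'
}

echo "mitigation off: $(failure_rate 39 200)%"
echo "mitigation on:  $(failure_rate 44 200)%"
echo "z = $(two_prop_z 39 200 44 200) (|z| < 1.96: not significant at 5%)"
```

With these counts the statistic stays well below 1.96, which agrees with the conclusion above.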

Next steps

sysrich mentioned a potential relation to boo#1112824 because the Tumbleweed kernel has preemption enforced whereas on older versions and SLE and Leap we use voluntary preemption (or off? Something along those lines). Would be worth a try to run "kernel-vanilla" but I am not sure if we have a "test module" ensuring the installation of that package. Next step after that would be to rebuild the kernel with the changed setting and trying that one.

EDIT: We do have a test module to change the kernel, tests/kernel/change_kernel.pm

for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_mitigation_offkernel-vanilla_$i BUILD=poo35589_okurz_kde_wayland_mitigation_off_kernel-vanilla _GROUP=0 SCHEDULE=tests/kernel/change_kernel,tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 CHANGE_KERNEL_REPO=https://download.opensuse.org/repositories/Kernel:/stable/standard/ CHANGE_KERNEL_PKG=kernel-vanilla; done

Created job #846861: opensuse-Tumbleweed-DVD-x86_64-Build20190202-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t846861 -> passed so this approach works. 12 minutes vs. 8 minutes runtime. Let's schedule more of these:

for i in {002..200}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 845661 TEST=poo35589_okurz_kde_wayland_mitigation_offkernel-vanilla_$i BUILD=poo35589_okurz_kde_wayland_mitigation_off_kernel-vanilla _GROUP=0 SCHEDULE=tests/kernel/change_kernel,tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 CHANGE_KERNEL_REPO=https://download.opensuse.org/repositories/Kernel:/stable/standard/ CHANGE_KERNEL_PKG=kernel-vanilla; done

-> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&distri=opensuse&build=poo35589_okurz_kde_wayland_mitigation_off_kernel-vanilla

EDIT: 40/200 failed -> 20% failure rate -> no significant difference, "kernel-default" is not worse than "kernel-vanilla"

How about the same as above with default kernel and no changes to mitigation with Leap 15.1:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 844165 TEST=poo35589_okurz_kde_wayland_kontact_only_leap151_$i BUILD=poo35589_okurz_kde_wayland_kontact_only_leap151 _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 ; done

-> https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_kontact_only_leap151&version=15.1&distri=opensuse

Actions #77

Updated by okurz almost 6 years ago

  • Parent task set to #35302
Actions #78

Updated by okurz almost 6 years ago

The last check on Leap15.1 did not work. All 100 jobs failed in https://openqa.opensuse.org/tests/847346#step/boot_to_desktop/5 to boot to a graphical desktop although the serial port shows that after 700s there are still some services responding with output. Currently no good idea what is wrong here.

Crosschecking as suggested in https://bugzilla.opensuse.org/show_bug.cgi?id=1112824#c143

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 847596 TEST=poo35589_okurz_kde_wayland_mitigation_off_kernel-twvolun_$i BUILD=poo35589_okurz_kde_wayland_mitigation_off_kernel-twvolun _GROUP=0 SCHEDULE=tests/kernel/change_kernel,tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact EXTRABOOTPARAMS_BOOT_LOCAL="nopti nospec nospectre_v2 nospec_store_bypass_disable spectre_v2_user=off" _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1 CHANGE_KERNEL_REPO=https://download.opensuse.org/repositories/home:/favogt:/twvolun/standard/; done

-> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_mitigation_off_kernel-twvolun&version=Tumbleweed

EDIT: 17/99 failed (1 incomplete) -> 17% failure rate -> no significant difference

TODO: try to test successfully on Leap, any version. I have the feeling that we are using krunner too much in kontact. We could make that more stable with more waiting or using e.g. the user-console although that is most likely not faster because of switching and console enabling and such. Even worse is probably xterm where we set the prompt and disable the serial terminal and such multiple times.

Actions #79

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 22 to Milestone 23
Actions #80

Updated by okurz almost 6 years ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6942 created to improve robustness of start_wayland_plasma5 as a partial improvement of a side-failure.

So based on above results I come to the conclusion that we should either type even slower in krunner in the wayland scenario or use an xterm. However, for failures like https://openqa.opensuse.org/tests/848413#step/kontact/18 where we fail to type a single word correctly I doubt that xterm is really any better. So let's try to type slower in krunner@wayland.

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #866260: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t866260 -> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_very_slow_krunner&version=Tumbleweed

Actions #81

Updated by okurz almost 6 years ago

Still failed in some instances, e.g. https://openqa.opensuse.org/tests/866262#step/kontact/8 mentioning "ho" instead of "echo" so I added a wait_still_screen(3) for the case of WAYLAND in init_desktop_runner:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen&version=Tumbleweed

Actions #82

Updated by okurz almost 6 years ago

Still some instances of incomplete strings. https://openqa.opensuse.org/tests/866980/file/video.ogv#t=27.30,27.31 shows that of the word "kontact" only "nt…" is showing up in the krunner dialog and the first two characters ended up in what looks like yet another, previous krunner dialog. Trying with substrings.

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact _SKIP_POST_FAIL_HOOKS=1 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split&distri=opensuse&version=Tumbleweed

6/99 failed, that's significantly less. However, in most cases the failures still look the same, e.g. some characters are typed in a krunner dialog, the next characters end up in a krunner dialog which pops up in a different location. I assume krunner is really crashing but nevertheless I guess we need a workaround.

Trying something different once more: typing one character, waiting, typing the second character, waiting, then typing the rest, plus collecting logs from the post_fail_hook:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> Created job #868058: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t868058 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs&distri=opensuse

Actions #83

Updated by okurz almost 6 years ago

That's funny, when we fail, we reproducibly fail to collect and upload logs as in https://openqa.opensuse.org/tests/868131#step/kontact/38 showing only the help for "ar" when it should be "tar …".

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #869814: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t869814 ->
https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs&distri=opensuse&version=Tumbleweed

Actions #84

Updated by okurz almost 6 years ago

Still missing the first character of "tar" in https://openqa.opensuse.org/tests/868131#step/kontact/38 . Writing an explicit single whitespace in export_kde_logs.

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #869914: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t869914 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace&distri=opensuse

Actions #85

Updated by okurz almost 6 years ago

https://openqa.opensuse.org/tests/870044 has more logs now but could not find any "coredumps" or similar. The krunnerrc consists of:

[General]
history=kontact,echo "first-start=false" >> ~/.config/kmail2rc,echo "first-start=false" >> ~/.kde4/share/config/kmail2rc,ho "[General]" >> ~/.kde4/share/config/kmail2rc,ec,akonadictl start,xterm

[PlasmaRunnerManager]
LaunchCounts=1 services_xterm.desktop

so no "plugins" explicitly enabled but also interesting that the characters "ec" of the last "echo" are showing in the "history" variable, separated from "ho …"
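To make such fragments easier to spot, the history value can be printed one entry per line. This is only a sketch: it assumes the entries are comma-separated and contain no literal commas, and the function name as well as the path in the usage note are illustrative, not part of the test code.

```shell
#!/bin/bash
# Sketch: list the entries of the "history=" key of a krunnerrc, one per line.
# Assumption: entries are comma-separated and contain no literal commas.
krunner_history_entries() {
    local rc_file="$1"
    # Take the first "history=" line, drop the key, split on commas.
    grep -m1 '^history=' "$rc_file" | cut -d= -f2- | tr ',' '\n'
}
```

Called as krunner_history_entries ~/.config/krunnerrc (path assumed), the file above would list "ho …" and "ec" as separate entries, which makes split input visible at a glance.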

Further ideas: Call check_desktop_runner in start_wayland_plasma5 explicitly as well as an explicit test module for krunner which types the potentially problematic string "echo":

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #870056: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t870056 -> https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test&distri=opensuse&version=Tumbleweed

Could not find anything obvious in the logs. Of course, there are quite some messages that look like error messages but I am not sure which one would point to the problem I observe and I guess there is no use in me reporting openSUSE bugs about any of the specific issues. Could be upstream bugs, however that's not my primary concern now.

On a VM running openSUSE Leap 15.1 starting "plasma (wayland)" I could not reproduce the problems manually, but I also wanted to try out what happens when I try to crash krunner. Starting krunner in konsole and calling pkill -SIGSEGV krunner causes the krash dialog with the frowning smiley in the systray to show up, and console messages state "KCrash: Attempting to start /usr/bin/krunner from kdeinit" and "KCrash: Application 'krunner' crashing...", so an explicit message. I have seen nothing like that in the openQA logs. What I have learned is that in the xsession-errors log, e.g. in https://openqa.opensuse.org/tests/870044/file/kontact-XSE.log, the message "Using Wayland-EGL" should point to krunner starting initially. Whenever I open krunner in a VM manually I see another error message about an unexpected attribute on a virtual screen but none of that kind for the openQA VMs.
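A downloaded session log can be checked for the two markers mentioned above with a small helper. This is a minimal sketch: the function name is made up for illustration, and it only looks for the literal strings "KCrash:" and "Using Wayland-EGL" quoted in this comment, so other crash signatures would go unnoticed.

```shell
#!/bin/bash
# Sketch: report crash-handler messages and krunner start markers in a log.
# Assumption: the log is a local copy of an xsession-errors style file.
scan_session_log() {
    local log_file="$1"
    if grep -q 'KCrash:' "$log_file"; then
        echo "crash handler messages found"
    else
        echo "no crash handler messages"
    fi
    # grep -c prints 0 when nothing matches, so the count is always a number.
    printf 'Wayland-EGL start markers: %s\n' "$(grep -c 'Using Wayland-EGL' "$log_file")"
}
```

More than one "Using Wayland-EGL" marker without any "KCrash:" line would fit the observation that krunner reappears without an explicit crash message, though that interpretation is a guess.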

The web research results are very limited so far, best match https://opensuse.opensuse.narkive.com/OkKXNi2z/aargh-why-does-krunner-keep-disappearing suggesting to disable plugins. https://lists.opensuse.org/opensuse/2010-03/msg00039.html mentions how I could debug. If disabling plugins by config file does not work maybe I can delete some files.

Trying to disable krunner plugins:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 864281 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test_krunner_plugins_disabled _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> Created job #870840: opensuse-Tumbleweed-DVD-x86_64-Build20190226-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t870840 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test_krunner_plugins_disabled&distri=opensuse

Seems a lot of jobs are incomplete now because of missing assets, let's try based on more recent base job 870613:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test_krunner_plugins_disabled _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #871054: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t871054 -> https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_very_slow_krunner_wait_still_screen_split_twice_with_logs_fixed_export_kde_logs_type_explicit_whitespace_plus_krunner_test_krunner_plugins_disabled&distri=opensuse&version=Tumbleweed

All incomplete. Well, not sure why.

A retriggered job https://openqa.opensuse.org/tests/8713700 works ok still so something with the code? Why no autoinst-log.txt then?

I have the suspicion that the build string is getting too long :D

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_slow_krunner_plugins_disabled _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_slow_krunner_plugins_disabled&distri=opensuse

Same fail ratio, still failing like in
https://openqa.opensuse.org/tests/871386/file/video.ogv#t=32.08,32.14
Only shell commands should be enabled but it looks like some software center suggestions are still there. I guess I need to recreate krunnerrc from a more recent base system. Trying in a TW and a Krypton VM.

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_slow_more_krunner_plugins_disabled _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #871511: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t871511 -> https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_slow_more_krunner_plugins_disabled&distri=opensuse&version=Tumbleweed

Still failing to type the "first-start" stuff, so not that stable.

We should type fewer commands in kontact:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_slow_krunner_plugins_disabled_less_kontact_commands _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner2 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> Created job #871671: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t871671 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_slow_krunner_plugins_disabled_less_kontact_commands&distri=opensuse

This reduced the fail ratio to 4/100 failed. In all four cases the problem is that the "echo" command is ineffective due to an incorrect string in krunner, e.g. as visible in https://openqa.opensuse.org/tests/871752/file/video.ogv#t=19.42,19.46 we see the incomplete string e "[Ge instead of expected echo "[Ge. Should we really type that command in the user console instead?

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_kontact_commands_user_console_skip_krunner _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner4 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #871811: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t871811 -> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_kontact_commands_user_console_skip_krunner&version=Tumbleweed

Hm, still failing to type "kontact" or "killall" correctly, same fail rate.

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_krunner_plugins_disabled_user_console_match_typed _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner4 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #871951: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t871951 -> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_krunner_plugins_disabled_user_console_match_typed&version=Tumbleweed

Added another 100 to the set. All 200 passed now \o/ That is 38fafc7c5 in fix/krunner4

Wanted to try out the "match_typed" parameter but made the mistake of not actually committing it. So there should be no difference to before when it was last failing. Adding another 200 to the set.

I could try to enable more debug info for krunner, e.g. with kdebugsettings, but manually that does not bring so much more information.

As 400 (!) jobs pass now I want to crosscheck if I need all the changes, partially reverting, e.g. no string splitting, no krunner module, no wait_still_screen:

for i in {001..40}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_user_console_rest_reverted _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner5 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #872359: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t872359 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&distri=opensuse&build=poo35589_okurz_kde_wayland_user_console_rest_reverted

1/40 failed. Keeping the string splitting + wait_still_screen:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 870613 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_user_console_rest_reverted2 _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner5 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #872830: opensuse-Tumbleweed-DVD-x86_64-Build20190305-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t872830 -> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&distri=opensuse&build=poo35589_okurz_kde_wayland_user_console_rest_reverted2

Still 4/41 failed. I should stay on fix/krunner4 for now
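The fail counts reported in this comment (6/99, 4/100, 1/40, 4/41) are easier to compare as percentages. The helper below is just a convenience sketch with a made-up name; with sample sizes this small, differences of a few percent may well be noise.

```shell
#!/bin/bash
# Sketch: print failed/total as a percentage with one decimal place.
fail_ratio() {
    local failed="$1" total="$2"
    awk -v f="$failed" -v t="$total" 'BEGIN { printf "%.1f%%\n", 100 * f / t }'
}

fail_ratio 6 99    # string splitting            -> 6.1%
fail_ratio 4 100   # fewer kontact commands      -> 4.0%
fail_ratio 1 40    # rest reverted               -> 2.5%
fail_ratio 4 41    # rest reverted2 (so far)     -> 9.8%
```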

How does the current "kde-wayland" scenario fare based on fix/krunner4:

for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 872655 TEST=poo35589_okurz_kde_wayland_base_$i BUILD=poo35589_okurz_kde_wayland_base_fix_krunner4 _GROUP=0 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner4; done

-> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_base_fix_krunner4&version=Tumbleweed

Some additional fails in firefox_audio but it's hard to see as the modules "ooffice, oocalc, oomath" always fail. I should exclude them. Also, should run these longer scenarios with reduced prio:

build=poo35589_okurz_kde_wayland_base_fix_krunner4; for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 872655 TEST=poo35589_okurz_kde_wayland_base_$i BUILD=$build _GROUP=0 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner4 _SKIP_POST_FAIL_HOOKS=1 EXCLUDE_MODULES=ooffice,oomath,oocalc; done ; for i in $(openqa_client_o3 --json-output jobs build=$build state=scheduled | jq '.jobs | .[] | .id') ; do openqa_client_o3 jobs/$i put --json-data '{"priority": 90}'; done

-> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_base_fix_krunner4&version=Tumbleweed

Actions #86

Updated by okurz over 5 years ago

  • Status changed from In Progress to Feedback

Waiting for review and merge of https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6943 using the branch "fix/krunner2".

However, latest test results based on fix/krunner4 show that all that is not enough, many failures in kontact and other modules. I should probably try to create a proper PR based on fix/krunner4 as well and schedule with the schedule updates, e.g. including the krunner module and disabling plugins:

build=poo35589_okurz_kde_wayland_base_fix_krunner5; for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 877711 TEST=poo35589_okurz_kde_wayland_base_$i BUILD=$build _GROUP=0 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner5 _SKIP_POST_FAIL_HOOKS=1 EXCLUDE_MODULES=ooffice,oomath,oocalc PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse NEEDLES_DIR=/var/lib/openqa/cache/openqa1-opensuse/tests/opensuse/products/opensuse/needles ; done ; for i in $(openqa_client_o3 --json-output jobs build=$build state=scheduled | jq '.jobs | .[] | .id') ; do openqa_client_o3 jobs/$i put --json-data '{"priority": 90}'; done

-> https://openqa.opensuse.org/tests/overview?version=Tumbleweed&build=poo35589_okurz_kde_wayland_base_fix_krunner5&distri=opensuse

Actions #87

Updated by okurz over 5 years ago

  • Status changed from Feedback to In Progress

First part merged. After discussing with mgriessmeier and favogt I have another tiny idea: Checking that the desktop runner is there after every single character:

for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 880149 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_check_border _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/krunner,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner6 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> https://openqa.opensuse.org/tests/880166

This seems to break krunner completely.

for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 880149 TEST=poo35589_okurz_kde_wayland_$i BUILD=poo35589_okurz_kde_wayland_check_border _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner6 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

-> https://openqa.opensuse.org/tests/880219

This looks promising, no fail, easy to understand screenshots, single screenshot for every character. Not necessarily what we want in all cases but let's see if this helps for debugging or as fix :)

let's try a combined small and big set:

build=poo35589_okurz_kde_wayland_check_border; for i in {002..200}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 880149 TEST=poo35589_okurz_kde_wayland_$i BUILD=$build _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner6 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done ; for i in $(openqa_client_o3 --json-output jobs build=$build state=scheduled | jq '.jobs | .[] | .id') ; do openqa_client_o3 jobs/$i put --json-data '{"priority": 90}'; done; build=poo35589_okurz_kde_wayland_check_border_krunner6 ; for i in {001..100}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 880149 TEST=poo35589_okurz_kde_wayland_base_$i BUILD=$build _GROUP=0 CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner6 _SKIP_POST_FAIL_HOOKS=1 EXCLUDE_MODULES=ooffice,oomath,oocalc PRODUCTDIR=os-autoinst-distri-opensuse/products/opensuse NEEDLES_DIR=/var/lib/openqa/cache/openqa1-opensuse/tests/opensuse/products/opensuse/needles ; done ; for i in $(openqa_client_o3 --json-output jobs build=$build state=scheduled | jq '.jobs | .[] | .id') ; do openqa_client_o3 jobs/$i put --json-data '{"priority": 90}'; done

Created job #880235: opensuse-Tumbleweed-DVD-x86_64-Build20190313-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t880235 -> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_check_border&version=Tumbleweed

and https://openqa.opensuse.org/tests/overview?build=poo35589_okurz_kde_wayland_check_border_krunner6&distri=opensuse&version=Tumbleweed -> is not what I wanted because it should have been the original kde-wayland scenario, not extra_tests_in_kde. However, it still gives valuable information about the (in-)stability of mainly gnucash failing in 38/100 scenarios. Created tickets for chrome #49361, wine #49358, gnucash #49355

Back to the previous problem: https://openqa.opensuse.org/tests/880241#step/kontact/18 shows the characters "ko" of "kontact" to be typed correctly, the following screen https://openqa.opensuse.org/tests/880241#step/kontact/19 shows "n" but with the two previous characters lost. So krunner vanished in between and somehow reappeared. Anyone has an idea?

Actions #88

Updated by okurz over 5 years ago

  • Status changed from In Progress to Feedback

I asked in #opensuse-kde if anyone has an idea as well. I am a bit out of ideas :)

EDIT: Suggestion from fvogt: With export QT_LOGGING_RULES=*.debug=true it'll log every single event, but it might also slow it down enough that it works - still worth a try
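One way to apply that suggestion to a single program rather than the whole session is a small wrapper. This is a sketch under assumptions: the wrapper name is invented, and it assumes Qt writes its log output to stderr, which is the default unless a system logging backend is configured. Quoting the rule keeps the "*" literal regardless of which shell runs the line.

```shell
#!/bin/bash
# Sketch: run a command with Qt debug logging enabled, appending stderr to a log.
# Usage: run_with_qt_debug LOGFILE COMMAND [ARGS...]
run_with_qt_debug() {
    local log_file="$1"
    shift
    # The variable is set only for this one command, not for the calling shell.
    QT_LOGGING_RULES='*.debug=true' "$@" 2>>"$log_file"
}
```

For example, run_with_qt_debug /tmp/krunner-debug.log krunner (log path chosen arbitrarily) would collect the output of a manually started krunner in one file.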

Actions #89

Updated by okurz over 5 years ago

  • Priority changed from High to Normal
[15/03/2019 15:54:40] <okurz> DimStar, coolo: I haven't realized so far that oocalc/ooffice looks better, https://openqa.opensuse.org/tests/880736 kde-wayland@virtio is soft-failed. Not seen that often ;) I hope I could improve "kontact" stability recently as well
Actions #90

Updated by okurz over 5 years ago

  • Target version changed from Milestone 23 to Milestone 25
build=poo35589_okurz_kde_wayland_qt_log; for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 881292 TEST=poo35589_okurz_kde_wayland_$i BUILD=$build _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#fix/krunner6 MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #883005: opensuse-Tumbleweed-DVD-x86_64-Build20190315-extra_tests_on_kde@64bit -> https://openqa.opensuse.org/t883005 -> https://openqa.opensuse.org/tests/overview?distri=opensuse&build=poo35589_okurz_kde_wayland_qt_log&version=Tumbleweed

[18/03/2019 17:22:45] <okurz> fvogt: so in https://openqa.opensuse.org/tests/883227#step/kontact/52 you can see that we type "k" of "killall" in krunner before krunner vanishes. https://openqa.opensuse.org/tests/883227/file/kontact-XSE.log is the complete logfile that should include everything from QT_LOGGING_RULES="*.debug=true". Maybe you can find something?
[18/03/2019 17:23:08] <fvogt> okurz: Yay, I'll have a quick look now and a deeper look later

then:

  • fvogt When the "i" is being typed, the window state changes to hidden and then after a while there's a sudden size change from 532x647 to 532x47. The only log message during the window state transition is "org.kde.kactivities.lib.core: Killing the consumer". Whatever that means... that's at least as much logging as we can get out of krunner. WAYLAND_DEBUG=1 would produce even more, but we already know that the window state changes. Meh, the message is nothing of any value, just a destructor: https://github.com/KDE/kactivities/blob/0d6245995b43f8b1927b0e0c52859b5ebb3c2e19/src/lib/consumer.cpp#L64
  • okurz So just losing focus of the window?
  • fvogt Probably, but there's nothing about that either. It would explain why it disappears and reappears without content though
Actions #92

Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: kde-wayland
https://openqa.opensuse.org/tests/907714

Actions #93

Updated by okurz over 5 years ago

  • Assignee changed from okurz to mgriessmeier

Move to new QSF-u PO after I moved to the "tools"-team. I mainly checked the subject line so in individual instances you might not agree to take it over completely into QSF-u. Feel free to discuss with me or reassign to me or someone else in this case. Thanks.

Actions #94

Updated by okurz over 5 years ago

  • Status changed from Feedback to Resolved
  • Assignee changed from mgriessmeier to okurz

I think this is fixed. I checked the history of kde@64bit as well as kde-wayland@virtio and I have seen no failures in kontact for the last 3 months.

Actions #95

Updated by okurz over 5 years ago

  • Status changed from Resolved to Feedback

seems some people still have problems with whatever is the current approach, let's track this further.

Actions #96

Updated by okurz over 5 years ago

  • Related to action #51944: [opensuse][functional][u] test fails in dolphin -- "kdialog --getopenfilename" fails to start added
Actions #97

Updated by okurz over 5 years ago

  • Related to action #53045: [opensuse][kde][sporadic] krunner suggestions check is racy added
Actions #98

Updated by okurz over 5 years ago

Followup to PR by others:

build=poo35589_okurz_kde_wayland_qt_log; for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 883005 TEST=poo35589_okurz_kde_wayland_$i BUILD=$build _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#enhance/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #963260: opensuse-Tumbleweed-DVD-x86_64-Buildpoo35589_okurz_kde_wayland_qt_log-poo35589_okurz_kde_wayland_001@64bit_virtio -> https://openqa.opensuse.org/t963260

Actions #99

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 25 to Milestone 26

okurz wrote:

Followup to PR by others:

build=poo35589_okurz_kde_wayland_qt_log; for i in {001..001}; do openqa-clone-job --skip-chained-deps --within-instance https://openqa.opensuse.org 883005 TEST=poo35589_okurz_kde_wayland_$i BUILD=$build _GROUP=0 SCHEDULE=tests/boot/boot_to_desktop,tests/x11/start_wayland_plasma5,tests/x11/krunner,tests/x11/kontact CASEDIR=https://github.com/okurz/os-autoinst-distri-opensuse#enhance/krunner MACHINE=64bit_virtio QEMUVGA=virtio WAYLAND=1; done

Created job #963260: opensuse-Tumbleweed-DVD-x86_64-Buildpoo35589_okurz_kde_wayland_qt_log-poo35589_okurz_kde_wayland_001@64bit_virtio -> https://openqa.opensuse.org/t963260

this incompletes - did you trigger another one?

Actions #100

Updated by okurz over 5 years ago

  • Blocked by action #53339: [opensuse] test fails in swing due to incorrect rendering on 16bpp framebuffers added
Actions #101

Updated by okurz over 5 years ago

  • Subject changed from [opensuse][u][functional][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed to [opensuse][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed
  • Status changed from Feedback to Blocked

no, not yet. I plan to check the stability of the tests after proceeding with #53339 which can have quite some impact. I guess I can take it for the time being and therefore bring it outside QSF-u. I guess you appreciate :)

Actions #102

Updated by mgriessmeier over 5 years ago

  • Target version changed from Milestone 26 to Milestone 28
Actions #103

Updated by okurz about 5 years ago

  • Subject changed from [opensuse][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed to [functional][u][opensuse][sporadic][medium] test fails in kontact - needs workaround for boo#1105207, then akregator not closed
  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

blocker resolved, back to QSF-u

Actions #104

Updated by zluo almost 5 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

let me check current status of kontact.

Actions #105

Updated by zluo almost 5 years ago

  • Blocks deleted (action #41540: [functional][u][sporadic] test fails in kontact as command "killall" is mistyped in x11_start_program (seems plasma specific problem))
Actions #106

Updated by zluo almost 5 years ago

I checked on o3 for tw, atm we don't have any issue with kontact.

Triggering 50 test runs on f40.suse.de to see whether I can reproduce any issue with it.

Actions #107

Updated by zluo almost 5 years ago

check_bsc982138 in start_install.pm is no longer necessary since the production issue got fixed.

Actions #108

Updated by zluo almost 5 years ago

found issue with bootloader, this is quite strange:

http://f40.suse.de/tests/5844#step/bootloader/1 bootmenu-TW-xmas-20191209 matched but then the test failed :((

http://f40.suse.de/tests/5846#step/bootloader/1 successful

Actions #109

Updated by zluo almost 5 years ago

  • Status changed from In Progress to Rejected

So reject this ticket for now because this issue doesn't exist anymore on o3 or on my test runs:

http://f40.suse.de/tests/5846#next_previous
