action #30805

closed

[functional][opensuse][leap][medium][u] first test after reboot fails in krunner, potential system overload (was: test fails in inkscape - typing too fast?)

Added by JERiveraMoya about 6 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assignee: (none)
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 18
Start date: 2018-01-25
Due date: 2018-08-14
% Done: 0%
Estimated time: (none)
Difficulty: (none)

Description

Observation

The openQA test in scenario opensuse-15.0-DVD-x86_64-update_Leap_42.3_kde@64bit fails in inkscape.
Typing is too fast: the "xterm" text typed into the search gets mixed with the bash command in the xterm, and the enter key probably does not arrive at the right moment. To avoid this error, the test needs fixing, for example by reducing the typing speed or by checking for a needle between the two steps.

Tasks

  • Try to reproduce issue
  • Find the possible flow which leads to the behavior in the failed job and improve the code (e.g. by using match_typed for ensure_installed on KDE)

Reproducible

Fails since (at least) Build 106.1 (current job)

Expected result

Last good: 105.1 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 6 (0 open, 6 closed)

  • Related to openQA Tests - action #33283: [opensuse][functional][u][sporadic][medium] test fails in kontact - typing string loosing characters (Resolved, zluo, 2018-03-14, 2018-05-08)
  • Related to openQA Tests - action #31687: [opensuse][functional][medium][u] x11_start_program does not care if program can not be called in desktop runner even after three 'ret' presses with "valid" and "target_match" (Resolved, jorauch, 2018-02-12, 2018-04-24)
  • Related to openQA Tests - action #35877: [functional][u] Find out in post-fail-hook if system is I/O-busy (Resolved, zluo, 2018-05-03)
  • Related to openQA Tests - action #33571: [opensuse][functional][u][medium] test fails in shutdown - emoticon settings are opened (Resolved, 2018-03-21, 2018-06-05)
  • Related to openQA Tests - action #35688: [opensuse][functional][u][sporadic][bsc#1091353][medium] Various unstable tests on o3 - inkscape (Resolved, okurz, 2018-04-30)
  • Blocked by openQA Tests - action #36117: [functional][u][sporadic] test fails in xterm to show "xterm" (needle tag desktop-runner-plasma-suggestions) in krunner - system slower just after login? (Resolved, okurz, 2018-05-13, 2018-06-19)

Actions #1

Updated by okurz about 6 years ago

  • Subject changed from test fails in inkscape to [functional][opensuse][leap]test fails in inkscape - typing too fast?
  • Due date set to 2018-03-13
  • Target version set to Milestone 14
Actions #2

Updated by riafarov about 6 years ago

  • Description updated (diff)
  • Status changed from New to Workable
Actions #3

Updated by mgriessmeier about 6 years ago

  • Subject changed from [functional][opensuse][leap]test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][research] test fails in inkscape - typing too fast?
Actions #4

Updated by jorauch about 6 years ago

  • Assignee set to jorauch

Has not been seen since then.
Current build: https://openqa.opensuse.org/tests/621710
Triggered 50 runs on pinky to generate statistics -> http://pinky.arch.suse.de/tests

Actions #5

Updated by jorauch about 6 years ago

  • Assignee deleted (jorauch)
Actions #6

Updated by jorauch about 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to jorauch

It appeared a few times. I would suggest reducing the typing limit in ensure_installed / x11_start_program (not sure which one applies here).

Actions #7

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-03-13 to 2018-03-27
  • Target version changed from Milestone 14 to Milestone 15
Actions #8

Updated by jorauch about 6 years ago

Add the inkscape test to a suite that boots from an image (extratests in kde) and trigger it 50 times to reproduce.
The problem is most likely in x11_start_program.
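
Whether 50 runs are enough to reproduce a sporadic failure depends on how often it occurs. As a rough sketch, assuming independent runs that each fail with the same probability p, the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The helper name chance_to_reproduce is hypothetical, and any rate passed to it is an assumption rather than a value measured in this ticket.

```shell
# Chance of seeing at least one failure in N runs, assuming independent
# runs that each fail with probability P (an assumption, not a measurement).
# usage: chance_to_reproduce P N
chance_to_reproduce() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}
```

For example, `chance_to_reproduce 0.02 50` estimates how likely 50 runs are to hit a failure that occurs in 2% of all runs. A low value suggests triggering more runs before concluding that an issue is gone.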

Actions #9

Updated by okurz about 6 years ago

  • Related to action #33283: [opensuse][functional][u][sporadic][medium] test fails in kontact - typing string loosing characters added
Actions #10

Updated by jorauch about 6 years ago

Actions #11

Updated by mgriessmeier about 6 years ago

  • Related to action #31687: [opensuse][functional][medium][u] x11_start_program does not care if program can not be called in desktop runner even after three 'ret' presses with "valid" and "target_match" added
Actions #12

Updated by jorauch about 6 years ago

  • Status changed from In Progress to Blocked

Setting to blocked by:
https://progress.opensuse.org/issues/31687
Reason:
We have no real check whether the entered text is actually what we wanted to enter, and even if we knew, x11_start_program lacks proper error handling.
I will try to find a proper solution in the blocker ticket; this ticket should then become obsolete.
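
The missing check described above can be pictured as "type, read back, compare, and only then confirm". The following is a minimal sketch of that control flow in plain bash, not the os-autoinst test API: the function name type_verified and both hooks (one that types the text, one that prints what the input field currently shows) are hypothetical and would have to be supplied by the caller.

```shell
# Type a text and verify it arrived intact before the caller presses enter.
# usage: type_verified TEXT TYPE_FN READ_FN [ATTEMPTS]
#   TYPE_FN TEXT  types the text (hypothetical hook; a real one would also
#                 clear the input field before retyping)
#   READ_FN       prints what the input field currently shows (hypothetical hook)
type_verified() {
    local text=$1 type_fn=$2 read_fn=$3 attempts=${4:-3}
    local i shown
    for ((i = 1; i <= attempts; i++)); do
        "$type_fn" "$text"
        shown=$("$read_fn")
        if [[ $shown == "$text" ]]; then
            return 0
        fi
        # Report the mismatch instead of silently continuing.
        echo "attempt $i: expected '$text' but saw '$shown'" >&2
    done
    return 1
}
```

A caller would send the enter key only if type_verified returns 0 and fail the step with a clear message otherwise, which addresses both the missing check and the missing error handling.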

Actions #13

Updated by jorauch about 6 years ago

Created PR with code:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4715

The needle editor currently does not work for me, so the needles and the verification run still need to be done

Actions #14

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-03-27 to 2018-04-10
Actions #15

Updated by okurz about 6 years ago

jorauch wrote:

The needle editor currently does not work for me, so the needles and the verification run still need to be done

mkittler already provided patches. You could try these.

Actions #16

Updated by mgriessmeier about 6 years ago

  • Due date changed from 2018-04-10 to 2018-04-24
Actions #17

Updated by mgriessmeier almost 6 years ago

  • Due date changed from 2018-04-24 to 2018-05-08
  • Status changed from Blocked to Workable
  • Target version changed from Milestone 15 to Milestone 16

blocker has been resolved, moving to next sprint as workable

Actions #18

Updated by okurz almost 6 years ago

  • Subject changed from [functional][opensuse][leap][medium][research] test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][u] test fails in inkscape - typing too fast?

The expected result is clearly that the test runs stably, so I am removing the [research] tag. Even though this might involve some research, research alone is not enough.

Actions #19

Updated by jorauch almost 6 years ago

  • Status changed from Workable to In Progress

I see two options here:

  1. we merge the WIP PR and create a needle for the typed text
  2. we are happy that it did not happen for a long time and close this without merging the PR
Actions #20

Updated by jorauch almost 6 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from jorauch to okurz

What do you think?

Actions #21

Updated by okurz almost 6 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from okurz to jorauch

jorauch wrote:

I see two options here:

  1. we merge the WIP PR and create a needle for the typed text

We will merge the PR only after it is not WIP anymore, obviously. The question there still holds: What makes inkscape special?

  2. we are happy that it did not happen for a long time and close this without merging the PR

That is not true because I could easily find a recent failure: https://openqa.opensuse.org/tests/664611#step/inkscape/4

Actions #22

Updated by jorauch almost 6 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from jorauch to okurz

okurz wrote:

jorauch wrote:

I see two options here:

  1. we merge the WIP PR and create a needle for the typed text

We will merge the PR only after it is not WIP anymore, obviously. The question there still holds: What makes inkscape special?

Apparently it's special because it's failing regularly and we have a ticket for it.
When comparing the recent failure you posted with the initial issue, we can see that it fails in different steps, but both are caused by x11_start_program.
Maybe we should take a step back and try to harden x11_start_program before we fix symptoms all around our tests?

Actions #23

Updated by jorauch almost 6 years ago

  • Related to action #35877: [functional][u] Find out in post-fail-hook if system is I/O-busy added
Actions #24

Updated by jorauch almost 6 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from okurz to jorauch

Talked with okurz.

Our assumption: The upgrade systems have a "dirty" filesystem causing e.g. btrfs maintenance tasks to slow down the systems more than clean installation jobs. Then random test modules fail, e.g. inkscape.

To separate the concerns we have the following ideas:

  • Add a new test suite "dirty system test" with normal timeout and an explicit check for system responsiveness, e.g. starting krunner very often and checking for the suggestions to pop up (best directly after reboot)
  • Run the update tests with TIMEOUT_SCALE=3 to exclude workload problems
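
The second idea could be tried by cloning an existing update job with a raised TIMEOUT_SCALE. This is a minimal sketch, assuming the openqa_clone_job_o3 wrapper that appears later in this ticket accepts extra KEY=VALUE settings the way it is used there. The function name clone_scaled, the TEST naming scheme, and the job ID in the usage line are placeholders.

```shell
# Print (default) or run the clone command for one job with scaled timeouts.
# usage: clone_scaled JOB_ID RUN_NUMBER [SCALE]
# Set DRY_RUN=0 to actually execute the command instead of printing it.
clone_scaled() {
    local job_id=$1 run=$2 scale=${3:-3}
    local cmd=(openqa_clone_job_o3 --skip-chained-deps "$job_id"
               "TEST=poo30805_scale_$run" "TIMEOUT_SCALE=$scale")
    if [[ ${DRY_RUN:-1} == 1 ]]; then
        echo "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

Calling `clone_scaled 123456 1` prints the command for review. If failures disappear with the larger scale, that points towards load rather than a logic error in the test module.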
Actions #25

Updated by lnussel almost 6 years ago

the force_cron_run test is meant to take care of triggering all cron jobs so they don't disturb later.

Actions #26

Updated by okurz almost 6 years ago

  • Status changed from In Progress to Feedback
  • Assignee changed from jorauch to okurz

lnussel wrote:

the force_cron_run test is meant to take care of triggering all cron jobs so they don't disturb later.

That's true, but we still see that the update cases are more prone to fail for reasons yet unknown. It could also be that krunner itself needs more time to rebuild some cache or similar. This is why I think #35877 could help. Don't you think?

Actions #27

Updated by jorauch almost 6 years ago

  • Status changed from Feedback to In Progress
  • Assignee changed from okurz to jorauch

Additionally the force_cron_run is useless if we reboot in the meantime

Actions #28

Updated by mgriessmeier almost 6 years ago

  • Subject changed from [functional][opensuse][leap][medium][u] test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][u] first test after reboot fails in krunner, potential system overload (was: test fails in inkscape - typing too fast?)
  • Due date changed from 2018-05-08 to 2018-05-22
Actions #29

Updated by jorauch almost 6 years ago

We could either:

  • run force_cron_run after every reboot
  • change the complete order so there is no reboot in between
  • add a force_cron_run or wait to the reboot module

I'd prefer to just change the order since the reboot is not necessary for any of the following tests, but we would lose a snapshot.

Actions #30

Updated by okurz almost 6 years ago

  • Related to action #33571: [opensuse][functional][u][medium] test fails in shutdown - emoticon settings are opened added
Actions #31

Updated by jorauch almost 6 years ago

As discussed with okurz, we should move the reboot before the shutdown.
The relevant code is located in main_common.pm.

Actions #32

Updated by jorauch almost 6 years ago

  • Status changed from In Progress to Feedback

PR created:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5000

Waiting for merge and consequences in production

Actions #33

Updated by okurz almost 6 years ago

validation test was fine, PR merged. Triggered 200 test jobs with

for i in {1..100} ; do openqa_clone_job_o3 --skip-chained-deps 671977 TEST=okurz_poo30805_$i BUILD=241.1:poo30805 _GROUP="Development Leap" ; done
for i in {1..100} ; do openqa_clone_job_o3 --skip-chained-deps 672079 TEST=okurz_poo30805_$i BUILD=241.1:poo30805 _GROUP="Development Leap" ; done

-> https://openqa.opensuse.org/tests/overview?build=241.1%3Apoo30805&version=15.0&distri=opensuse&groupid=39

Please check statistics.

Actions #34

Updated by jorauch almost 6 years ago

There are a lot of obsoleted jobs; I am not sure what this status means.
In the overview there are at least no inkscape failures.
Can we close this, or are the obsoleted jobs a problem?

Actions #35

Updated by okurz almost 6 years ago

  • Related to action #35688: [opensuse][functional][u][sporadic][bsc#1091353][medium] Various unstable tests on o3 - inkscape added
Actions #36

Updated by okurz almost 6 years ago

I would not close this bug yet. I am working on #36117 and my idea is to create an explicit test module calling the desktop runner, see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5089
Afterwards we should again run some more tests to check statistics.

btw, "obsoleted" means that a new build was triggered and that canceled further execution of jobs in older builds. For our purposes you can ignore these and we can look at passed vs. failed. However, many tests failed in modules like yast2_lan, which might be helpful in the future to make tests more stable, but let's focus on the x11 test modules for now.

So I suggest to wait for #36117 first and then come back to this one here.
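
The "passed vs. failed" comparison described above can be sketched as a small tally that ignores obsoleted jobs. It assumes a hypothetical input of one job result per line on stdin; how that list is exported from the overview page is left open, and the function name tally_results is made up for illustration.

```shell
# Read one job result per line (e.g. passed, failed, obsoleted) and print
# the counts plus the failure rate among passed and failed jobs only.
tally_results() {
    awk '
        $1 == "passed" { passed++ }
        $1 == "failed" { failed++ }
        NF { total++ }
        END {
            finished = passed + failed
            rate = finished ? 100 * failed / finished : 0
            printf "passed=%d failed=%d ignored=%d fail_rate=%.1f%%\n",
                   passed, failed, total - finished, rate
        }'
}
```

Every result other than passed or failed, including obsoleted, lands in the ignored count, so cancelled jobs do not distort the rate.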

Actions #37

Updated by mgriessmeier almost 6 years ago

  • Blocked by action #36117: [functional][u][sporadic] test fails in xterm to show "xterm" (needle tag desktop-runner-plasma-suggestions) in krunner - system slower just after login? added
Actions #38

Updated by mgriessmeier almost 6 years ago

  • Due date changed from 2018-05-22 to 2018-06-05
  • Status changed from Feedback to Blocked
Actions #39

Updated by mgriessmeier almost 6 years ago

  • Due date changed from 2018-06-05 to 2018-06-19
  • Status changed from Blocked to Workable
  • Target version changed from Milestone 16 to Milestone 17

blocker resolved, moving to workable into next sprint (to be revisited in planning meeting)

Actions #40

Updated by mgriessmeier almost 6 years ago

  • Due date deleted (2018-06-19)
Actions #41

Updated by okurz almost 6 years ago

  • Due date set to 2018-07-03

Please try to check statistics again.

Actions #42

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 17 to Milestone 17
Actions #43

Updated by okurz almost 6 years ago

  • Due date changed from 2018-07-03 to 2018-08-14
  • Status changed from Workable to Blocked
  • Assignee changed from jorauch to okurz
  • Target version changed from Milestone 17 to Milestone 18

No reaction. Feel free to unassign sooner; it is no problem to give tickets back to the backlog. As with other tickets, this one is blocked by #31351.

Actions #44

Updated by okurz almost 6 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

With blockers resolved within #35685, I did some statistical analysis in #35685#note-37 and found four out of 100 jobs failing in shutdown. There could be more, as some are still running.

  • https://openqa.opensuse.org/tests/700813#step/shutdown/18 failed after the previous module, reboot, had failed, so let us discard that one for now.
  • https://openqa.opensuse.org/tests/700863#step/shutdown/19 can be discarded for the same reason.
  • https://openqa.opensuse.org/tests/700814#step/shutdown/4 and https://openqa.opensuse.org/tests/700864#step/shutdown/4 fail with what looks like the same error: xterm does not open from the plasma desktop runner.

My hypothesis for the root cause is higher system load after the bootup caused by the reboot module. This is why we originally moved it further back in the schedule. Somehow we need to ensure that the desktop runner is handled more gracefully in the shutdown module after reboot, but waiting longer for the desktop runner within the shutdown module itself sounds wrong, as this would make the module more dependent on the previous module. Maybe call the code from "desktop_runner" in reboot after the bootup to ensure the desktop runner is responsive in follow-up modules?
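
The idea of making sure the desktop runner responds before follow-up modules start amounts to a bounded retry around a probe. The real change would live in the Perl test code; the sketch below only shows the shape of such a loop in bash, and both the function name wait_until and whatever probe command is passed to it are hypothetical.

```shell
# Retry a probe command until it succeeds or the attempts run out.
# usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        # Pause between attempts, but not after the last one.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}
```

Placing such a check at the end of the reboot step keeps the waiting in the module that causes the load, so the shutdown module does not need to know what ran before it.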

Actions #45

Updated by okurz almost 6 years ago

  • Status changed from Workable to In Progress
  • Assignee set to okurz
Actions #47

Updated by okurz over 5 years ago

  • Status changed from In Progress to Resolved

Merged and stable. Remaining failures are handled elsewhere, e.g. gnucash in #38387, chromium in #36304
