action #30805
closed
[functional][opensuse][leap][medium][u] first test after reboot fails in krunner, potential system overload (was: test fails in inkscape - typing too fast?)
0%
Description
Observation
openQA test in scenario opensuse-15.0-DVD-x86_64-update_Leap_42.3_kde@64bit fails in inkscape
Typing is too fast: the text "xterm" typed into the search field gets mixed with the bash command typed in the xterm, so the Enter key probably does not arrive at the right moment. To avoid this error the test needs a fix, for example reducing the typing speed or checking for a needle between the two steps.
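The idea of checking between the two steps can be sketched as follows. This is a minimal, self-contained Python simulation, not openQA test code: `FlakyField`, `type_and_verify` and the delay values are hypothetical names chosen for illustration, and in a real test the comparison would be a needle match on the screen rather than a string comparison.

```python
class FlakyField:
    """Hypothetical stand-in for an input field on an overloaded system.

    If keys are typed with less than `min_delay` seconds between them,
    only every second character arrives.
    """

    def __init__(self, min_delay):
        self.min_delay = min_delay
        self.content = ""

    def clear(self):
        self.content = ""

    def type_text(self, text, delay):
        # Simulated loss: too-fast typing keeps only every second character.
        self.content += text if delay >= self.min_delay else text[::2]


def type_and_verify(field, text, delays=(0.0, 0.05, 0.2)):
    """Type `text`, verify it arrived intact, and retry more slowly if not.

    Returns the per-key delay that worked, so the caller presses Enter
    only once the field is known to hold the expected text.
    """
    for delay in delays:
        field.clear()
        field.type_text(text, delay)
        if field.content == text:  # the check between typing and Enter
            return delay
    raise RuntimeError(f"could not type {text!r} reliably")


field = FlakyField(min_delay=0.05)
working_delay = type_and_verify(field, "xterm")
print(working_delay, field.content)
```

The point of the design is that Enter is never sent blindly: either the text is confirmed, or the step fails with a clear error instead of a confusing follow-up failure.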
Tasks
- Try to reproduce issue
- Find the possible flow which leads to behavior in failed job and improve code (e.g. by using match_typed for ensure_installed on KDE)
Reproducible
Fails since (at least) Build 106.1 (current job)
Expected result
Last good: 105.1 (or more recent)
Further details
Always latest result in this scenario: latest
Updated by okurz almost 7 years ago
- Subject changed from test fails in inkscape to [functional][opensuse][leap]test fails in inkscape - typing too fast?
- Due date set to 2018-03-13
- Target version set to Milestone 14
Updated by riafarov almost 7 years ago
- Description updated (diff)
- Status changed from New to Workable
Updated by mgriessmeier almost 7 years ago
- Subject changed from [functional][opensuse][leap]test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][research] test fails in inkscape - typing too fast?
Updated by jorauch almost 7 years ago
- Assignee set to jorauch
Has not been seen since then
Current build
https://openqa.opensuse.org/tests/621710
Triggered 50 runs on pinky to generate statistics -> http://pinky.arch.suse.de/tests
Updated by jorauch almost 7 years ago
- Status changed from Workable to In Progress
- Assignee set to jorauch
Appeared a few times, would suggest reducing the typing limit in ensure_installed / x11_start_program (not sure which one applies here)
Updated by mgriessmeier almost 7 years ago
- Due date changed from 2018-03-13 to 2018-03-27
- Target version changed from Milestone 14 to Milestone 15
Updated by jorauch almost 7 years ago
Add inkscape test to suite that boots from image (extratests in kde) and trigger 50 times to reproduce
Most likely in x11_start_program
Updated by okurz almost 7 years ago
- Related to action #33283: [opensuse][functional][u][sporadic][medium] test fails in kontact - typing string loosing characters added
Updated by jorauch almost 7 years ago
Might be related to https://progress.opensuse.org/issues/31687 ?
Updated by mgriessmeier almost 7 years ago
- Related to action #31687: [opensuse][functional][medium][u] x11_start_program does not care if program can not be called in desktop runner even after three 'ret' presses with "valid" and "target_match" added
Updated by jorauch almost 7 years ago
- Status changed from In Progress to Blocked
Setting to blocked by:
https://progress.opensuse.org/issues/31687
Reason:
We have no real check whether the entered text is actually what we wanted to enter, and even if we knew, x11_start_program lacks proper error handling.
Will try to find a proper solution in blocker ticket, then this should become obsolete
Updated by jorauch almost 7 years ago
Created PR with code:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4715
Currently needle editor does not work for me, so needles and verification run need to be done
Updated by mgriessmeier almost 7 years ago
- Due date changed from 2018-03-27 to 2018-04-10
Updated by okurz almost 7 years ago
jorauch wrote:
Currently needle editor does not work for me, so needles and verification run need to be done
mkittler already provided patches. You could try these.
Updated by mgriessmeier almost 7 years ago
- Due date changed from 2018-04-10 to 2018-04-24
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-04-24 to 2018-05-08
- Status changed from Blocked to Workable
- Target version changed from Milestone 15 to Milestone 16
blocker has been resolved, moving to next sprint as workable
Updated by okurz over 6 years ago
- Subject changed from [functional][opensuse][leap][medium][research] test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][u] test fails in inkscape - typing too fast?
The expected result is clearly that the test runs stable so removing the [research] tag even though it might involve some research, just research is not enough.
Updated by jorauch over 6 years ago
- Status changed from Workable to In Progress
I see two options here:
- we merge the WIP PR and create a needle for the typed text
- we are happy that it did not happen for a long time and close this without merging the PR
Updated by jorauch over 6 years ago
- Status changed from In Progress to Feedback
- Assignee changed from jorauch to okurz
What do you think?
Updated by okurz over 6 years ago
- Status changed from Feedback to In Progress
- Assignee changed from okurz to jorauch
jorauch wrote:
I see two options here:
- we merge the WIP PR and create a needle for the typed text
We will merge the PR only after it is not WIP anymore, obviously. The question there still holds: What makes inkscape special?
- we are happy that it did not happen for a long time and close this without merging the PR
That is not true because I could easily find a recent failure: https://openqa.opensuse.org/tests/664611#step/inkscape/4
Updated by jorauch over 6 years ago
- Status changed from In Progress to Feedback
- Assignee changed from jorauch to okurz
jorauch wrote:
I see two options here:
we merge the WIP PR and create a needle for the typed text
We will merge the PR only after it is not WIP anymore, obviously. The question there still holds: What makes inkscape special?
Apparently it's special because it's failing regularly and we have a ticket for it.
When comparing the recent failure you posted with the initial issue we can see that it fails in different steps, but both are caused by x11_start_program
Maybe we should go a step back and try to harden x11_start_program before we fix symptoms all around our tests?
Updated by jorauch over 6 years ago
- Related to action #35877: [functional][u] Find out in post-fail-hook if system is I/O-busy added
Updated by jorauch over 6 years ago
- Status changed from Feedback to In Progress
- Assignee changed from okurz to jorauch
Talked with okurz.
Our assumption: The upgrade systems have a "dirty" filesystem causing e.g. btrfs maintenance tasks to slow down the systems more than clean installation jobs. Then random test modules fail, e.g. inkscape.
To separate the concerns we have the following ideas:
- Add a new test suite "dirty system test" with normal timeout and an explicit check for system responsiveness, e.g. starting krunner very often and checking that the suggestions pop up (best directly after reboot)
- update tests with TIMEOUT_SCALE=3 to exclude workload problems
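A responsiveness check of this kind might look like the sketch below. It is a simulation under stated assumptions: `count_slow_responses` and the latency values are hypothetical, and `timeout_scale` merely mimics the effect expected from TIMEOUT_SCALE (multiplying the allowed waiting time); nothing here calls openQA.

```python
def count_slow_responses(latencies, base_timeout, timeout_scale=1):
    """Count probes that took longer than the scaled timeout.

    `latencies` are measured times in seconds until the desktop runner
    showed its suggestions; `timeout_scale` multiplies the allowed time.
    """
    allowed = base_timeout * timeout_scale
    return sum(1 for latency in latencies if latency > allowed)


# Hypothetical measurements taken right after a reboot: the first probes are slow.
latencies = [12.0, 7.5, 2.0, 1.5, 1.2]

slow_normal = count_slow_responses(latencies, base_timeout=5)
slow_scaled = count_slow_responses(latencies, base_timeout=5, timeout_scale=3)
print(slow_normal, slow_scaled)
```

If failures show up with the normal timeout but disappear with the scaled one, that would point to system load rather than a functional problem in the test module.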
Updated by lnussel over 6 years ago
the force_cron_run test is meant to take care of triggering all cron jobs so they don't disturb later.
Updated by okurz over 6 years ago
- Status changed from In Progress to Feedback
- Assignee changed from jorauch to okurz
lnussel wrote:
the force_cron_run test is meant to take care of triggering all cron jobs so they don't disturb later.
That's true, but we still see that the update cases are more prone to fail for reasons yet unknown. It could also be that krunner itself needs more time to rebuild some cache or so. This is why I think #35877 could help. Don't you think?
Updated by jorauch over 6 years ago
- Status changed from Feedback to In Progress
- Assignee changed from okurz to jorauch
Additionally the force_cron_run is useless if we reboot in the meantime
Updated by mgriessmeier over 6 years ago
- Subject changed from [functional][opensuse][leap][medium][u] test fails in inkscape - typing too fast? to [functional][opensuse][leap][medium][u] first test after reboot fails in krunner, potential system overload (was: test fails in inkscape - typing too fast?)
- Due date changed from 2018-05-08 to 2018-05-22
Updated by jorauch over 6 years ago
We could either:
- run force_cron_run after every reboot
- change the complete order so there is no reboot in between
- add a force_cron_run or wait to the reboot module
I'd prefer to just change the order since the reboot is not necessary for any of the following tests, but we would lose a snapshot.
Updated by okurz over 6 years ago
- Related to action #33571: [opensuse][functional][u][medium] test fails in shutdown - emoticon settings are opened added
Updated by jorauch over 6 years ago
As discussed with okurz we should move the reboot before the shutdown
This is located in main_common.pm
Updated by jorauch over 6 years ago
- Status changed from In Progress to Feedback
PR created:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5000
Waiting for merge and consequences in production
Updated by okurz over 6 years ago
validation test was fine, PR merged. Triggered 200 test jobs with
for i in {1..100} ; do openqa_clone_job_o3 --skip-chained-deps 671977 TEST=okurz_poo30805_$i BUILD=241.1:poo30805 _GROUP="Development Leap" ; done
for i in {1..100} ; do openqa_clone_job_o3 --skip-chained-deps 672079 TEST=okurz_poo30805_$i BUILD=241.1:poo30805 _GROUP="Development Leap" ; done
Please check statistics.
Updated by jorauch over 6 years ago
There are a lot of obsoleted jobs, and I am not sure what this status means.
In the overview there are no inkscape failures at least.
Can we close this, or are the obsoleted jobs a problem?
Updated by okurz over 6 years ago
- Related to action #35688: [opensuse][functional][u][sporadic][bsc#1091353][medium] Various unstable tests on o3 - inkscape added
Updated by okurz over 6 years ago
I would not close this bug yet. I am working on #36117 and my idea is to create an explicit test module calling the desktop runner, see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5089
Afterwards we should again run some more tests to check statistics.
btw, "obsoleted" means that a new build was triggered, which canceled further execution of jobs in older builds. For our purposes you can ignore these and look at passed vs. failed. However, many tests failed in modules like yast2_lan, which might help to make tests more stable in the future, but let's focus on the x11 test modules for now.
So I suggest to wait for #36117 first and then come back to this one here.
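Looking at passed vs. failed while ignoring obsoleted jobs can be sketched as below. The result list is made-up illustration data, not the outcome of the 200 jobs triggered here, and `summarize` is a hypothetical helper rather than part of any openQA client.

```python
from collections import Counter


def summarize(results, ignored=("obsoleted",)):
    """Return per-state counts without the ignored states, plus the failure rate."""
    counts = Counter(results)
    relevant = {state: n for state, n in counts.items() if state not in ignored}
    total = sum(relevant.values())
    failed = relevant.get("failed", 0)
    rate = failed / total if total else 0.0
    return relevant, rate


# Hypothetical job results, for illustration only.
results = ["passed"] * 6 + ["failed"] * 2 + ["obsoleted"] * 4
relevant, rate = summarize(results)
print(relevant, f"{rate:.0%}")
```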
Updated by mgriessmeier over 6 years ago
- Blocked by action #36117: [functional][u][sporadic] test fails in xterm to show "xterm" (needle tag desktop-runner-plasma-suggestions) in krunner - system slower just after login? added
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-05-22 to 2018-06-05
- Status changed from Feedback to Blocked
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-06-05 to 2018-06-19
- Status changed from Blocked to Workable
- Target version changed from Milestone 16 to Milestone 17
blocker resolved, moving to workable into next sprint (to be revisited in planning meeting)
Updated by okurz over 6 years ago
- Target version changed from Milestone 17 to Milestone 17
Updated by okurz over 6 years ago
- Due date changed from 2018-07-03 to 2018-08-14
- Status changed from Workable to Blocked
- Assignee changed from jorauch to okurz
- Target version changed from Milestone 17 to Milestone 18
no reaction. Feel free to unassign sooner, no problem to give tickets back to the backlog. As well as in other tickets, blocked by #31351
Updated by okurz over 6 years ago
- Status changed from Blocked to Workable
- Assignee deleted (okurz)
With blockers resolved within #35685 I did some statistical analysis in #35685#note-37 and found four out of 100 jobs failing in shutdown; there could be more, as some are still running.
https://openqa.opensuse.org/tests/700813#step/shutdown/18 failed after the previous module, reboot, had already failed, so let us discard that one for now, as well as https://openqa.opensuse.org/tests/700863#step/shutdown/19 for the same reason. https://openqa.opensuse.org/tests/700814#step/shutdown/4 as well as https://openqa.opensuse.org/tests/700864#step/shutdown/4 fail with what looks like the same error, that is: xterm does not open from the plasma desktop runner.
My hypothesis for the root cause is higher system load after the bootup caused by the reboot module. This is why we originally moved it further back in the schedule. Somehow we need to ensure that the desktop runner is handled more gracefully in the shutdown module after reboot, but waiting longer for the desktop runner within the shutdown module itself sounds wrong, as this would make the module more dependent on the previous module. Maybe call the code from "desktop_runner" in reboot after the bootup to ensure the desktop runner is responsive in follow-up modules?
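The suggestion of making sure the desktop runner is responsive before follow-up modules start could be sketched roughly like this. It is a generic polling loop in Python, not the actual "desktop_runner" code: `wait_until_responsive`, the `probe` callable and the attempt and pause values are all assumptions for illustration.

```python
import time


def wait_until_responsive(probe, attempts=5, pause=10.0, sleep=time.sleep):
    """Call `probe` until it reports success; return the number of attempts used.

    `probe` is any callable returning True once the desktop runner reacts.
    Raises TimeoutError if it never does, so the failure is attributed to
    the module doing the reboot instead of a later, unrelated one.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(pause)  # give the freshly booted system time to settle
    raise TimeoutError(f"not responsive after {attempts} attempts")


# Simulated probe: the runner only reacts on the third try.
answers = iter([False, False, True])
used = wait_until_responsive(lambda: next(answers), sleep=lambda seconds: None)
print(used)
```

Placing such a gate at the end of the module that reboots keeps later modules independent of how long the system needs to settle.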
Updated by okurz over 6 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz