action #63355: [opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM? - openQA Tests (public) - openSUSE Project Management Tool

action #63355

closed

[opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?

Added by okurz over 5 years ago. Updated about 5 years ago.

Status: Resolved
Priority: High
Assignee: zluo
Category: Bugs in existing tests
Target version: SUSE QA (private) - Milestone 30
Start date: 2020-02-10
Due date:
% Done:
Estimated time: 42.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.1-Argon-Live-x86_64-krypton-live@64bit-2G fails in
kontact
The kontact summary screen is only partially shown, then post_fail_hook fails to log in. Possibly OOM?

Suggestions

Investigate why it fails. Is it really OOM? Is the latest failure (opensuse-15.1-Argon-Live-x86_64-Build4.27-krypton-live@64bit-2G) the same one reported in this ticket?
Try with 2GB
Increase the timeout

Reproducible

Fails since (at least) Build 2.212

Expected result

Last good: 2.211 (or more recent)

Suggestions

When we cannot log in during the post_fail_hook, we do not even know whether we are OOM. We could pursue two different approaches:

Prolong the console activation time for post_fail_hooks
Check with a higher amount of RAM

Further details

Always latest result in this scenario: latest

Related issues 2 (0 open — 2 closed)


Updated by okurz over 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1185997

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released"
The label in the openQA scenario is removed


Updated by okurz over 5 years ago

Subject changed from [opensuse] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM? to [opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?


Updated by SLindoMansilla about 5 years ago

Description updated (diff)
Status changed from New to Workable
Target version set to Milestone 30
Estimated time set to 42.00 h


Updated by okurz about 5 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1208072

To prevent further reminder comments one of the following options should be followed:

The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
The openQA job group is moved to "Released"
The label in the openQA scenario is removed


Updated by zluo about 5 years ago

Status changed from Workable to In Progress
Assignee set to zluo

Let me check this first.


Updated by zluo about 5 years ago

I found out that this issue is performance-related: match_timeout is 90, which is too short for krypton-live. It takes about 180 seconds for kontact to open fully so that the needle can match:

http://f40.suse.de/tests/7243#step/kontact/13

will check this on o3.
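
To illustrate why a 90-second limit fails when the screen needs roughly 180 seconds, here is a minimal Python sketch of polling against a deadline. It is not the os-autoinst API (the real tests are written in Perl); `wait_for_match`, `FakeClock`, and `simulate` are hypothetical names introduced only for this illustration, and the fake clock stands in for a slow system under test.

```python
import time
from typing import Callable


def wait_for_match(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


class FakeClock:
    """Simulated clock so the example runs instantly (illustration only)."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def simulate(ready_after: float, timeout: float) -> bool:
    """Pretend the screen becomes matchable `ready_after` seconds after start."""
    fake = FakeClock()
    return wait_for_match(
        lambda: fake.now >= ready_after,
        timeout,
        interval=1.0,
        clock=fake.time,
        sleep=fake.sleep,
    )


if __name__ == "__main__":
    print(simulate(ready_after=180, timeout=90))   # gives up before the screen is ready
    print(simulate(ready_after=180, timeout=200))  # waits long enough
```

With these assumed numbers, the first call gives up before the simulated screen is ready, while the second succeeds. The sketch says nothing about why the application needs that long in the first place.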


Updated by zluo about 5 years ago

The post_fail_hook failing to log in is not OOM-related; as far as I can see, it is related to needle matching:

https://openqa.opensuse.org/tests/1214448#step/kontact/20


Updated by zluo about 5 years ago

https://openqa.opensuse.org/tests/1216265#step/kontact/13 works now,

but http://f40.suse.de/tests/7253#step/kontact/34 shows another issue with matching generic_desktop.

Maybe this needs to be handled as well.


Updated by okurz about 5 years ago

Please keep in mind to still address the original issue, even though it might not be easy to reproduce. Someone may have changed other settings, e.g. increased the RAM, so that you no longer hit OOM easily or at all. As in the original suggestion, I would focus on being able to do something useful in the post_fail_hook even in an OOM condition.


#10

Updated by zluo about 5 years ago

https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9909 updated.


#11

Updated by zluo about 5 years ago

@okurz

https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.

Please don't assume that someone changed settings. SUT won't work at all if it hits OOM.
Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.


#12

Updated by okurz about 5 years ago

zluo wrote:

https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.

This is an interesting example because the post_fail_hook managed to login but besides that could hardly do anything useful as no logs are uploaded at all.

Please don't assume that someone changed settings.

Not sure what you mean by that.

SUT won't work at all if it hits OOM.

Sure it does, just not very fast :)

Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.

We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from it whether there is free memory. It might also be helpful to log into the log console already in one of the test setup modules, so that we can switch to the already logged-in console in case of problems and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end we should be able to clearly determine, from the information we can gather from the SUT automatically, what is wrong with kontact when it is only partially shown.
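
As a rough illustration of the "read from the logs whether there was an OOM condition" idea, here is a minimal Python sketch that scans kernel log text (e.g. the output of `dmesg`) for OOM-killer messages. It is not part of the openQA test code; the function names and the sample log lines are made up for this example, and the patterns only cover the common "invoked oom-killer" and "Out of memory: Kill(ed) process" wordings, so other variants (such as cgroup-level OOM messages) would need extra patterns.

```python
import re
from typing import List, NamedTuple


class OomEvent(NamedTuple):
    pid: int
    process: str


# Kernel wording differs between versions:
# "Kill process" in older kernels, "Killed process" in newer ones.
_OOM_KILL = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")
_OOM_INVOKED = re.compile(r"invoked oom-killer")


def find_oom_events(log_text: str) -> List[OomEvent]:
    """Return the (pid, process name) of every process the OOM killer reported killing."""
    return [
        OomEvent(int(match.group(1)), match.group(2))
        for match in _OOM_KILL.finditer(log_text)
    ]


def oom_suspected(log_text: str) -> bool:
    """True if the log contains any sign that the OOM killer ran."""
    return bool(_OOM_INVOKED.search(log_text) or _OOM_KILL.search(log_text))


if __name__ == "__main__":
    # Made-up sample lines for illustration; not taken from any openQA job.
    sample = (
        "[ 1234.567890] kontact invoked oom-killer: gfp_mask=0x100cca, order=0\n"
        "[ 1234.600000] Out of memory: Killed process 2345 (kontact) total-vm:1000kB\n"
    )
    print(oom_suspected(sample))
    print(find_oom_events(sample))
```

A check like this only helps if the post_fail_hook can still collect the log text at all, which is why the already logged-in console matters.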

@SLindoMansilla I don't understand your change in #63355#note-3. You added a second "Suggestions" section to the ticket and ask whether it is "really OOM?". The point of the ticket is to have code that can answer that question for every failed test, not just for the single reported issue. You also suggest trying with 2G when the scenario already uses 2G. Did you mean to try other values instead?


#13

Updated by zluo about 5 years ago

We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from it whether there is free memory. It might also be helpful to log into the log console already in one of the test setup modules, so that we can switch to the already logged-in console in case of problems and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end we should be able to clearly determine, from the information we can gather from the SUT automatically, what is wrong with kontact when it is only partially shown.

Thanks for this information.
Well, since this is clearly only one failure, and the post_fail_hook failure (needle matching at the login prompt) is not caused by OOM, I would suggest splitting this task off into another ticket.


#14

Updated by zluo about 5 years ago

Related to action #65040: [sle][functiona][u] enhance post_fail_hook on OOM condition added


#15

Updated by zluo about 5 years ago

Status changed from In Progress to Resolved

resolved now:

https://openqa.opensuse.org/tests/1226211#step/kontact/26


#16

Updated by okurz about 5 years ago

Related to action #65489: [functional][u][sporadic] test fails in kontact, stuck in loop on "desktop-runner-plasma-suggestions" added

