action #63355

closed

[opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?

Added by okurz over 4 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Bugs in existing tests
Target version:
SUSE QA - Milestone 30
Start date:
2020-02-10
Due date:
% Done:

0%

Estimated time:
42.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.1-Argon-Live-x86_64-krypton-live@64bit-2G fails in
kontact
kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?

Suggestions

  • Investigate why it fails. Is it really OOM? Is the latest failure (opensuse-15.1-Argon-Live-x86_64-Build4.27-krypton-live@64bit-2G) the same one reported in this ticket?
  • Try with 2GB
  • Increase the timeout

Reproducible

Fails since (at least) Build 2.212

Expected result

Last good: 2.211 (or more recent)

Suggestions

When we can't log in during the post_fail_hook, we do not even know whether we are OOM. We could pursue two different approaches:

  • Prolong console activation time for post_fail_hooks
  • Check with higher amount of RAM

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #65040: [sle][functiona][u] enhance post_fail_hook on OOM condition (Resolved, dheidler, 2020-03-31)
Related to openQA Tests - action #65489: [functional][u][sporadic] test fails in kontact, stuck in loop on "desktop-runner-plasma-suggestions" (Resolved, jorauch, 2020-04-09)
Actions #1

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1185997

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #2

Updated by okurz over 4 years ago

  • Subject changed from [opensuse] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM? to [opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?
Actions #3

Updated by SLindoMansilla over 4 years ago

  • Description updated (diff)
  • Status changed from New to Workable
  • Target version set to Milestone 30
  • Estimated time set to 42.00 h
Actions #4

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1208072

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #5

Updated by zluo over 4 years ago

  • Status changed from Workable to In Progress
  • Assignee set to zluo

Let me check this first...

Actions #6

Updated by zluo over 4 years ago

Found out that this issue is related to performance: match_timeout is 90 seconds, which is too short for krypton-live. It needs about 180 seconds for kontact to open fully so that the needle matches:

http://f40.suse.de/tests/7243#step/kontact/13

Will check this on o3.
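
To illustrate why a 90 second limit fails here while a longer one passes, below is a minimal Python sketch of a poll-until-match loop. It is not the os-autoinst implementation; `wait_for_match`, `is_matched`, and the injectable `clock`/`sleep` parameters are hypothetical names used only for illustration.

```python
import time
from typing import Callable


def wait_for_match(
    is_matched: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_matched() until it returns True or `timeout` seconds pass.

    Hypothetical helper: it only mimics the idea of a needle match timeout.
    Returns True on a match, False if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        if is_matched():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With a check that only succeeds after roughly 180 seconds, `wait_for_match(check, timeout=90)` returns False, whereas a timeout of 200 would return True. The clock and sleep functions are parameters so the behavior can be simulated without actually waiting.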

Actions #7

Updated by zluo over 4 years ago

The post_fail_hook failing to log in is not OOM related; as far as I can see, it is related to a needle match:

https://openqa.opensuse.org/tests/1214448#step/kontact/20

Actions #8

Updated by zluo over 4 years ago

https://openqa.opensuse.org/tests/1216265#step/kontact/13 works now,

but http://f40.suse.de/tests/7253#step/kontact/34 shows another issue with matching generic_desktop.

Maybe this needs to be handled as well.

Actions #9

Updated by okurz over 4 years ago

Please keep in mind to still address the original issue, even though it might not be easy to reproduce. It might be that someone changed other settings, e.g. increased the RAM, so that you are not hitting OOM easily or at all. As in the original suggestion, I would focus on being able to do something useful in the post_fail_hook even in an OOM condition.

Actions #11

Updated by zluo over 4 years ago

@okurz

https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.

Please don't assume that someone changed settings. SUT won't work at all if it hits OOM.
Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.

Actions #12

Updated by okurz over 4 years ago

zluo wrote:

https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.

This is an interesting example because the post_fail_hook managed to login but besides that could hardly do anything useful as no logs are uploaded at all.

Please don't assume that someone changed settings.

Not sure what you mean by that.

SUT won't work at all if it hits OOM.

Sure it does, just not very fast :)

Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.

We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from there whether there is free memory. It might also be helpful to already log into the log console in one of the test setup modules, so that we can switch to the already logged-in console in case of problems and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end, we should be able to clearly determine, from the information that we can gather from the SUT automatically, what is wrong with kontact when it is only partially shown.
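
As a rough illustration of the "read out from logs if there was an OOM condition" idea, here is a minimal Python sketch that scans kernel log text (for example, saved `dmesg` or journal output) for OOM killer messages. The function names are hypothetical, and the patterns assume the usual kernel wording ("invoked oom-killer", "Out of memory: Kill process" or "Killed process"); the exact text can differ between kernel versions, so treat this as a starting point rather than the test suite's method.

```python
import re
from typing import List, Tuple

# Assumed kernel message shapes; wording may vary between kernel versions.
OOM_INVOKED = re.compile(r"invoked oom-killer")
OOM_KILLED = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<name>[^)]+)\)"
)


def find_oom_events(log_text: str) -> List[str]:
    """Return every log line that looks like an OOM killer message."""
    return [
        line
        for line in log_text.splitlines()
        if OOM_INVOKED.search(line) or OOM_KILLED.search(line)
    ]


def killed_processes(log_text: str) -> List[Tuple[int, str]]:
    """Return (pid, name) pairs for processes the OOM killer terminated."""
    return [
        (int(m.group("pid")), m.group("name"))
        for m in OOM_KILLED.finditer(log_text)
    ]
```

A post_fail_hook could upload the kernel log and run a check like this on it, so that "OOM?" in a ticket title is answered from collected data instead of being guessed.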

@SLindoMansilla I don't understand your change in #63355#note-3. You are adding a second section "Suggestions" to the ticket and asking if it "really is OOM?" The point of the ticket is to have code that can answer that question for every failed test, not just for the single reported issue. Also, you suggest trying with 2G when the scenario is already for 2G. Did you mean to try out other values instead?

Actions #13

Updated by zluo over 4 years ago

We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from there whether there is free memory. It might also be helpful to already log into the log console in one of the test setup modules, so that we can switch to the already logged-in console in case of problems and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end, we should be able to clearly determine, from the information that we can gather from the SUT automatically, what is wrong with kontact when it is only partially shown.

--

Thanks for this information.
Well, since this is clearly only one failure, and it is not an OOM issue that causes the post_fail_hook to fail with needle matching at the login prompt, I would suggest splitting this task into another ticket.

Actions #14

Updated by zluo over 4 years ago

  • Related to action #65040: [sle][functiona][u] enhance post_fail_hook on OOM condition added
Actions #15

Updated by zluo over 4 years ago

  • Status changed from In Progress to Resolved
Actions #16

Updated by okurz over 4 years ago

  • Related to action #65489: [functional][u][sporadic] test fails in kontact, stuck in loop on "desktop-runner-plasma-suggestions" added