action #63355
closed
[opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?
Description
Observation
openQA test in scenario opensuse-15.1-Argon-Live-x86_64-krypton-live@64bit-2G fails in kontact
kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?
Suggestions
- Investigate why it fails. Is it really OOM? Is the latest failure (opensuse-15.1-Argon-Live-x86_64-Build4.27-krypton-live@64bit-2G) the same one reported in this ticket?
- Try with 2GB
- Increase the timeout
Reproducible
Fails since (at least) Build 2.212
Expected result
Last good: 2.211 (or more recent)
Suggestions
When we can't log in during the post_fail_hook, we do not even know whether we are OOM. We could pursue two different approaches:
- Prolong console activation time for post_fail_hooks
- Check with higher amount of RAM
Further details
Always latest result in this scenario: latest
Updated by okurz almost 5 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1185997
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by okurz over 4 years ago
- Subject changed from [opensuse] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM? to [opensuse][functional][u] test fails in kontact, kontact summary screen only partially shown, then post_fail_hook fails to login – OOM?
Updated by SLindoMansilla over 4 years ago
- Description updated (diff)
- Status changed from New to Workable
- Target version set to Milestone 30
- Estimated time set to 42.00 h
Updated by okurz over 4 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1208072
To prevent further reminder comments one of the following options should be followed:
- The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
- The openQA job group is moved to "Released"
- The label in the openQA scenario is removed
Updated by zluo over 4 years ago
- Status changed from Workable to In Progress
- Assignee set to zluo
Let me check this first.
Updated by zluo over 4 years ago
I found out that this issue is performance-related: match_timeout is 90, which is too short for krypton-live. It takes about 180 seconds for kontact to open fully so that the needle can match:
http://f40.suse.de/tests/7243#step/kontact/13
I will check this on o3.
Updated by zluo over 4 years ago
The post_fail_hook failing to log in is not OOM-related. As far as I can see, it is related to needle matching:
Updated by zluo over 4 years ago
https://openqa.opensuse.org/tests/1216265#step/kontact/13 works now,
but http://f40.suse.de/tests/7253#step/kontact/34 shows another issue with matching generic_desktop.
Maybe this needs to be handled as well.
Updated by okurz over 4 years ago
Please keep in mind to still address the original issue, even though it might not be easy to reproduce. It might be that someone changed other settings, e.g. increased the RAM, so that you are not hitting OOM easily or at all. As in the original suggestion, I would focus on being able to do something useful in the post_fail_hook even in an OOM condition.
Updated by zluo over 4 years ago
https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.
Please don't assume that someone changed settings. SUT won't work at all if it hits OOM.
Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.
Updated by okurz over 4 years ago
zluo wrote:
> https://openqa.opensuse.org/tests/1217011#step/kontact/20 post_fail_hook works fine, I cannot reproduce atm.
This is an interesting example because the post_fail_hook managed to login but besides that could hardly do anything useful as no logs are uploaded at all.
> Please don't assume that someone changed settings.
Not sure what you mean by that.
> SUT won't work at all if it hits OOM.
Sure it does, just not very fast :)
> Please let me know how to do something useful in post_fail_hook if OOM, I really don't know about this.
We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from it whether there is free memory. It might also be helpful to log into the log console already in one of the test setup modules, so that in case of problems we can switch to the already logged-in console and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end, we should be able to clearly determine, from the information that we can gather automatically from the SUT, what is wrong with kontact when it is only partially shown.
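A minimal sketch of the log-based OOM check mentioned above, assuming the kernel log (e.g. `dmesg` or journal output) has already been fetched from the SUT as plain text. It is written in Python purely for illustration and is not existing test code; `OomEvent` and `find_oom_events` are hypothetical names, and the sample log lines are made up. The patterns rely on the usual kernel messages "<process> invoked oom-killer" and "Out of memory: Kill(ed) process <pid> (<name>)", and would need checking against the kernel actually running on the SUT.

```python
import re
from typing import List, NamedTuple, Optional


class OomEvent(NamedTuple):
    """One OOM-related kernel log entry (hypothetical helper type)."""
    kind: str            # "invoked" or "killed"
    process: str         # process name as printed by the kernel
    pid: Optional[int]   # only known for "killed" entries


# Logged when a process triggers the OOM killer.
# Assumption: the process name contains no whitespace.
_INVOKED = re.compile(r"(?P<process>\S+) invoked oom-killer")

# Older kernels print "Kill process", newer ones "Killed process".
_KILLED = re.compile(
    r"Out of memory: Kill(?:ed)? process (?P<pid>\d+) \((?P<process>[^)]+)\)"
)


def find_oom_events(log_text: str) -> List[OomEvent]:
    """Return all OOM killer entries found in the given kernel log text."""
    events = []
    for line in log_text.splitlines():
        match = _KILLED.search(line)
        if match:
            events.append(
                OomEvent("killed", match.group("process"), int(match.group("pid")))
            )
            continue
        match = _INVOKED.search(line)
        if match:
            events.append(OomEvent("invoked", match.group("process"), None))
    return events


if __name__ == "__main__":
    # Made-up sample lines, only to show the call.
    sample = (
        "[  100.000000] kontact invoked oom-killer: gfp_mask=0x100cca, order=0\n"
        "[  100.100000] Out of memory: Killed process 4242 (kontact) total-vm:2097152kB\n"
    )
    for event in find_oom_events(sample):
        print(event)
```

An empty result only means that no OOM killer message was found in the text that was passed in. It does not prove the system had enough memory, for example if the log was truncated or could not be collected at all.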
@SLindoMansilla I don't understand your change in #63355#note-3. You are adding a second section "Suggestions" to the ticket and asking if it "really is OOM?" The point of the ticket is to have code that can answer that question for every failed test, not just for the single reported issue. Also, you suggest trying with 2G when the scenario is already for 2G. Did you mean to try out other values instead?
Updated by zluo over 4 years ago
> We already use magic-sysrq-w to find out if there are blocked tasks, so we could trigger other sysrq commands, e.g. magic-sysrq-m, to read memory information and parse from it whether there is free memory. It might also be helpful to log into the log console already in one of the test setup modules, so that in case of problems we can switch to the already logged-in console and not get stuck at the login prompt. If the system is responsive during post_fail_hook and no workarounds need to be tried, we could also read from the logs whether there was an OOM condition. In the end, we should be able to clearly determine, from the information that we can gather automatically from the SUT, what is wrong with kontact when it is only partially shown.
Thanks for this information.
Well, since this is clearly only one failure, and it is not an OOM issue that causes the post_fail_hook to fail at needle matching on the login prompt, I would suggest splitting this task off into another ticket.
Updated by zluo over 4 years ago
- Related to action #65040: [sle][functiona][u] enhance post_fail_hook on OOM condition added
Updated by zluo over 4 years ago
- Status changed from In Progress to Resolved
Updated by okurz over 4 years ago
- Related to action #65489: [functional][u][sporadic] test fails in kontact, stuck in loop on "desktop-runner-plasma-suggestions" added