action #54488

[opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Enable post_fail_hook for collecting logs

Added by okurz over 2 years ago. Updated about 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: Enhancement to existing tests
Target version: SUSE QA - Milestone 31
Start date: 2019-07-20
Due date:
% Done: 0%
Estimated time: 42.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.1-Argon-Live-x86_64-krypton-live@64bit-2G fails in kontact, then fails to log in in post_fail_hook. The system is probably stalled.

Reproducible

Fails since (at least) Build 1.27

Expected result

Regardless of being able to login or not we should run our "stall detection" and system load checks, e.g. at least magic-sysrq to look for blocked tasks.

Suggestions

  • Move the return if get_var(NOLOGS) from x11test::post_fail_hook to opensusebasetest::post_fail_hook but still under show_tasks_in_blocked_state. DONE
  • Move the call to export_logs to the post_fail_hook inside opensusebasetest DONE
  • Remove x11test::post_fail_hook DONE
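
The ordering proposed in the suggestions above can be sketched as follows. This is only a language-neutral model in Python, not the actual Perl code in os-autoinst-distri-opensuse: the function signature, the `settings` mapping standing in for get_var, and the `noop` stand-ins are assumptions made for illustration.

```python
from typing import Callable, List, Mapping


def post_fail_hook(
    settings: Mapping[str, str],
    show_tasks_in_blocked_state: Callable[[], None],
    export_logs: Callable[[], None],
) -> List[str]:
    """Model of the proposed hook order; returns the names of the steps that ran."""
    ran: List[str] = []

    # Stall detection comes first so it runs whether or not a login works.
    show_tasks_in_blocked_state()
    ran.append("show_tasks_in_blocked_state")

    # Equivalent of `return if get_var(NOLOGS)`, placed after stall detection.
    if settings.get("NOLOGS"):
        return ran

    # Log collection depends on a working login, so it goes last.
    export_logs()
    ran.append("export_logs")
    return ran


def noop() -> None:
    """Hypothetical stand-in for the real test helpers."""


print(post_fail_hook({"NOLOGS": "1"}, noop, noop))
print(post_fail_hook({}, noop, noop))
```

With NOLOGS set, only the stall detection step runs; without it, log export follows. If export_logs fails because the login fails, the blocked-task information has already been captured.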

Further details

Always latest result in this scenario: latest


Related issues

Related to openQA Tests - action #36126: [functional][u] post_fail_hook matches on "text_login_root" before actual tty switch and therefore never logs in (Resolved, 2018-05-14)

History

#1 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/997651

#2 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1010661

#3 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1022219

#4 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1034012

#5 Updated by okurz about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1045410

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed

#6 Updated by okurz about 2 years ago

  • Related to action #36126: [functional][u] post_fail_hook matches on "text_login_root" before actual tty switch and therefore never logs in added

#7 Updated by SLindoMansilla about 2 years ago

  • Priority changed from Normal to High

#8 Updated by SLindoMansilla almost 2 years ago

  • Subject changed from [functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless

#9 Updated by zluo almost 2 years ago

  • Subject changed from [opensuse][kde] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless
  • Status changed from New to In Progress
  • Assignee set to zluo

Actually, stall detection is already in place. I saw this yesterday on o3. Let me check this now.

#10 Updated by zluo almost 2 years ago

http://f40.suse.de/tests/5532#next_previous shows that 100 test runs had only 1 failure:

http://f40.suse.de/tests/5517#step/kontact/12 kontact could not be started. It is most likely a worker issue or a performance issue.

So this issue can be found at https://openqa.opensuse.org/tests/1088376#step/kontact/12. At the moment we don't have the issue on o3, so I would say we now have a different situation.
Since post_fail_hook isn't called at all, I will check this for now.

#11 Updated by zluo almost 2 years ago

  • Subject changed from [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Enable post_fail_hook for collecting logs

#13 Updated by SLindoMansilla almost 2 years ago

So this issue can be found at https://openqa.opensuse.org/tests/1088376#step/kontact/12. At the moment we don't have the issue on o3, so I would say we now have a different situation.
Since post_fail_hook isn't called at all, I will check this for now.

This link shows that post_fail_hook was called, but it failed to log in. That is exactly the issue that the ticket mentions.

#14 Updated by zluo almost 2 years ago

To discuss with the team:

How can we handle the case where post_fail_hook runs into a stalled SUT?

#15 Updated by szarate almost 2 years ago

  • Description updated (diff)

I think we can get away with just shifting around the calls to post fail hooks

#16 Updated by szarate almost 2 years ago

  • Description updated (diff)

#17 Updated by szarate almost 2 years ago

  • Blocks action #60188: [functional][u] test fails in libqt5_qtbase because "Emoticons --System Settings Module" window added

#18 Updated by szarate almost 2 years ago

  • Assignee deleted (zluo)

#19 Updated by szarate almost 2 years ago

  • Status changed from In Progress to Workable

#20 Updated by szarate almost 2 years ago

  • Target version set to Milestone 28
  • Estimated time set to 42.00 h

#21 Updated by mgriessmeier almost 2 years ago

  • Target version changed from Milestone 28 to Milestone 31

#22 Updated by SLindoMansilla almost 2 years ago

  • Description updated (diff)
  • Assignee set to SLindoMansilla

#23 Updated by SLindoMansilla almost 2 years ago

  • Status changed from Workable to In Progress

Merge x11test::post_fail_hook to opensusebasetest: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9324 (merged)

#24 Updated by SLindoMansilla almost 2 years ago

  • Status changed from In Progress to Workable

Waiting for next occurrence in production.

#25 Updated by SLindoMansilla over 1 year ago

  • Status changed from Workable to Resolved

post_fail_hook is triggered: https://openqa.opensuse.org/tests/1176518#step/kontact/21

But, the system is stalled: #63355

#26 Updated by okurz over 1 year ago

  • Status changed from Resolved to Workable

SLindoMansilla wrote:

post_fail_hook is triggered: https://openqa.opensuse.org/tests/1176518#step/kontact/21

But, the system is stalled: #63355

But this is exactly what the original ticket observation states: #54488#Observation

So maybe you fixed some intermediate problem and are back to the original problem now? Maybe it would help to substantially increase the timeout for the initial login of the post_fail_hook, or to log into the log console before any relevant test has a chance to fail.
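
As a rough illustration of the timeout idea, here is a generic poll-until-deadline helper in Python. It is not the os-autoinst API: `wait_for_login`, `prompt_visible`, and the injectable `clock` and `sleep` parameters are hypothetical names chosen for this sketch.

```python
import time
from typing import Callable


def wait_for_login(
    prompt_visible: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the login prompt appears or `timeout` seconds have passed."""
    deadline = clock() + timeout
    while True:
        if prompt_visible():
            return True
        if clock() >= deadline:
            # Give up; the caller can still report the stall.
            return False
        sleep(interval)
```

A larger `timeout` gives a heavily loaded SUT more time to present the prompt, at the cost of a longer wait when the system is truly stalled.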

#27 Updated by okurz over 1 year ago

  • Blocks deleted (action #60188: [functional][u] test fails in libqt5_qtbase because "Emoticons --System Settings Module" window)

#28 Updated by SLindoMansilla over 1 year ago

  • Status changed from Workable to New
  • Assignee deleted (SLindoMansilla)

For grooming

#29 Updated by SLindoMansilla over 1 year ago

  • Description updated (diff)

#30 Updated by szarate over 1 year ago

  • Status changed from New to Resolved

The latest occurrences of errors in kontact are no longer related to stalls, so this ticket seems done as far as the acceptance criteria go. However, https://progress.opensuse.org/issues/68794 has been created as a follow-up to address the time wasted during the post_fail_hook.

okurz: if you disagree, please ask via rocket chat before reopening, or remove the [u] tag and pick it up yourself

#31 Updated by SLindoMansilla about 1 year ago

  • Assignee set to szarate
