action #54488

closed

[opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Enable post_fail_hook for collecting logs

Added by okurz almost 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: Enhancement to existing tests
Target version: SUSE QA - Milestone 31
Start date: 2019-07-20
Due date:
% Done: 0%
Estimated time: 42.00 h
Difficulty:

Description

Observation

openQA test in scenario opensuse-15.1-Argon-Live-x86_64-krypton-live@64bit-2G fails in kontact, then fails to log in during post_fail_hook. The system is probably stalled.

Reproducible

Fails since (at least) Build 1.27

Expected result

Regardless of whether the login succeeds, we should run our "stall detection" and system load checks, e.g. at least magic-sysrq to look for blocked tasks.

Suggestions

  • Move the return if get_var(NOLOGS) from x11test::post_fail_hook to opensusebasetest::post_fail_hook but still under show_tasks_in_blocked_state. DONE
  • Move the call to export_logs to the post_fail_hook inside opensusebasetest DONE
  • Remove x11test::post_fail_hook DONE
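The resulting control flow can be sketched as follows. This is a minimal Python stand-in, not the actual Perl code from os-autoinst-distri-opensuse: the class names mirror opensusebasetest and x11test, the method bodies are placeholders that only record the call order, and the `settings` dictionary is a hypothetical substitute for the test variables read by get_var.

```python
class OpensuseBaseTest:
    """Hypothetical Python stand-in for the Perl opensusebasetest class."""

    def __init__(self, settings=None):
        # Assumption: test variables are modeled as a plain dictionary.
        self.settings = settings or {}
        self.calls = []

    def get_var(self, name):
        return self.settings.get(name)

    def show_tasks_in_blocked_state(self):
        # Placeholder for the magic-sysrq based stall detection.
        self.calls.append("show_tasks_in_blocked_state")

    def export_logs(self):
        # Placeholder for the log collection that needs a working login.
        self.calls.append("export_logs")

    def post_fail_hook(self):
        # Stall detection runs first, so it happens even when NOLOGS is set.
        self.show_tasks_in_blocked_state()
        if self.get_var("NOLOGS"):
            return
        self.export_logs()


class X11Test(OpensuseBaseTest):
    """No post_fail_hook override: the base class hook is inherited."""
```

With this layout, a failing X11 test module always reports blocked tasks, and NOLOGS only skips the log export.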

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #36126: [functional][u] post_fail_hook matches on "text_login_root" before actual tty switch and therefore never logs in (Resolved, zluo, 2018-05-14)

Actions #1

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/997651

Actions #2

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1010661

Actions #3

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1022219

Actions #4

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1034012

Actions #5

Updated by okurz over 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: krypton-live
https://openqa.opensuse.org/tests/1045410

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #6

Updated by okurz over 4 years ago

  • Related to action #36126: [functional][u] post_fail_hook matches on "text_login_root" before actual tty switch and therefore never logs in added
Actions #7

Updated by SLindoMansilla over 4 years ago

  • Priority changed from Normal to High
Actions #8

Updated by SLindoMansilla over 4 years ago

  • Subject changed from [functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless
Actions #9

Updated by zluo over 4 years ago

  • Subject changed from [opensuse][kde] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless
  • Status changed from New to In Progress
  • Assignee set to zluo

Actually, stall detection is already in place. I saw this yesterday on o3. Let me check this now.

Actions #10

Updated by zluo over 4 years ago

http://f40.suse.de/tests/5532#next_previous shows that 100 test runs have only 1 failure:

http://f40.suse.de/tests/5517#step/kontact/12 kontact cannot be started. It is most likely a worker issue or a performance issue.

This issue can be seen at https://openqa.opensuse.org/tests/1088376#step/kontact/12. At the moment we don't have this issue on o3, so I would say we now have a different situation.
Since post_fail_hook isn't called at all, I will enable it for now and check.

Actions #11

Updated by zluo over 4 years ago

  • Subject changed from [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Probably system is stalled but we should run our "stall detection" and system load checks regardless to [opensuse][kde][functional][u] test fails in kontact, then fails to login in post_fail_hook. Enable post_fail_hook for collecting logs
Actions #13

Updated by SLindoMansilla over 4 years ago

This issue can be seen at https://openqa.opensuse.org/tests/1088376#step/kontact/12. At the moment we don't have this issue on o3, so I would say we now have a different situation.
Since post_fail_hook isn't called at all, I will enable it for now and check.

This link shows that post_fail_hook was called, but it failed to log in. That is exactly the issue that the ticket describes.

Actions #14

Updated by zluo over 4 years ago

To discuss with the team:

How can we handle the case where post_fail_hook runs into a stalled SUT?

Actions #15

Updated by szarate over 4 years ago

  • Description updated (diff)

I think we can get away with just shifting around the calls in the post_fail_hooks.

Actions #16

Updated by szarate over 4 years ago

  • Description updated (diff)
Actions #17

Updated by szarate over 4 years ago

  • Blocks action #60188: [functional][u] test fails in libqt5_qtbase because "Emoticons --System Settings Module" window added
Actions #18

Updated by szarate over 4 years ago

  • Assignee deleted (zluo)
Actions #19

Updated by szarate over 4 years ago

  • Status changed from In Progress to Workable
Actions #20

Updated by szarate over 4 years ago

  • Target version set to Milestone 28
  • Estimated time set to 42.00 h
Actions #21

Updated by mgriessmeier over 4 years ago

  • Target version changed from Milestone 28 to Milestone 31
Actions #22

Updated by SLindoMansilla over 4 years ago

  • Description updated (diff)
  • Assignee set to SLindoMansilla
Actions #23

Updated by SLindoMansilla over 4 years ago

  • Status changed from Workable to In Progress

Merge x11test::post_fail_hook into opensusebasetest: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9324 (merged)

Actions #24

Updated by SLindoMansilla over 4 years ago

  • Status changed from In Progress to Workable

Waiting for next occurrence in production.

Actions #25

Updated by SLindoMansilla about 4 years ago

  • Status changed from Workable to Resolved

post_fail_hook is triggered: https://openqa.opensuse.org/tests/1176518#step/kontact/21

But, the system is stalled: #63355

Actions #26

Updated by okurz about 4 years ago

  • Status changed from Resolved to Workable

SLindoMansilla wrote:

post_fail_hook is triggered: https://openqa.opensuse.org/tests/1176518#step/kontact/21

But, the system is stalled: #63355

But this is exactly what the original ticket observation states: #54488#Observation

So maybe you fixed some intermediate problem and are now back to the original problem? Maybe it would help to substantially increase the timeout for the initial login in the post_fail_hook, or to log in to the log console before any relevant test has a chance to fail.
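As a rough illustration of the timeout idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration, not existing test code: the function name `run_post_fail_hook`, the three callables passed in, the `LoginTimeout` exception, and the 600 second default. It runs the blocked-task check before attempting the login, gives the login a generous timeout, and skips the log export instead of failing when the SUT is stalled.

```python
class LoginTimeout(Exception):
    """Raised by the hypothetical login callable when no console appears in time."""


def run_post_fail_hook(login, check_blocked_tasks, export_logs, login_timeout=600):
    """Return True if logs were exported, False if the login timed out.

    All callables and the 600 s default are illustrative assumptions.
    """
    # Stall detection first: it must not depend on a successful login.
    check_blocked_tasks()
    try:
        login(timeout=login_timeout)
    except LoginTimeout:
        # Stalled SUT: skip the log export rather than failing the hook itself.
        return False
    export_logs()
    return True
```

The other option mentioned above, logging in to the log console before any test can fail, would move the `login` call out of the hook entirely.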

Actions #27

Updated by okurz about 4 years ago

  • Blocks deleted (action #60188: [functional][u] test fails in libqt5_qtbase because "Emoticons --System Settings Module" window)
Actions #28

Updated by SLindoMansilla almost 4 years ago

  • Status changed from Workable to New
  • Assignee deleted (SLindoMansilla)

For grooming

Actions #29

Updated by SLindoMansilla almost 4 years ago

  • Description updated (diff)
Actions #30

Updated by szarate almost 4 years ago

  • Status changed from New to Resolved

The latest occurrences of errors in kontact are no longer related to stalls, so this ticket seems done as far as the ACs are concerned. However, https://progress.opensuse.org/issues/68794 has been created as a follow-up to address the time wasted during the post_fail_hook.

@okurz: if you disagree, please ask via Rocket.Chat before reopening, or remove the [u] tag and pick it up yourself.

Actions #31

Updated by SLindoMansilla over 3 years ago

  • Assignee set to szarate