action #47219

closed

[functional][u][easy] test fails in keymap_or_locale - rogue workqueue lockup

Added by szarate about 5 years ago. Updated about 5 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
Bugs in existing tests
Target version:
SUSE QA - Milestone 23
Start date:
2019-02-26
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP1-Installer-DVD-aarch64-Build159.1-textmode@aarch64 fails in
keymap_or_locale

Reproducible

Looks sporadic

Expected result

Last good: 158.4 (or more recent)

Further details

There's a rogue workqueue lockup visible on the screen and in the serial0.txt log, but no further information about it.

Suggestions

  • Write a new serial failure pattern, so that these lockups can be detected.
  • Modify the post fail hook to collect at least dmesg and the full journal log.
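
A minimal sketch of the first suggestion, for illustration only. The test distribution itself is written in Perl, so this Python version is not the code that was merged. The names `SerialFailure`, `WORKQUEUE_LOCKUP` and `scan_serial_log` are hypothetical, and the regular expression is an assumption modelled on the "BUG: workqueue lockup" serial error line quoted in the duplicate ticket later on this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialFailure:
    """One known failure signature to look for in a serial log (hypothetical type)."""
    kind: str            # "soft" or "hard"; how the match should be reported
    message: str         # human-readable description of the failure
    pattern: re.Pattern  # compiled regex matched against each log line


# Assumed signature, based on the kernel message quoted in this ticket's history.
WORKQUEUE_LOCKUP = SerialFailure(
    kind="soft",
    message="rogue workqueue lockup",
    pattern=re.compile(r"BUG: workqueue lockup - pool .* stuck for \d+s!"),
)


def scan_serial_log(text, failures=(WORKQUEUE_LOCKUP,)):
    """Return a (kind, message, line) tuple for every log line matching a signature."""
    hits = []
    for line in text.splitlines():
        for failure in failures:
            if failure.pattern.search(line):
                hits.append((failure.kind, failure.message, line.strip()))
    return hits
```

For example, `scan_serial_log(open("serial0.txt").read())` would list every matching line in a downloaded serial log. Whether a hit is reported as "soft" or "hard" is a policy choice, which the comments below discuss.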

Subtasks 1 (0 open, 1 closed)

action #48419: [functional][u] Hunt for the rogue workqueue | Resolved | SLindoMansilla | 2019-02-26

Actions #1

Updated by szarate about 5 years ago

Setting this to workable right away; I don't think it should be complicated :)

Actions #2

Updated by szarate about 5 years ago

  • Subject changed from [functional][u] test fails in keymap_or_locale - rogue workqueue lockup to [functional][u][easy] test fails in keymap_or_locale - rogue workqueue lockup
Actions #3

Updated by okurz about 5 years ago

  • Target version changed from Milestone 22 to Milestone 23

yes, good idea.

Actions #4

Updated by jorauch about 5 years ago

  • Assignee set to jorauch
Actions #5

Updated by jorauch about 5 years ago

  • Status changed from Workable to In Progress

According to the logs no post_fail_hook is being executed.
When setting the error type to 'hard' the test fails and we get the PFH of opensusebasetest.

However the big question is: How am I supposed to verify this?

Actions #6

Updated by jorauch about 5 years ago

  • Status changed from In Progress to Feedback
Actions #7

Updated by szarate about 5 years ago

@jorauch for the time being I guess it's not required, as we know that this kind of error is difficult to reproduce. Having the PFH sounds nice; however, in the case of this test it didn't cause any other test module to be tainted. Could you use soft instead, so that those minutes are not wasted and further testing does not get lost?

I thought for a moment that whenever a test inherited from opensusebasetest, it would also inherit the PFH...

Actions #8

Updated by jorauch about 5 years ago

If we fail hard but the test is not set to 'fatal', we should not lose any of the tests that follow, should we?
The autoinst logs do not show any sign of the PFH, but the screenshots seem to.
Now I am really confused.

EDIT: I also think we should not softfail without a bugref, and if it appears in other modules the impact might be 'hard'.

Actions #9

Updated by okurz about 5 years ago

szarate wrote:

I thought for a moment that whenever a test that inherited from opensusebasetest, it would also inherit the PFH...

yes, that should be the case.

Actions #10

Updated by jorauch about 5 years ago

  • Status changed from Feedback to Resolved

PR merged. As it's just a pattern detection for a random failure, I would close this, since we cannot be sure when we'll see it in production.

Actions #11

Updated by okurz about 5 years ago

  • Status changed from Resolved to Feedback

Please see https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/6851#issuecomment-467338617 . Either we should avoid the word "bug" or add a bug reference on our bugzilla instance.

Actions #12

Updated by jorauch about 5 years ago

It's rogue, so opening a bug will most likely have no effect, except that we would have a number to reference.
Since the module used is named 'known_bugs', I would like to keep the naming, and changing the commit message does not seem feasible to me.

Actions #13

Updated by szarate about 5 years ago

  • Status changed from Feedback to Resolved

We can already detect the bug :), and this ticket was for detecting the bug. I opened #48419 as a follow-up and am closing this one.

Actions #14

Updated by okurz about 5 years ago

thanks

Actions #15

Updated by okurz about 5 years ago

  • Has duplicate action #48689: [functional][y] test fails in yast2_snapper - "rogue workqueue lockup bsc#1126782 - Serial error: BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!" added
Actions #16

Updated by okurz about 5 years ago

  • Has duplicate deleted (action #48689: [functional][y] test fails in yast2_snapper - "rogue workqueue lockup bsc#1126782 - Serial error: BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 32s!")