action #30619

closed

[sle][functional][y][medium][bsc#1032831] test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? or improve logging

Added by okurz over 6 years ago. Updated about 5 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: SUSE QA - Milestone 23
Start date: 2018-01-22
Due date: 2019-03-26
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

The openQA test in scenario sle-15-Installer-DVD-x86_64-xen@64bit fails in yast2_snapper while waiting for the closed application within xterm.

Reproducible

Fails since (at least) Build 408.1

Expected result

Last good: 407.1 (or more recent)

Problem

With commit c5253d04, riafarov removed a workaround that, among other things, made the test wait longer. Do we need to bring some of that back? Assigning to riafarov to clarify and give his opinion.

Further details

Always latest result in this scenario: latest


Related issues 4 (0 open, 4 closed)

Related to openQA Tests - action #18346: [sles][functional][yast] yast2_snapper: sporadicly snapshot deletion takes longer - but test is not waiting 240s like in the code anyway (Rejected, 2017-04-05, 2018-03-13)

Related to openQA Tests - action #43502: [sle][functional][y] test fails in yast2_snapper (Rejected, 2018-11-07, 2018-12-04)

Has duplicate openQA Tests - action #32956: [sle][functional][u][medium] test fails in yast2_snapper - Improve logging (Rejected, 2018-03-09)

Blocked by openQA Tests - action #38336: [functional][u] Re-enable yast2_snapper on openSUSE (Resolved, dheidler, 2018-07-10, 2018-10-23)

Actions #1

Updated by riafarov about 6 years ago

So after analysis, it's definitely some other issue with the same symptoms. The main things we had before PR#3345 were a soft-failure and complicated logic to avoid running the same thing twice. In recent failures we are not even able to log in to collect logs, which makes investigation really complex as it's a sporadic issue.
The issue is also not related to btrfs balancing activities.
We also cannot log in even with the scaled timeout, which is 300 seconds (default assert_screen timeout of 30 seconds scaled by 10). In these cases the test would fail in select_console as well.
I suggest increasing the timeout factor in y2snapper_failure_analysis even further, to increase the chances of getting logs, and proceeding from there.
I will also trigger many runs on shared workers to see if that helps.
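
The timeout arithmetic above can be sketched as follows. This only illustrates the "30 seconds scaled by 10" calculation; the function and constant names are hypothetical and are not taken from os-autoinst.

```python
# Hypothetical names, for illustration only (not the os-autoinst API).
DEFAULT_SCREEN_TIMEOUT = 30  # seconds, the default mentioned in the comment


def scaled_timeout(base_seconds: int, scale_factor: int = 1) -> int:
    """Return the effective wait time after applying a timeout scale factor."""
    if base_seconds < 0 or scale_factor < 1:
        raise ValueError("base must be >= 0 and scale factor must be >= 1")
    return base_seconds * scale_factor


# The case described in this comment: 30 seconds scaled by 10.
effective = scaled_timeout(DEFAULT_SCREEN_TIMEOUT, 10)
print(effective)  # 300
```

Raising the factor further only buys more waiting time; it does not help if the SUT never becomes responsive again.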

Actions #2

Updated by okurz about 6 years ago

  • Due date set to 2018-02-13
  • Target version set to Milestone 14

Sounds reasonable. Can you trigger the jobs in parallel to some other tasks? We can check back in the next sprint.

Actions #3

Updated by riafarov about 6 years ago

  • Target version deleted (Milestone 14)

Sure. I've submitted a PR to increase the timeout for log collection: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/4257 and triggered 30 runs locally. I'll still invest more time in this sprint to analyze the results of these local runs, but obviously I won't sit and watch them like some random weird TV show =)

Actions #4

Updated by okurz about 6 years ago

  • Target version set to Milestone 14
Actions #5

Updated by riafarov about 6 years ago

  • Status changed from Workable to Feedback

In 50 local runs I could not get yast2_snapper to fail. Let's wait until it fails in production and see if we get logs.

Actions #6

Updated by riafarov about 6 years ago

  • Assignee deleted (riafarov)

No failures in yast2_snapper in 50 runs, PR merged, let's see if it gets reproduced. Unassigning myself so it can be picked up in the next sprint.
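
As a rough sanity check on what 50 clean runs can tell us about a sporadic failure, here is a minimal sketch. It assumes the runs are independent and share one constant failure probability, which may not hold on real workers; the function names are hypothetical.

```python
def prob_all_pass(failure_rate: float, runs: int) -> float:
    """Probability that `runs` independent runs all pass."""
    return (1.0 - failure_rate) ** runs


def max_plausible_failure_rate(runs: int, confidence: float = 0.95) -> float:
    """Largest failure rate still consistent with zero failures in `runs` runs.

    Solves (1 - p) ** runs == 1 - confidence for p.
    """
    if runs < 1 or not 0.0 < confidence < 1.0:
        raise ValueError("runs must be >= 1 and confidence must be in (0, 1)")
    return 1.0 - (1.0 - confidence) ** (1.0 / runs)


upper_bound = max_plausible_failure_rate(50)
print(f"95% upper bound on the failure rate after 50 clean runs: {upper_bound:.1%}")
```

Under these assumptions the bound comes out a little under 6%, so an issue that hits fewer than roughly one run in seventeen could easily stay hidden in 50 local runs.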

Actions #7

Updated by okurz about 6 years ago

  • Related to action #18346: [sles][functional][yast] yast2_snapper: sporadicly snapshot deletion takes longer - but test is not waiting 240s like in the code anyway added
Actions #8

Updated by JERiveraMoya about 6 years ago

Please, set story complexity.

Actions #9

Updated by okurz about 6 years ago

  • Subject changed from [sle][functional]test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? to [sle][functional][medium][bsc#1032831]test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround?
  • Due date changed from 2018-02-13 to 2018-03-13
  • Status changed from Feedback to Blocked
  • Assignee set to okurz

https://openqa.suse.de/tests/1424124/file/autoinst-log.txt failed waiting for the closed GUI after 180 seconds and subsequently failed to log in at https://openqa.suse.de/tests/1424124#step/yast2_snapper/53 after more than 10 minutes! That is a product bug, reopening the bug in bugzilla as soon as I can find one -> bsc#1032831

Actions #10

Updated by okurz about 6 years ago

  • Due date deleted (2018-03-13)
  • Target version changed from Milestone 14 to future
Actions #11

Updated by okurz about 6 years ago

  • Related to action #32956: [sle][functional][u][medium] test fails in yast2_snapper - Improve logging added
Actions #12

Updated by okurz about 6 years ago

Updated the bug, closed the corresponding SLE15 bug as a duplicate, and assigned the bug back to the yast team with the proposal to ensure that snapper will not be triggered while a btrfs balance task is running.

Actions #13

Updated by okurz about 6 years ago

  • Related to deleted (action #32956: [sle][functional][u][medium] test fails in yast2_snapper - Improve logging)
Actions #14

Updated by okurz about 6 years ago

  • Has duplicate action #32956: [sle][functional][u][medium] test fails in yast2_snapper - Improve logging added
Actions #15

Updated by okurz about 6 years ago

  • Subject changed from [sle][functional][medium][bsc#1032831]test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? to [sle][functional][medium][bsc#1032831]test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? or improve logging
  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)
  • Target version changed from future to Milestone 17

We can revisit in M17, unless triggered earlier by actions on the bug, and think about improving logging or debugging if necessary at all. There might be one option which could help even though the SUT looks completely unresponsive: we could send sysrq-w to show the currently blocked tasks and record a screenshot of that. This would be awesome :)
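
For reference, a minimal sketch of what the sysrq-w idea amounts to when run on the SUT itself. Writing "w" to /proc/sysrq-trigger is the standard Linux interface for dumping blocked (uninterruptible) tasks to the kernel log. The helper name is hypothetical, it needs root, and an openQA test would more likely send the key combination from the outside instead.

```python
from pathlib import Path

# Standard Linux procfs trigger file for magic SysRq commands.
SYSRQ_TRIGGER = Path("/proc/sysrq-trigger")


def dump_blocked_tasks(trigger: Path = SYSRQ_TRIGGER) -> None:
    """Ask the kernel to log all tasks in uninterruptible (blocked) state.

    Writing "w" here has the same effect as pressing Alt+SysRq+W.
    Requires root; the output lands in the kernel log (see `dmesg`).
    The `trigger` parameter exists only so the helper can be tried
    against a scratch file.
    """
    trigger.write_text("w")


# Usage on a real system, as root:
#   dump_blocked_tasks()
```

The appeal of this approach is that the kernel handles the request itself, so it can still produce output when userspace is too stuck to allow a login.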

Actions #16

Updated by okurz almost 6 years ago

  • Subject changed from [sle][functional][medium][bsc#1032831]test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? or improve logging to [sle][functional][y][medium][bsc#1032831] test fails in yast2_snapper after closing the main GUI window - do we need to bringback the original workaround? or improve logging
  • Target version changed from Milestone 17 to Milestone 19
Actions #17

Updated by okurz almost 6 years ago

  • Target version changed from Milestone 19 to Milestone 19
Actions #18

Updated by riafarov over 5 years ago

The test module is not scheduled for SLE15-SP1: https://openqa.suse.de/tests/2141056

Actions #19

Updated by okurz over 5 years ago

  • Blocked by action #38336: [functional][u] Re-enable yast2_snapper on openSUSE added
Actions #20

Updated by okurz over 5 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz
  • Target version changed from Milestone 19 to Milestone 23
Actions #21

Updated by okurz over 5 years ago

  • Related to action #43502: [sle][functional][y] test fails in yast2_snapper added
Actions #22

Updated by okurz over 5 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

unblocked

Actions #23

Updated by okurz about 5 years ago

  • Due date set to 2019-03-26
Actions #24

Updated by okurz about 5 years ago

  • Status changed from Workable to Resolved
  • Assignee set to okurz

We can see the module scheduled properly in multiple test scenarios in SLE15SP1 https://openqa.suse.de/tests/overview?arch=&modules=yast2_snapper&distri=sle&build=189.1&version=15-SP1&groupid=110# and it looks pretty stable. Also, in the generic post_fail_hook we trigger "sysrq-w", so in any potential future failures we should already have better feedback.
