action #121021

closed

coordination #121876: [epic] Handle openQA review failures in Yam squad - SLE 15 SP5

upgrade_snapshots fails in post_run_hook due to no prompt

Added by hjluo about 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2022-11-28
Due date:
% Done: 0%
Estimated time:
Description

Motivation

The openQA test in scenario sle-15-SP5-Regression-on-Migration-from-SLE12-SPx-ppc64le-offline_sles12sp5_media_sdk-lp-asmm-contm-lgm-tcm-wsm-pcm_all_full@ppc64le-2g fails in upgrade_snapshots.

It looks like we time out in the post_run_hook in consoletest.pm. The last command, the wait_serial in upgrade_snapshots.pm, actually succeeds, but it does not leave us at a correct prompt, so the post_run_hook cannot run properly.

See the correct prompt in the last successful run: https://openqa.suse.de/tests/10481195#step/upgrade_snapshots/7

Acceptance criteria

AC1: Try to get the prompt to be able to run the post_run_hook

Additional information

As there is little chance that bugs opened for Power KVM will be fixed, we should find a simple solution, such as disabling part of this scenario, or disabling it completely if no good solution is found.

Actions #1

Updated by hjluo about 2 years ago

  • Project changed from openQA Tests (public) to qe-yam
  • Category deleted (Bugs in existing tests)
Actions #2

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp5_media_sdk-lp-asmm-contm-lgm-tcm-wsm-pcm_all_full@ppc64le-2g
https://openqa.suse.de/tests/10092057#step/upgrade_snapshots/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #3

Updated by openqa_review almost 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: offline_sles12sp5_media_sdk-lp-asmm-contm-lgm-tcm-wsm-pcm_all_full@ppc64le-2g
https://openqa.suse.de/tests/10500564#step/upgrade_snapshots/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 56 days if nothing changes in this ticket.

Actions #4

Updated by JERiveraMoya almost 2 years ago

  • Status changed from New to Workable
  • Priority changed from Normal to High
  • Target version set to Current
Actions #5

Updated by JERiveraMoya almost 2 years ago

  • Subject changed from timeout in in upgrade_snapshots to upgrade_snapshots fails in post_run_hook
  • Description updated (diff)
Actions #6

Updated by JERiveraMoya almost 2 years ago

  • Subject changed from upgrade_snapshots fails in post_run_hook to upgrade_snapshots fails in post_run_hook due to no prompt
  • Description updated (diff)
  • Parent task set to #121876
Actions #7

Updated by JERiveraMoya almost 2 years ago

  • Description updated (diff)
Actions #8

Updated by JRivrain almost 2 years ago

  • Status changed from Workable to In Progress
  • Assignee set to JRivrain
Actions #9

Updated by JRivrain almost 2 years ago

Looking at this, I would completely disable this test suite or migrate it to PowerVM: in the last build, 7 test modules are failing because the system is apparently extremely unresponsive. It probably does not make sense to fix this; we already have TIMEOUT_SCALE=2 and even trivial commands don't get typed.

Actions #10

Updated by JRivrain almost 2 years ago

The problem seems to be a kernel failure, but we can't report it because the backend is not supported.

Actions #11

Updated by JRivrain almost 2 years ago

  • Status changed from In Progress to Resolved

The test causes an OOM situation. It is a bug, but this backend is no longer supported. Adding memory allows other modules to pass, but it is resource-consuming, and the test module itself still fails anyway. I tested the same commands on a PowerVM machine, and the bug did not appear there. Unscheduling this module will allow the other test modules to pass.

PR https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/16537
