action #98832

open

[qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot

Added by JERiveraMoya over 2 years ago. Updated 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2021-09-17
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

This scenario has been failing for 4 months: https://openqa.suse.de/tests/7125192#step/rebootmgr/14
I tried increasing the timeout for the selection of the ssh console to 60, with no success.
When stopping the SUT just after the reboot and checking its state, there is no ping response and nmap does not report any active machine there.
I was able to disconnect the console session from openQA, open it in my terminal, and then force a reboot of this particular partition. Everything looks apparently OK after the forced reboot. I then retrieved the attached logs (YaST logs and full journal).
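
For anyone reproducing the check by hand, here is a minimal sketch of polling the SUT for SSH reachability after the reboot. It is a standalone helper, not the way openQA or os-autoinst selects the ssh console; the function name `wait_for_port`, its parameters, and the default values are assumptions made for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 60.0,
                  interval: float = 2.0, connect_timeout: float = 3.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` elapses.

    Hypothetical helper: returns True as soon as one connection attempt
    succeeds, False if no attempt succeeds before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Covers refused connections, timeouts and "No route to host".
            pass
        if time.monotonic() + interval > deadline:
            return False
        time.sleep(interval)
```

Calling it with a much longer timeout than the one used in the test, for example `wait_for_port("sut.example.com", 22, timeout=600)` with the real SUT hostname substituted for the placeholder, would help distinguish a slow boot from a partition that never comes back on the network.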


Files

full_journal.log (78.4 KB), JERiveraMoya, 2021-09-17 12:29
y2logs_reconnecting.tar.bz2 (8.5 MB), JERiveraMoya, 2021-09-17 12:29

Related issues 2 (1 open, 1 closed)

Related to qe-yam - action #97331: [timebox: 24h] Investigate failure in rebootmgr (Closed, JERiveraMoya, 2021-08-22)
Related to openQA Infrastructure - action #109112: Improve os-autoinst sshXtermVt.pm connection error handling (was: "Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host") size:M (Workable, 2022-03-28)

Actions #1

Updated by okurz over 2 years ago

  • Project changed from openQA Infrastructure to openQA Tests
  • Subject changed from rebootmgr fails in PowerVM reconnecting after reboot to [y][yast][qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot
  • Category set to Bugs in existing tests

Hi, I suggest you extend the ticket according to https://progress.opensuse.org/projects/openqav3/wiki#Defects to give others a better chance to understand the context and reproducibility.

I don't see anything obvious. I suggest you use https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger as an early mitigation.

Many other powerVM tests manage to reboot (compare, for example, a default test like textmode+role_textmode), so I am certain this is something specific to the test flow and not an "infrastructure" issue per se.

The test module in question is https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/c2f723f7d582d10c3daa65a74d8dbca5fc50ac4e/tests/transactional/rebootmgr.pm. Its listed maintainer is "mkravec", but I don't know whether he can actually maintain it right now, so this is likely more for "QE YaST" or "QE Container & Public Cloud"?
Also, powerVM specific test problems are outside our usual expertise, see https://progress.opensuse.org/projects/qa/wiki/Wiki#Out-of-scope

Actions #2

Updated by JERiveraMoya over 2 years ago

  • Related to action #97331: [timebox: 24h] Investigate failure in rebootmgr added
Actions #3

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7280859

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234
Actions #4

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7429883

Actions #5

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7553902

Actions #6

Updated by JERiveraMoya over 2 years ago

  • Subject changed from [y][yast][qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot to [qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot

Removing the YaST tags. We already did some investigation (checking the system, reverting previous code from Fabian) and would appreciate some help here.

Actions #7

Updated by favogt over 2 years ago

The best way to debug this is probably to attach to ipmi while the test is doing the reboot.

Actions #8

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7728256

Actions #9

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7805233

Actions #10

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8010080

Actions #11

Updated by openqa_review over 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8158066

Actions #12

Updated by JERiveraMoya about 2 years ago

Hi @favogt, although the ticket has been assigned to another team for a while, could you describe how to
"attach to ipmi while the test is doing the reboot" on ppc64le pVM? Maybe we could take a look with some hints :)

Actions #13

Updated by JERiveraMoya about 2 years ago

  • Related to action #109112: Improve os-autoinst sshXtermVt.pm connection error handling (was: "Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host") size:M added
Actions #14

Updated by openqa_review about 2 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8420937#step/rebootmgr/1


Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #15

Updated by slo-gin 2 months ago

This ticket was set to Normal priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.
