action #98832

[qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot

Added by JERiveraMoya 8 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2021-09-17
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

This scenario has been failing for about 4 months: https://openqa.suse.de/tests/7125192#step/rebootmgr/14
I tried increasing the timeout for selecting the ssh console to 60, without success.
When I stop the SUT just after the reboot and check its state, it does not respond to ping and nmap does not report any active machine at that address.
I was able to disconnect the console session from openQA, open it in my terminal, and then force a reboot of this special partition. Everything looks apparently OK after the forced reboot. I then retrieved the attached logs (YaST logs and full journal).
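
The reachability check described above (ping and nmap run by hand after the reboot) could be automated roughly as follows. This is only a minimal sketch, not what the openQA test does: the function name `wait_for_port`, the default port 22, and the polling interval are assumptions chosen for illustration.

```python
import socket
import time


def wait_for_port(host: str, port: int = 22, timeout: float = 60.0,
                  interval: float = 2.0) -> bool:
    """Poll host:port until a TCP connection succeeds.

    Returns True as soon as the port accepts a connection,
    or False if it is still unreachable when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the SSH port is reachable again.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "connection refused", "no route to host" and timeouts.
            time.sleep(interval)
    return False
```

Calling, for example, `wait_for_port("sut.example.com", 22, timeout=300)` (hypothetical hostname) right after triggering the reboot would show whether the partition ever comes back on the network, or whether it stays unreachable as observed here.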

full_journal.log (78.4 KB) full_journal.log JERiveraMoya, 2021-09-17 12:29
y2logs_reconnecting.tar.bz2 (8.5 MB) y2logs_reconnecting.tar.bz2 JERiveraMoya, 2021-09-17 12:29

Related issues

Related to qe-yast - action #97331: [timebox: 24h] Investigate failure in rebootmgr (Closed, 2021-08-22)

Related to openQA Infrastructure - action #109112: Improve os-autoinst sshXtermVt.pm connection error handling (was: "Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host") size:M (Workable, 2022-03-28)

History

#1 Updated by okurz 8 months ago

  • Project changed from openQA Infrastructure to openQA Tests
  • Subject changed from rebootmgr fails in PowerVM reconnecting after reboot to [y][yast][qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot
  • Category set to Bugs in existing tests

Hi, I suggest you extend the ticket according to https://progress.opensuse.org/projects/openqav3/wiki#Defects to give others a better chance to understand the context and reproducibility.

I don't see anything obvious. I suggest you use https://github.com/os-autoinst/scripts/blob/master/README.md#auto-review---automatically-detect-known-issues-in-openqa-jobs-label-openqa-jobs-with-ticket-references-and-optionally-retrigger as an early mitigation.
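
As far as I understand the linked README, auto-review looks for an `auto_review:"<regex>"` marker (optionally followed by `:retry`) in ticket subjects and matches that regex against failing jobs. The sketch below only illustrates that matching idea and is not the actual script: the `check` helper and the sample subject are hypothetical, and the error text is borrowed from the title of related ticket #109112, so it is not confirmed to be the failure message of this job.

```python
import re
from typing import Tuple

# Hypothetical parser for the marker convention: auto_review:"<regex>"[:retry]
AUTO_REVIEW = re.compile(r'auto_review:"(?P<pattern>.+?)"(?P<retry>:retry)?')


def check(subject: str, failure_text: str) -> Tuple[bool, bool]:
    """Return (matched, retry) for a ticket subject and a job failure text.

    matched: the subject carries a marker and its regex matches the text.
    retry:   the marker additionally requests a retrigger (":retry").
    """
    marker = AUTO_REVIEW.search(subject)
    if marker is None:
        return (False, False)
    if re.search(marker.group("pattern"), failure_text) is None:
        return (False, False)
    return (True, marker.group("retry") is not None)


# Sample subject (hypothetical) and failure text (from ticket #109112's title).
subject = ('rebootmgr fails in PowerVM reconnecting after reboot '
           'auto_review:"Error connecting to <root@.*>: No route to host":retry')
failure = ("Test died: Error connecting to "
           "<root@redcurrant-4.qa.suse.de>: No route to host")
print(check(subject, failure))
```

With a marker like this in the subject, matching jobs would be labeled with the ticket reference and, because of `:retry`, retriggered, which is the kind of early mitigation suggested above.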

As there are so many other powerVM tests that manage to reboot (compare, for example, a default test like textmode+role_textmode), I am certain it's something specific to the test flow, so not "infrastructure" per se.

The test module in question is https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/c2f723f7d582d10c3daa65a74d8dbca5fc50ac4e/tests/transactional/rebootmgr.pm with maintainer "mkravec" but I don't know if he actually can maintain it right now, so likely more for "QE YaST" or "QE Container & Public Cloud"?
Also, powerVM specific test problems are outside our usual expertise, see https://progress.opensuse.org/projects/qa/wiki/Wiki#Out-of-scope

#2 Updated by JERiveraMoya 8 months ago

  • Related to action #97331: [timebox: 24h] Investigate failure in rebootmgr added

#3 Updated by openqa_review 8 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7280859

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

#4 Updated by openqa_review 7 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7429883

#5 Updated by openqa_review 7 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7553902

#6 Updated by JERiveraMoya 6 months ago

  • Subject changed from [y][yast][qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot to [qac][container][powerVM] rebootmgr fails in PowerVM reconnecting after reboot

Removing the YaST tags. We already did some investigation (checking the system, reverting previous code from Fabian) and would appreciate some help here.

#7 Updated by favogt 6 months ago

The best way to debug this is probably to attach to ipmi while the test is doing the reboot.

#8 Updated by openqa_review 6 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7728256

#9 Updated by openqa_review 5 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/7805233

#10 Updated by openqa_review 4 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8010080

#11 Updated by openqa_review 3 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8158066

#12 Updated by JERiveraMoya 2 months ago

Hi favogt, although the ticket has been assigned to another team for a while, could you describe how to
"attach to ipmi while the test is doing the reboot" on a ppc64le pVM? Maybe we could take a look with some hints :)

#13 Updated by JERiveraMoya about 2 months ago

  • Related to action #109112: Improve os-autoinst sshXtermVt.pm connection error handling (was: "Test died: Error connecting to <root@redcurrant-4.qa.suse.de>: No route to host") size:M added

#14 Updated by openqa_review about 2 months ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: transactional_server_helper_apps@ppc64le-hmc-single-disk
https://openqa.suse.de/tests/8420937#step/rebootmgr/1

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
