action #157432

open

parted /dev/sda disk got error at powerVM worker

Added by tinawang123 about 2 months ago. Updated 2 days ago.

Status:
New
Priority:
Low
Assignee:
-
Category:
Support
Target version:
Start date:
2024-03-18
Due date:
% Done:

0%

Estimated time:

Description

Failed job:
https://openqa.suse.de/tests/13768555#step/bootloader_start/39
Reproduce steps:

  1. wipefs -af /dev/sda (the disk was erased successfully)
  2. sync
  3. parted -s /dev/sda mklabel gpt, which failed with:
     Error: Partition(s) 2, 3 on /dev/sda have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making future changes.
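
The reproduce steps above can be sketched as a small shell function. This is only an illustration: `run`, `wipe_and_label` and the `DRY_RUN` switch are hypothetical names, not part of the openQA test code. By default the commands are printed rather than executed, because they destroy data on the target disk.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Wipe all signatures, flush buffers, then write a fresh GPT label,
# in the same order as the steps listed in the ticket.
wipe_and_label() {
    disk="${1:-/dev/sda}"
    run wipefs -af "$disk" || return 1
    run sync || return 1
    # This is the step that reported the "unable to inform the kernel" error.
    run parted -s "$disk" mklabel gpt
}
```

Calling `wipe_and_label /dev/sda` prints the three commands; setting `DRY_RUN=0` first would run them for real and return parted's exit status.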

Related issues 1 (1 open, 0 closed)

Related to openQA Tests - action #157447: ppc64le-spvm stops at grub page, timeout due to slow PXE traffic between PRG2 and NUE2? (New, 2024-03-18)

Actions #1

Updated by okurz about 2 months ago

  • Category set to Support
  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version set to Ready
Actions #2

Updated by okurz about 2 months ago

@tinawang123 I don't think we can do anything about that from the openQA side. This is quite usual behaviour that I have seen in Linux regardless of the architecture, and it is definitely not related to a specific machine. You need to handle that behaviour in the test code accordingly.

Actions #3

Updated by JERiveraMoya about 2 months ago

  • Subject changed from [tools] parted /dev/sda disk got error at powerVM worker to parted /dev/sda disk got error at powerVM worker

okurz wrote in #note-2:

@tinawang123 I don't think we can do anything about that from the openQA side. This is quite usual behaviour that I have seen in Linux regardless of the architecture, and it is definitely not related to a specific machine. You need to handle that behaviour in the test code accordingly.

This wiping and partitioning step was introduced to make the tests more stable: the disk was otherwise reused in whatever state the previous job left it, which made results unpredictable and caused sporadic failures. It has been working for ages on powervm, and on s390x we do something similar, for example: https://openqa.suse.de/tests/13783114#step/bootloader_start/49
You can compare with what we expect for pvm in an old passing job: https://openqa.suse.de/tests/11163215#step/bootloader_start/35

Is there any other way to have a fresh LPAR there? Rebooting at that point and handling it in the test would be overkill; this new setup for some reason doesn't allow that operation. Googling, I found some results about potential kernel issues (but honestly, I have no idea).

Actions #4

Updated by okurz about 2 months ago

JERiveraMoya wrote in #note-3:

Is there any other way to have a fresh lpar there?

Potentially by wiping the LPAR-assigned storage from novalink at the beginning of the test execution.
As an alternative, one could try to force a refresh of the storage devices from the Linux system.
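
One way such a refresh could look from inside the Linux system is sketched below. It is an assumption about what the suggestion might involve, not a command named in this ticket, and it is untested on the PowerVM workers. It uses `udevadm settle`, `partprobe` (shipped with parted) and `blockdev --rereadpt` (util-linux); `run`, `refresh_partition_table` and `DRY_RUN` are hypothetical names. If the old partitions are still held open, for example by swap or LVM, the re-read may fail in the same way as parted did.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1 (the default here),
# execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Ask the kernel to re-read the partition table of one disk.
refresh_partition_table() {
    disk="${1:-/dev/sda}"
    # Wait until pending udev events for block devices have been processed.
    run udevadm settle
    # Try partprobe first; fall back to blockdev only if it fails.
    if ! run partprobe "$disk"; then
        run blockdev --rereadpt "$disk"
    fi
}
```

With `DRY_RUN=0`, a test could call `refresh_partition_table /dev/sda` before the `parted` step and treat a non-zero exit status as "partitions still in use".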

Rebooting at that point and handling it in the test would be overkill; this new setup for some reason doesn't allow that operation.

What do you mean by "new setup"? What we have now in PRG2 are the very same machines that were already in use before.

Actions #5

Updated by openqa_review about 2 months ago

  • Due date set to 2024-04-02

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz about 2 months ago

  • Status changed from In Progress to Feedback
Actions #7

Updated by okurz about 2 months ago

  • Due date changed from 2024-04-02 to 2024-04-30
  • Priority changed from Normal to Low
  • Target version changed from Ready to Tools - Next

no response. Following up with lower prio

Actions #8

Updated by okurz about 2 months ago

  • Due date deleted (2024-04-30)
  • Status changed from Feedback to Rejected
  • Target version changed from Tools - Next to Ready

rejecting due to no response

Actions #9

Updated by JERiveraMoya about 2 months ago · Edited

Unfortunately, the latest build doesn't get past the bootloader, so I can't give you any feedback here (sorry for the delay in any case).
Once it does, we might consider your advice (although technically I don't know how it can be done).
The issue will most likely persist, but if you prefer to reject it for now, we can reopen it later; it's up to you how to handle it.
The other point is that I now know these are the same machines that have had issues for years. Thanks for that info.

Actions #10

Updated by JERiveraMoya 17 days ago

Here is the sporadic issue: https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-spvm&test=create_hdd_textmode_yast&version=15-SP6#next_previous

Which command did you suggest running before parted to refresh the storage?
As for the other suggestion, we don't have the expertise to remove LPAR-assigned storage from novalink.

Could this be connected with https://progress.opensuse.org/issues/157447 ?
If you need a new ticket instead of reopening this one, please let us know.

Actions #11

Updated by leli 2 days ago

  • Status changed from Rejected to New

@okurz I reopened this ticket. Please help to solve it; we don't have enough knowledge to fix it ourselves. See Joaquin's comments. Thanks.

Some investigation into the history of the issue: https://openqa.suse.de/tests/14289526#next_previous

All failed jobs show the parted issue and ran on grenache-1:2 or grenache-1:4, while the passed jobs, which do not show the parted issue, ran on other workers. (Strangely, I also found one passed job on grenache-1:4.)

Actions #12

Updated by okurz 2 days ago

  • Related to action #157447: ppc64le-spvm stops at grub page, timeout due to slow PXE traffic between PRG2 and NUE2? added
Actions #13

Updated by okurz 2 days ago

  • Assignee deleted (okurz)