action #157447

open

ppc64le-spvm stops at grub page, timeout due to slow PXE traffic between PRG2 and NUE2?

Added by okurz 2 months ago. Updated 5 days ago.

Status: New
Priority: High
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2024-03-18
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

From https://suse.slack.com/archives/C02CANHLANP/p1710726999169689

openQA test in scenario sle-15-SP6-Migration-from-SLE15-SPx-ppc64le-ha_migration_online_pscc_sles15sp3_pre@ppc64le-spvm fails in bootloader: it is unable to execute the grub command line or boot the kernel+initrd.
This seems to affect all openQA OSD tests on spvm.

Reproducible

Fails since 2024-03-13, see history in grenache-1:1, possibly same in other instances on the same machine

Expected result

Last good: https://openqa.suse.de/tests/13765888 from 5 days ago on grenache:1, also see verification runs from #155521

Further details

Always latest result in this scenario: latest


Related issues 2 (1 open, 1 closed)

Related to QA - action #139112: Ensure OSD openQA PowerPC machine grenache is operational from PRG2 (Resolved, nicksinger, 2023-06-29)
Related to openQA Project - action #157432: parted /dev/sda disk got error at powerVM worker size:M (Workable, 2024-03-18)

Actions #1

Updated by jbaier_cz 2 months ago

  • Related to action #139112: Ensure OSD openQA PowerPC machine grenache is operational from PRG2 added
Actions #2

Updated by jbaier_cz 2 months ago · Edited

As mentioned in #139112#note-13, the process is slow and at least TIMEOUT_SCALE=4 was needed in the original verification job. The failed job in this ticket has TIMEOUT_SCALE=3, which seems to be insufficient.
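
As a rough illustration of why the scale factor matters, the sketch below assumes that TIMEOUT_SCALE acts as a plain multiplier on each timeout in the test code. It is written in Python for brevity rather than in the Perl used by the tests. The function name and the 300-second base value are hypothetical and are not taken from the failing job.

```python
def scaled_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Return the effective timeout, assuming the scale is a plain multiplier."""
    if base_seconds < 0 or timeout_scale <= 0:
        raise ValueError("base_seconds must be >= 0 and timeout_scale must be > 0")
    return base_seconds * timeout_scale


# Hypothetical base timeout of 300 s; the real value depends on the test code.
BASE_SECONDS = 300

# Compare the unscaled timeout with the two scale values mentioned in this ticket.
for scale in (1, 3, 4):
    print(f"TIMEOUT_SCALE={scale}: {scaled_timeout(BASE_SECONDS, scale):.0f} s")
```

Under this assumption, moving from a scale of 3 to 4 only adds one third more waiting time, which may not be enough if the transfer is slower by a larger factor.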

Actions #3

Updated by okurz 2 months ago · Edited

  • Tags changed from infra, ipxe, PXE, PowerPC, spvm, network, prg2 to ipxe, PXE, PowerPC, spvm, network, prg2
  • Target version deleted (Ready)

From the infra perspective, we clarified that the current situation is what it is: throughput is very slow. Please bump the corresponding specific timeouts in the test code; this should be handled by the test code owners. The likely location is
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/e633d9eb360529939cb47d30fba85a5f9c77ba34/lib/bootloader_pvm.pm#L121

Even the previous use of the timeout scale with value 3 is a workaround that should only be used for a short time.

In the meantime we will focus on #155524.
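
One way to pick a specific timeout instead of a global scale is to derive it from the amount of data transferred and the observed throughput. The sketch below is only an estimate under stated assumptions: the payload size, throughput, and safety factor are hypothetical placeholders, not measurements from PRG2 or NUE2, and the function name is made up for illustration.

```python
import math


def required_timeout(payload_mib: float,
                     throughput_mib_per_s: float,
                     safety_factor: float = 2.0) -> int:
    """Estimate a timeout in whole seconds for transferring payload_mib.

    The transfer time is payload / throughput, padded by safety_factor
    to allow for jitter, and rounded up to the next full second.
    """
    if payload_mib < 0:
        raise ValueError("payload_mib must be >= 0")
    if throughput_mib_per_s <= 0:
        raise ValueError("throughput_mib_per_s must be > 0")
    if safety_factor < 1:
        raise ValueError("safety_factor must be >= 1")
    return math.ceil(payload_mib / throughput_mib_per_s * safety_factor)


# Hypothetical example: 100 MiB of kernel+initrd at 0.125 MiB/s.
print(required_timeout(100, 0.125), "seconds")
```

Measuring the real payload size and throughput on the affected worker would be needed before using such a value in the test code.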

Actions #4

Updated by tinawang123 about 2 months ago

I tried setting TIMEOUT_SCALE=4, but it did not work: https://openqa.suse.de/tests/13818599#step/bootloader_start/24
I then extended the timeout to 1800 in 'assert_screen "pvm-grub-command-line-fresh-prompt", 1800, no_wait => 1;' in lib/bootloader_pvm.pm.
Failed job: https://openqa.suse.de/tests/13825504

Actions #5

Updated by slo-gin about 2 months ago

This ticket was set to Urgent priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

Actions #6

Updated by openqa_review about 1 month ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ha_migration_online_pscc_sles15sp5_pre
https://openqa.suse.de/tests/13959153#step/bootloader/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.

Actions #7

Updated by slo-gin 5 days ago

  • Priority changed from Urgent to High

The ticket will be set to the next lower priority High

Actions #8

Updated by okurz 4 days ago

  • Related to action #157432: parted /dev/sda disk got error at powerVM worker size:M added