action #157447

open

ppc64le-spvm stops at grub page, timeout due to slow PXE traffic between PRG2 and NUE2?

Added by okurz about 2 months ago. Updated 18 days ago.

Status: New
Priority: Urgent
Assignee: -
Category: Bugs in existing tests
Target version: -
Start date: 2024-03-18
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

From https://suse.slack.com/archives/C02CANHLANP/p1710726999169689

openQA test in scenario sle-15-SP6-Migration-from-SLE15-SPx-ppc64le-ha_migration_online_pscc_sles15sp3_pre@ppc64le-spvm fails in bootloader: it does not get to execute the grub command line or boot the kernel+initrd.
This seems to affect all openQA OSD tests on spvm.

Reproducible

Fails since 2024-03-13 (see the history on grenache-1:1), possibly the same on other instances on the same machine.

Expected result

Last good: https://openqa.suse.de/tests/13765888 from 5 days ago on grenache:1; also see the verification runs from #155521.

Further details

Always latest result in this scenario: latest


Related issues: 1 (0 open, 1 closed)

Related to QA - action #139112: Ensure OSD openQA PowerPC machine grenache is operational from PRG2 · Resolved · nicksinger · 2023-06-29

#1

Updated by jbaier_cz about 2 months ago

  • Related to action #139112: Ensure OSD openQA PowerPC machine grenache is operational from PRG2 added
#2

Updated by jbaier_cz about 2 months ago · Edited

As mentioned in #139112#note-13, the process is slow, and at least TIMEOUT_SCALE=4 was needed in the original verification job. The failed job in this ticket has TIMEOUT_SCALE=3, which seems to be insufficient.
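A minimal Python sketch of the timeout arithmetic behind this comment, assuming (as we understand os-autoinst's behavior) that each test API wait is its base timeout multiplied by TIMEOUT_SCALE. The function names are hypothetical; the real code is Perl in os-autoinst and os-autoinst-distri-opensuse.

```python
import math


def effective_timeout(base_seconds: float, timeout_scale: float = 1.0) -> float:
    """Seconds actually waited, assuming the wait is base * TIMEOUT_SCALE."""
    if base_seconds < 0 or timeout_scale <= 0:
        raise ValueError("base must be >= 0 and scale must be > 0")
    return base_seconds * timeout_scale


def minimal_integer_scale(base_seconds: float, needed_seconds: float) -> int:
    """Smallest whole TIMEOUT_SCALE for which base * scale >= needed."""
    if base_seconds <= 0:
        raise ValueError("base must be > 0")
    return max(1, math.ceil(needed_seconds / base_seconds))
```

For example, with a hypothetical base timeout of 300 s, TIMEOUT_SCALE=3 allows 900 s and TIMEOUT_SCALE=4 allows 1200 s, so a boot step that needs about 1000 s would only pass with the larger scale.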

#3

Updated by okurz about 2 months ago · Edited

  • Tags changed from infra, ipxe, PXE, PowerPC, spvm, network, prg2 to ipxe, PXE, PowerPC, spvm, network, prg2
  • Target version deleted (Ready)

From the infra perspective, we clarified that the current situation is as it is, meaning very slow throughput. Please just bump the corresponding specific timeouts in the test code; this should be handled by the test code owners. Likely at
https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/e633d9eb360529939cb47d30fba85a5f9c77ba34/lib/bootloader_pvm.pm#L121

Even the previous use of TIMEOUT_SCALE with a value of 3 is a workaround that should only be used for a short time.
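A small, hypothetical sketch of why a global TIMEOUT_SCALE is a blunt workaround compared with bumping one specific timeout: the scale inflates every wait in the job, while an override only touches the slow step. The step names and base values are invented for illustration and do not come from bootloader_pvm.pm; the multiplication model is the same TIMEOUT_SCALE assumption as above.

```python
from typing import Dict, Optional


def effective_step_timeouts(
    base: Dict[str, float],
    timeout_scale: float = 1.0,
    overrides: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Per-step waits: an override replaces the base value, then the scale applies."""
    if timeout_scale <= 0:
        raise ValueError("timeout_scale must be > 0")
    overrides = overrides or {}
    # Build a new dict so the caller's base mapping is left unchanged.
    return {step: overrides.get(step, t) * timeout_scale for step, t in base.items()}


# Hypothetical steps and base timeouts in seconds.
base = {"grub_prompt": 300, "boot_kernel": 600}
print(effective_step_timeouts(base, 3))                               # every step x3
print(effective_step_timeouts(base, overrides={"grub_prompt": 1800}))  # only the slow step
```

With the global scale every step, including ones unaffected by the slow PXE transfer, waits three times longer before failing; the targeted override leaves the other steps at their normal values.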

In the meantime we will focus on #155524.

#4

Updated by tinawang123 about 2 months ago

I tried setting TIMEOUT_SCALE=4, but it does not work: https://openqa.suse.de/tests/13818599#step/bootloader_start/24
I also extended the timeout in lib/bootloader_pvm.pm to 'assert_screen "pvm-grub-command-line-fresh-prompt", 1800, no_wait => 1;'.
Failed job: https://openqa.suse.de/tests/13825504
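As we understand the os-autoinst documentation, assert_screen keeps checking the screen for a matching needle until its timeout expires, roughly once per second by default, and no_wait => 1 drops the pause between checks. The following is a generic, hypothetical Python analog of such a poll-until-deadline wait, not the os-autoinst implementation. Clock and sleep are injectable so it can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the timeout expires.

    interval=1.0 stands in for the default pacing between checks,
    interval=0 for the no_wait behavior (check as fast as possible).
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        if interval > 0:
            sleep(interval)
```

If the explicit 1800 s is also multiplied by TIMEOUT_SCALE (an assumption), a job with TIMEOUT_SCALE=4 would wait up to 7200 s, that is two hours, for the grub prompt before failing.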

#5

Updated by slo-gin about 1 month ago

This ticket was set to Urgent priority but was not updated within the SLO period. Please consider picking up this ticket or just set the ticket to the next lower priority.

#6

Updated by openqa_review 25 days ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: ha_migration_online_pscc_sles15sp5_pre
https://openqa.suse.de/tests/13959153#step/bootloader/1

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released" or "EOL" (End-of-Life)
  3. The bugref in the openQA scenario is removed or replaced, e.g. label:wontfix:boo1234

Expect the next reminder at the earliest in 28 days if nothing changes in this ticket.
