action #125783

closed

[jeos] Test fails in kdump_and_crash on SLE 12sp5 and 15sp4 XEN after worker migration from SLES to Leap 15.4

Added by pdostal about 1 year ago. Updated 11 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version:
Start date: 2023-03-10
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-12-SP5-JeOS-for-kvm-and-xen-Updates-x86_64-jeos-extratest@svirt-xen-hvm fails in kdump_and_crash

Test suite description

The test has been failing since the worker was updated from SLE 15sp2 to Leap 15.4.

Reproducible

Fails since (at least) Build 20230307-1

Expected result

Last good: 20230306-1 (or more recent)

Hypotheses

  • H1: crashkernel command line options on dom0 host due to salt changes cause the regression
  • H2: SLE15-SP2->SLE15-SP4 causes the regression
  • H3: SLE15-SP2->Leap 15.4 causes the regression
  • H4: Problem in product SUT OS version -> crosscheck with other versions
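
To crosscheck H1, the crashkernel options could be compared before and after the salt changes. The following is only a minimal sketch, assuming the relevant command line is available as a plain string (for example the content of /proc/cmdline or the hypervisor boot entry). The function name crashkernel_options and the sample command line are hypothetical and not taken from the affected host:

```python
def crashkernel_options(cmdline: str) -> list[str]:
    """Return the values of all crashkernel= parameters in a command line."""
    prefix = "crashkernel="
    return [
        token[len(prefix):]
        for token in cmdline.split()
        if token.startswith(prefix)
    ]


if __name__ == "__main__":
    # Hypothetical example command line, used only for illustration.
    example = "root=/dev/sda2 crashkernel=256M,high crashkernel=72M,low quiet"
    print(crashkernel_options(example))
```

Running this against the command lines recorded before and after the change would show whether the crashkernel reservation differs at all, which is a precondition for H1.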

Obstacles

  1. Having no video makes investigation harder. Why was MAX_JOB_TIME=10800 chosen, given that it disables video?
  2. openqa-investigate jobs would have helped to tell whether test or product diffs make a difference. IMHO you wasted a couple of hours because QE Kernel wished to have the job_done_hooks disabled for the kernel tests in particular
  3. The svirt backend makes investigation a lot harder. https://openqa.suse.de/tests/10630883/logfile?filename=serial0.txt from an x86_64 qemu job shows the line "[ 5.058618] reboot: Restarting system", which likely comes from the crashkernel, so we know that the crashkernel was actually running and presumably doing something useful for 5s. For svirt-xen we don't have that line even in the "last good" case, so we don't know whether the crashkernel was running properly. We only see the effect in the dump files when we recreate a new VM using the same disk image and boot it again
  4. Tests like https://openqa.suse.de/tests/10654962# sle-15-SP4-JeOS-for-kvm-and-xen-Updates-x86_64-Build20230309-1-jeos-extratest@uefi-virtio-vga don't include kdump so I can't use that for crosschecking. Why is kdump_and_crash not enabled on SLE15-SP4?
  5. On Tumbleweed the kdump_and_crash module in https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_textmode&version=Tumbleweed , latest https://openqa.opensuse.org/tests/3166236, is apparently affected by a bug denying proper operation since 2021 (!) due to https://bugzilla.opensuse.org/show_bug.cgi?id=1190434
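
The check described in obstacle 3 can be sketched as a small script that looks for the restart message in a downloaded serial0.txt. This is only an illustration, assuming the log uses the usual kernel timestamp prefix as in the line quoted above. The names RESTART_RE and restart_timestamps are hypothetical:

```python
import re

# Matches kernel log lines such as "[    5.058618] reboot: Restarting system".
RESTART_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+reboot: Restarting system")


def restart_timestamps(serial_log: str) -> list[float]:
    """Return the kernel timestamps (in seconds) of every restart message."""
    return [float(match.group(1)) for match in RESTART_RE.finditer(serial_log)]
```

An empty result for a svirt-xen serial log only confirms that the line is missing from the log. It does not prove that the crashkernel did not run.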

Suggestions

  1. If this is seen as critical, we can roll back the OS installation and ask someone to crosscheck with different hypervisor OS versions before upgrading again
  2. We keep openqaw5-xen as is and skip the test module kdump_and_crash until anyone finds a better solution

I would go with 2. and also wait for https://bugzilla.opensuse.org/show_bug.cgi?id=1190434

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #116644: [qe-core][functional][sle15sp5] test fails in bootloader_svirt, the test is using different network bridge 'ovs-system' rather than 'br0' (Resolved, rfan1, 2022-09-16)

Related to openQA Tests - action #126647: [qe-core] test fails in bootloader_start - we should use br0 not ovs-system (Resolved, rfan1, 2023-03-27)
