action #125783

Updated by okurz about 1 year ago

## Observation 

 openQA test in scenario sle-12-SP5-JeOS-for-kvm-and-xen-Updates-x86_64-jeos-extratest@svirt-xen-hvm fails in 
 [kdump_and_crash](https://openqa.suse.de/tests/10650309/modules/kdump_and_crash/steps/150) 

 ## Test suite description 

 The test has been failing since the worker was upgraded from SLE 15 SP2 to Leap 15.4. 

 ## Reproducible 

 Fails since (at least) Build [20230307-1](https://openqa.suse.de/tests/10637185) 


 ## Expected result 

 Last good: [20230306-1](https://openqa.suse.de/tests/10630886) (or more recent) 
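The "fails since / last good" pair above is the boundary of the current failure streak. Below is a minimal sketch of how that boundary can be computed from a scenario's result history, assuming the history is available as (build, passed) pairs ordered from oldest to newest. The function name `last_good_first_bad` and the sample history are hypothetical; only builds 20230306-1 (good) and 20230307-1 (bad) come from this ticket.

```python
from typing import Optional, Sequence, Tuple


def last_good_first_bad(
    results: Sequence[Tuple[str, bool]],
) -> Tuple[Optional[str], Optional[str]]:
    """Return (last good build, first bad build) for the current failure streak.

    `results` holds (build, passed) pairs ordered from oldest to newest.
    If the newest result passed, there is no ongoing regression and the
    first bad build is None. If nothing ever passed, the last good is None.
    """
    first_bad: Optional[str] = None
    # Walk backwards from the newest result until the first passing build.
    for build, passed in reversed(results):
        if passed:
            return build, first_bad
        first_bad = build  # an older failure extends the streak
    return None, first_bad


if __name__ == "__main__":
    # Hypothetical history; 20230305-1 and 20230308-1 are made up for illustration.
    history = [
        ("20230305-1", True),
        ("20230306-1", True),
        ("20230307-1", False),
        ("20230308-1", False),
    ]
    print(last_good_first_bad(history))  # ('20230306-1', '20230307-1')
```

Note that "fails since (at least)" matters here: if older results have already been cleaned up, the computed first bad build is only an upper bound.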


 ## Hypotheses 
 * *H1:* Changed `crashkernel` command line options on the dom0 host, introduced by salt changes, cause the regression 
 * *H2:* The version change SLE15-SP2 -> SLE15-SP4 causes the regression 
 * *H3:* The version change SLE15-SP2 -> Leap 15.4 causes the regression 
 * *H4:* A problem in the product (SUT OS version) causes the regression -> crosscheck with other versions 
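H1 can be checked by comparing the `crashkernel=` settings of the host before and after the salt changes. The sketch below assumes both command lines are available as plain strings, for example from `/proc/cmdline` on dom0 or from the `xen_commandline` field of `xl info`; which of the two is relevant on a Xen host is an assumption to verify. The helper names and the sample command lines are hypothetical.

```python
from typing import List


def crashkernel_values(cmdline: str) -> List[str]:
    """Return every crashkernel= value on a boot command line, sorted."""
    return sorted(
        token.split("=", 1)[1]
        for token in cmdline.split()
        if token.startswith("crashkernel=")
    )


def crashkernel_changed(before: str, after: str) -> bool:
    """True if the crashkernel= settings differ between two command lines."""
    return crashkernel_values(before) != crashkernel_values(after)


if __name__ == "__main__":
    # Hypothetical command lines, e.g. captured before and after a salt run
    before = "root=/dev/sda2 crashkernel=256M,high crashkernel=72M,low quiet"
    after = "root=/dev/sda2 quiet"
    print(crashkernel_values(before))  # ['256M,high', '72M,low']
    print(crashkernel_changed(before, after))  # True
```

The values are sorted so that a mere reordering of multiple `crashkernel=` options (such as the `,high` and `,low` variants) is not reported as a change.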

 ## Obstacles 
 1. Having no video makes investigation harder. Why was MAX_JOB_TIME=10800 chosen, given that it disables video? 
 1. `openqa-investigate` jobs would have helped to tell whether test or product differences make a difference. IMHO a couple of hours were wasted because QE Kernel wants the job_done_hooks disabled for the kernel tests in particular 
 1. The svirt backend makes investigation a lot harder. The serial log https://openqa.suse.de/tests/10630883/logfile?filename=serial0.txt from an x86_64 qemu job shows a line "[      5.058618] reboot: Restarting system", likely coming from the crashkernel, so we know that the crashkernel actually ran and did something, presumably useful, for about 5s. For svirt-xen that line is missing even in the "last good" job, so we do not even know whether the crashkernel ran properly; we only see its effect in the dump files when we recreate a new VM from the same disk image and boot it again 
 1. Tests like https://openqa.suse.de/tests/10654962# (sle-15-SP4-JeOS-for-kvm-and-xen-Updates-x86_64-Build20230309-1-jeos-extratest@uefi-virtio-vga) don't include kdump, so I can't use them for crosschecking. Why is kdump_and_crash not enabled on SLE15-SP4? 
 1. On Tumbleweed, the kdump_and_crash module in https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=extra_tests_textmode&version=Tumbleweed (latest: https://openqa.opensuse.org/tests/3166236) has apparently been affected since 2021 (!) by a bug preventing proper operation, https://bugzilla.opensuse.org/show_bug.cgi?id=1190434 
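The crashkernel evidence described in the obstacles above comes from scanning a job's serial log for the kernel's "reboot: Restarting system" line. A minimal sketch of that scan, assuming the log is fetched with the same `logfile?filename=serial0.txt` URL pattern as the qemu job linked above; the helper names are hypothetical, and finding the line only shows that some kernel printed it, not that the dump was written correctly.

```python
import re
import urllib.request
from typing import List

# Kernel printk line as seen in the qemu job's serial0.txt, e.g.
# "[      5.058618] reboot: Restarting system"
REBOOT_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+reboot: Restarting system")


def reboot_timestamps(serial_log: str) -> List[float]:
    """Return kernel timestamps (seconds) of all 'reboot: Restarting system' lines."""
    return [float(ts) for ts in REBOOT_LINE.findall(serial_log)]


def fetch_serial_log(job_id: int, host: str = "https://openqa.suse.de") -> str:
    """Download serial0.txt of an openQA job (same URL pattern as in the ticket)."""
    url = f"{host}/tests/{job_id}/logfile?filename=serial0.txt"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    sample = "[      5.058618] reboot: Restarting system\n"
    print(reboot_timestamps(sample))  # [5.058618]
```

A small timestamp such as 5s fits a freshly booted crash kernel; on svirt-xen an empty result may simply mean the serial output was never captured, which is exactly the obstacle described above.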

 ## Suggestions 
 1. If this is seen as critical, we can roll back the OS installation and ask someone to crosscheck with different hypervisor OS versions before upgrading again 
 2. We keep openqaw5-xen as is and skip the test module kdump_and_crash until anyone finds a better solution 

 I would go with 2. and also wait for https://bugzilla.opensuse.org/show_bug.cgi?id=1190434 

 ## Further details 

 Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=JeOS-for-kvm-and-xen-Updates&machine=svirt-xen-hvm&test=jeos-extratest&version=12-SP5) 
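For the crosschecks suggested in H4, the "latest" link above can be generated for other versions of the same scenario. This is a minimal sketch, assuming the query parameters `arch`, `distri`, `flavor`, `machine`, `test` and `version` used in that link; whether a given version actually has this scenario (and schedules kdump_and_crash) is an assumption to check per version. The helper name `latest_url` is hypothetical.

```python
from urllib.parse import urlencode

OPENQA_HOST = "https://openqa.suse.de"


def latest_url(version: str, flavor: str, machine: str,
               test: str = "jeos-extratest", arch: str = "x86_64",
               distri: str = "sle") -> str:
    """Build an openQA 'tests/latest' URL for one scenario."""
    query = urlencode({
        "arch": arch,
        "distri": distri,
        "flavor": flavor,
        "machine": machine,
        "test": test,
        "version": version,
    })
    return f"{OPENQA_HOST}/tests/latest?{query}"


if __name__ == "__main__":
    # Hypothetical crosscheck list; scenario availability is an assumption.
    for version in ("12-SP5", "15-SP4"):
        print(latest_url(version, "JeOS-for-kvm-and-xen-Updates", "svirt-xen-hvm"))
```

For version 12-SP5 this reproduces the "latest" link above exactly.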
