action #121774

Updated by MDoucha over 1 year ago

LTP test `cgroup_fj_stress_blkio_4_4_each` on the latest SLE-15SP1 KOTD kernel appears to crash the openQA worker instance it is running on. The test itself succeeds, but the openQA job then stays stuck in `wait_serial()` for several hours (despite the 90-second timeout) until the whole job fails on MAX_JOB_TIME. There are 3 examples so far: 
 https://openqa.suse.de/tests/10089424#step/cgroup_fj_stress_blkio_4_4_each/7 
 https://openqa.suse.de/tests/10111009#step/cgroup_fj_stress_blkio_4_4_each/7 
 https://openqa.suse.de/tests/10113099#step/cgroup_fj_stress_blkio_4_4_each/7 

 I've seen this issue only on SLE-15SP1 KOTD builds 156 and 157. I have not seen any cases on other SLE versions. 

 Typical autoinst-log.txt entries related to the timeout: 

     [2022-12-06T08:52:27.432374+01:00] [debug] <<< testapi::script_run(cmd="vmstat -w", output="", quiet=undef, timeout=30, die_on_timeout=1) 
     [2022-12-06T08:52:27.432549+01:00] [debug] tests/kernel/run_ltp.pm:334 called testapi::script_run 
     [2022-12-06T08:52:27.432710+01:00] [debug] <<< testapi::wait_serial(record_output=undef, regexp="# ", quiet=undef, no_regex=1, buffer_size=undef, expect_not_found=0, timeout=90) 
     [2022-12-06T10:39:58.278597+01:00] [debug] autotest received signal TERM, saving results of current test before exiting 
     [2022-12-06T10:39:58.278622+01:00] [debug] isotovideo received signal TERM 
     [2022-12-06T10:39:58.278748+01:00] [debug] backend got TERM 
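
 To make the gap between the configured timeout and the actual hang visible, the log timestamps can be compared directly. The following is a minimal sketch, not part of os-autoinst: the helper name `find_overrun` and both regular expressions are assumptions based only on the log format shown above.

```python
import re
from datetime import datetime

# Two lines copied from the autoinst-log.txt excerpt above.
LOG = """\
[2022-12-06T08:52:27.432710+01:00] [debug] <<< testapi::wait_serial(record_output=undef, regexp="# ", quiet=undef, no_regex=1, buffer_size=undef, expect_not_found=0, timeout=90)
[2022-12-06T10:39:58.278597+01:00] [debug] autotest received signal TERM, saving results of current test before exiting
"""

# Assumed line layout: "[ISO timestamp] [debug] message".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[debug\] (?P<msg>.*)$")
# Assumed shape of a wait_serial call entry; captures the timeout in seconds.
WAIT_RE = re.compile(r"testapi::wait_serial\(.*\btimeout=(\d+)")


def find_overrun(log_text):
    """Return (timeout_s, elapsed_s) for the last wait_serial call seen
    before the first TERM signal, or None if no such pair exists."""
    pending = None  # (start timestamp, timeout) of the latest wait_serial
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.fromisoformat(match["ts"])
        wait = WAIT_RE.search(match["msg"])
        if wait:
            pending = (stamp, int(wait.group(1)))
        elif "received signal TERM" in match["msg"] and pending:
            started, timeout_s = pending
            return timeout_s, (stamp - started).total_seconds()
    return None


if __name__ == "__main__":
    timeout_s, elapsed_s = find_overrun(LOG)
    print(f"timeout={timeout_s}s, blocked for {elapsed_s:.0f}s")
```

 For the excerpt above this reports a 90-second timeout against roughly 6451 seconds (about 1 h 47 min) between the `wait_serial()` call and the TERM signal, which suggests the timeout was never enforced on the worker side.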

 ## Expected result 

 Last good: [4.12.14-150100.156.1.gb6c27ee](https://openqa.suse.de/tests/10091628) (or more recent) 


 ## Further details 

 Always latest result in this scenario: [latest](https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Incidents-Kernel-KOTD&machine=64bit&test=ltp_controllers&version=15-SP1) 


 ## Steps to reproduce: 

 1. Run `ltp_controllers` testsuite on SLE-15SP1 KOTD 
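 2. Wait for `cgroup_fj_stress_blkio_4_4_each` to finish. The job then hangs in `wait_serial()` until MAX_JOB_TIME expires.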
