action #121774

open

LTP cgroup test appears to crash OpenQA worker instance

Added by MDoucha over 1 year ago. Updated 7 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2022-12-09
Due date:
% Done: 0%
Estimated time:
Tags:

Description

The LTP test cgroup_fj_stress_blkio_4_4_each, when run on the latest SLE-15SP1 KOTD kernel, appears to crash the openQA worker instance it is running on. The test itself succeeds, but the openQA job then stays stuck in wait_serial() for several hours (despite its 90-second timeout) until the whole job fails on MAX_JOB_TIME. There are 3 examples so far:
https://openqa.suse.de/tests/10089424#step/cgroup_fj_stress_blkio_4_4_each/7
https://openqa.suse.de/tests/10111009#step/cgroup_fj_stress_blkio_4_4_each/7
https://openqa.suse.de/tests/10113099#step/cgroup_fj_stress_blkio_4_4_each/7

I have seen this issue only on SLE-15SP1 KOTD builds 156 and 157, and not on any other SLE version.

Typical autoinst-log.txt entries related to the timeout:

[2022-12-06T08:52:27.432374+01:00] [debug] <<< testapi::script_run(cmd="vmstat -w", output="", quiet=undef, timeout=30, die_on_timeout=1)
[2022-12-06T08:52:27.432549+01:00] [debug] tests/kernel/run_ltp.pm:334 called testapi::script_run
[2022-12-06T08:52:27.432710+01:00] [debug] <<< testapi::wait_serial(record_output=undef, regexp="# ", quiet=undef, no_regex=1, buffer_size=undef, expect_not_found=0, timeout=90)
[2022-12-06T10:39:58.278597+01:00] [debug] autotest received signal TERM, saving results of current test before exiting
[2022-12-06T10:39:58.278622+01:00] [debug] isotovideo received signal TERM
[2022-12-06T10:39:58.278748+01:00] [debug] backend got TERM

Expected result

Last good: 4.12.14-150100.156.1.gb6c27ee (or more recent)

Further details

Always latest result in this scenario: latest

Steps to reproduce:

  1. Run ltp_controllers testsuite on SLE-15SP1 KOTD
  2. Wait.