action #69432

closed

test fails with no module details after boot_ltp, broken run-time scheduling?

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Urgent
Assignee:
Category: Bugs in existing tests
Start date: 2020-07-29
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Observation

openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-ltp_aio_stress_part1@64bit fails after boot_ltp with no module details after that. mdoucha suspects broken run-time scheduling:

<okurz> do you think the os-autoinst change "Avoid updating last_good if there is no possible user of it" can explain the ltp failures?
<mdoucha> Possible but unlikely. KLP tests create snapshot as well but work fine. https://openqa.suse.de/tests/4500234
<mdoucha> This looks like broken run-time scheduling. Modules added after VM start are ignored by os-autoinst
<okurz> o3 shows the same problems since 5 days, nobody seems to have realized that: https://openqa.opensuse.org/tests/1340472
<mdoucha> good, that narrows it down to 2 or 3 days, not a full week

Reproducible

Fails since Build 20200721 (current job)

Expected result

Last good: 20200720

Further details

Always latest result in this scenario: latest


Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #52673: os-autoinst: Do not save "lastgood" snapshot on last module unless img is preserved with snapshot (e.g. --no-cleanup) (Resolved, favogt, 2019-06-06)

Actions #1

Updated by okurz over 4 years ago

Comparing os-autoinst versions I see:

$ git log1 --no-merges dc25ddd8..7963b3d4
ef154996 Avoid updating last_good if there is no possible user of it
98de5809 Simplify runalltests in autotest.pm
e6593f21 Simplify passing test list in tools/invoke-tests
ce0023a1 Fix link to architecture documentation
1eaf6e49 Improve build instructions in README, mainly to cover CMake
990c8f62 CMake: Tweak test execution
d4ffa525 Improve argument parsing and source directory handling in tools/invoke-tests
a32956f9 CMake: Add targets for computing test coverage
54ace987 CMake: Add targets for invoking tests
092821da CMake: Add target for updating dependencies
cf2c737f (okurz/feature/base_os) docker: Bump base OS version to Leap 15.2

I consider as likely candidates:

ef154996 Avoid updating last_good if there is no possible user of it
98de5809 Simplify runalltests in autotest.pm
e6593f21 Simplify passing test list in tools/invoke-tests

As a test scenario, for example when partially reverting, we could pick any of the failing LTP test cases that are also fast to run, e.g. https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-Incidents-Kernel&machine=64bit&test=ltp_input&version=12-SP4

First, to reproduce:

openqa-clone-job --within-instance https://openqa.suse.de --skip-chained-deps 4500252 WORKER_CLASS=openqaworker5 TEST=okurz_poo69432_ltp_input _GROUP=0 BUILD=X

Created job #4500617: sle-12-SP4-Server-DVD-Incidents-Kernel-x86_64-Build:15909:kernel-ec2-ltp_input@64bit -> https://openqa.suse.de/t4500617

Created revert https://github.com/os-autoinst/os-autoinst/pull/1490 and applied hotfix on openqaworker5:

curl -s https://raw.githubusercontent.com/os-autoinst/os-autoinst/revert-1483-snapoptim/autotest.pm > /usr/lib/os-autoinst/autotest.pm

Triggered a new test, which passed. Hotpatched all osd workers with salt on osd:

sudo salt -l error --state-output=changes -C 'G@roles:worker' cmd.run 'curl -s https://raw.githubusercontent.com/os-autoinst/os-autoinst/revert-1483-snapoptim/autotest.pm > /usr/lib/os-autoinst/autotest.pm'

mdoucha will retrigger.

Actions #2

Updated by MDoucha over 4 years ago

The bug is caused specifically by this change in autotest.pm:

-    for my $t (@testorder) {
+    for my $testindex (0 .. $#testorder) {
+        my $t        = $testorder[$testindex];

If @testorder changes during VM runtime, the index sequence will not be updated and the newly added test modules will be ignored.
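The effect can be illustrated with a small Python analogue. This is only a sketch of the loop semantics, not the Perl code from autotest.pm: the function names and the module names `ltp_case_1` and `ltp_case_2` are hypothetical, and the third variant is just one possible way to keep an index while still seeing new modules, not necessarily what the actual fix does.

```python
def run_direct(schedule, extra):
    """Iterate over the list itself: modules appended mid-run are visited."""
    executed = []
    for module in schedule:
        executed.append(module)
        if module == "boot_ltp":
            # Stand-in for run-time scheduling: new modules are added
            # while the loop is already running.
            schedule.extend(extra)
    return executed


def run_snapshot_indices(schedule, extra):
    """Iterate over an index range computed once: appended modules are missed."""
    executed = []
    for index in range(len(schedule)):  # length is read only once, here
        module = schedule[index]
        executed.append(module)
        if module == "boot_ltp":
            schedule.extend(extra)
    return executed


def run_live_indices(schedule, extra):
    """Keep an index but re-check the length on every iteration."""
    executed = []
    index = 0
    while index < len(schedule):  # length is re-read each time
        module = schedule[index]
        executed.append(module)
        if module == "boot_ltp":
            schedule.extend(extra)
        index += 1
    return executed


if __name__ == "__main__":
    extra = ["ltp_case_1", "ltp_case_2"]  # hypothetical module names
    print(run_direct(["boot_ltp"], extra))
    print(run_snapshot_indices(["boot_ltp"], extra))
    print(run_live_indices(["boot_ltp"], extra))
```

In this sketch only the second variant stops after `boot_ltp`, which matches the symptom seen in the failing jobs: no module results after boot_ltp.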

I recommend using ltp_math for debugging, it's the fastest LTP job that uses run-time scheduling.

Actions #3

Updated by okurz over 4 years ago

  • Related to action #52673: os-autoinst: Do not save "lastgood" snapshot on last module unless img is preserved with snapshot (e.g. --no-cleanup) added
Actions #4

Updated by okurz over 4 years ago

  • Status changed from In Progress to Resolved

@favogt has provided his original PR with the fix in https://github.com/os-autoinst/os-autoinst/pull/1492 . I confirmed it working fine on openqaworker5 with https://openqa.suse.de/t4501133

The current state on osd is ok again and also git master is fixed with the revert. The merge and verification of the changes by favogt are left for #52673
