action #17752

closed

[sle][sles][functional] kdump tests

Added by RBrownSUSE about 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: New test
Target version: -
Start date:
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Kdump is an important tool for diagnosing broken kernels, but it often seems to be broken in all of our codebases.

Therefore openQA needs good kdump tests for Tumbleweed, Leap and SLE.

These tests would need to:

1- Set up kdump on a system (using YaST KDump I guess)
2- Reboot to activate kdump
3- Confirm kdump is running (systemctl status kdump)
4- Trigger a kernel panic ( echo c > /proc/sysrq-trigger )
5- Test that kdump actually loads and takes a dump of the kernel
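
The steps above can be sketched as a small shell helper. This is only an illustration, not the openQA test module: the function names, the DRY_RUN switch and the CRASH_DIR variable are hypothetical, the kdump configuration itself (step 1, for example via YaST) is assumed to have been done already, and /var/crash is assumed as the dump location. By default the helper only prints what it would run, since step 4 deliberately crashes the machine.

```shell
#!/bin/sh
# Sketch of the five steps from the description.
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0
# as root on a disposable test machine to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        sh -c "$*"
    fi
}

# Steps 1-2: enable the kdump service, then reboot to activate it.
# Assumption: kdump has already been configured (e.g. with YaST).
kdump_setup() {
    run "systemctl enable kdump"
    run "reboot"
}

# Step 3: confirm kdump is running after the reboot.
kdump_check() {
    run "systemctl status kdump"
}

# Step 4: trigger a kernel panic through the SysRq trigger file.
kdump_panic() {
    run "echo c > /proc/sysrq-trigger"
}

# Step 5: once the machine is back up, look for a written dump.
# Assumption: dumps land in /var/crash unless CRASH_DIR says otherwise.
kdump_verify() {
    run "ls -R ${CRASH_DIR:-/var/crash}"
}
```

With the defaults, calling kdump_panic just prints the command it would run; nothing is executed until DRY_RUN=0 is set explicitly.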

All of these steps should be relatively easy (if not trivial) for a regular user to do, as any sysadmin needs to do them when they hit a kernel issue and ask for support. Therefore the automated test should avoid too much fancy logic or tuning: if YaST doesn't pick sane defaults, that's a bug; if kdump doesn't take the dump automatically, that's a bug; and so on.


Related issues 1 (0 open, 1 closed)

Related to openQA Tests - action #16436: [sles][functional] test fails in crash with timeout on running script (Resolved, 2017-02-03)

Actions #1

Updated by okurz@suse.de about 7 years ago

Isn't that covered with the "crash" test?

Actions #2

Updated by okurz about 7 years ago

  • Subject changed from kdump tests to [sle][sles][functional] kdump tests
  • Category set to New test

Apparently this is not covered by the "crash" test, as kdump does not work: https://bugzilla.suse.com/show_bug.cgi?id=957053

Actions #3

Updated by Anonymous almost 7 years ago

  • Status changed from New to In Progress

Actions #4

Updated by Anonymous almost 7 years ago

  • Assignee set to Anonymous
  • Start date deleted (2017-03-15)

Actions #5

Updated by Anonymous almost 7 years ago

The test is now written and tested. It works as expected: either kdump is enabled and log files get written (Ref: f146.suse.de/tests/169), or kdump fails to get enabled and the test fails (Ref: f146.suse.de/tests/170). The question now is where we want to have it. I currently put kdump.pm under tests/console/, where another test, kdump_disabled.pm, expects kdump to be disabled, which means only one of the two tests can succeed. Another question is whether we should put this kdump test under the extratest sle branch or somewhere else. If you have any suggestions, please let me know.

Actions #6

Updated by Anonymous almost 7 years ago

Richard, kdump is already covered by the crash test under toolchain. If nobody disagrees, I will reject this ticket.

Actions #7

Updated by RBrownSUSE almost 7 years ago

I disagree. I do not see what kdump has to do with the toolchain test, as kdump is a key part of the basic SLE functionality, and not part of the toolchain module.

I'm glad to hear we already have test code written, but please extract it from the toolchain test so we can have it as a discrete scenario that we can test without the presence of the toolchain module.

Actions #8

Updated by Anonymous almost 7 years ago

Hi Richard, I did write and test kdump, but later I saw that what it does is already covered by the crash test under toolchain. Maybe you want to have it under console instead of toolchain? Btw, regarding your "5- Test that kdump actually loads and takes a dump of the kernel": does that mean calling crash, or do you mean some other tool? Another thing: the test fails and succeeds randomly because kdump is not stable; it only gets enabled some of the time. Do you think the test should fail, or soft-fail with a reference to a bug number?

Actions #9

Updated by RBrownSUSE almost 7 years ago

I would like to have it under console, yes - and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can have very clear attention to this very important feature

If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail

Does this make sense?

Actions #10

Updated by okurz almost 7 years ago

yi wrote:

I currently put kdump.pm under tests/console/, where another test, kdump_disabled.pm, expects kdump to be disabled, which means only one of the two tests can succeed

As kdump_disabled is only scheduled for the "jeos" product variant, where we don't trigger extra_tests, that should not be a problem.

RBrownSUSE wrote:

I would like to have it under console, yes

Also makes sense to me. 'toolchain' was initially chosen as a place in gh#os-autoinst/os-autoinst-distri-opensuse#1462, but there is no strong reason for it.

and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can have very clear attention to this very important feature

It might be a good idea to just split out the kdump part and always run it before crash. In any case I think kdump and crash are closely related, so I don't consider this separation very important. If crash has problems, we should attend to these with the same importance. An alternative could be to just rename the module from "crash" to "kdump" or "kdump_and_crash".

If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail

I already discussed this with yi in person last week, and the problem we see is that the module does not fail every time. The review & tagging process is the way to go to find these issues, but if they persist for a longer time, and especially when they don't happen every time, the bugref gets lost, which makes a lot of work for the reviewers to find these issues again. Therefore a record_soft_failure makes sense.

In short, I recommend the following steps. Each of them should definitely take less than a day of work:

  • Rename "toolchain/crash.pm" to "console/crash.pm"
  • Rename console/crash to console/kdump or console/kdump_and_crash (after confirmation by rbrown & mnowak) OR - if not acceptable to rbrown or mnowak - pull out the kdump setup part into console/kdump and trigger console/crash with the rest just afterwards
  • Add record_soft_failure steps in the test module(s) for known issues
  • Update the existing open bugs with more information based on the test results to expedite the bug resolution process

Actions #11

Updated by Anonymous almost 7 years ago

Just a quick update: I have now merged kdump and crash, with a soft failure pointing to the bug, and the test module is named kdump_and_crash under console. I'm testing it on openSUSE, Tumbleweed and SLE. I'll send an email to mnowak later, before I make any other changes.

Actions #12

Updated by okurz almost 7 years ago

Or just send a PR with the changes and invite mnowak there.

Actions #13

Updated by michalnowak almost 7 years ago

I am fine with Yi's current approach in PR#2904. I'll review there.

Actions #14

Updated by Anonymous almost 7 years ago

Testrun succeeded on SLE: http://openqa.suse.de/tests/958863
and openSUSE Tumbleweed: https://openqa.opensuse.org/tests/408936

Actions #15

Updated by okurz almost 7 years ago

  • Related to action #16436: [sles][functional] test fails in crash with timeout on running script added

Actions #16

Updated by Anonymous over 6 years ago

I think we are done with kdump_and_crash test. It fails now randomly because of kdump itself being unstable.

Actions #17

Updated by Anonymous over 6 years ago

  • Status changed from In Progress to Resolved

Actions #18

Updated by RBrownSUSE over 6 years ago

yi wrote:

I think we are done with kdump_and_crash test. It fails now randomly because of kdump itself being unstable.

Do we have a bsc# number for kdump being unstable?

Actions #19

Updated by Anonymous over 6 years ago

I'm not sure. I think there must be at least a new one because of a segfault, and another long-standing one where Richard also commented. My Bugzilla account was not working until about a week ago, so I didn't do much with Bugzilla.
OK, here they are:
https://bugzilla.suse.com/show_bug.cgi?id=1029318
https://bugzilla.suse.com/show_bug.cgi?id=957053
https://bugzilla.opensuse.org/show_bug.cgi?id=1043389
