action #17752
Closed: [sle][sles][functional] kdump tests
Added by RBrownSUSE about 7 years ago. Updated almost 7 years ago.
Description
Kdump is an important tool for diagnosing broken kernels, but it often seems to be broken in all of our codebases.
Therefore openQA needs good kdump tests for Tumbleweed, Leap and SLE.
These tests would need to:
1- Set up kdump on a system (using YaST KDump, I guess)
2- Reboot to activate kdump
3- Confirm kdump is running (systemctl status kdump)
4- Trigger a kernel panic (echo c > /proc/sysrq-trigger)
5- Test that kdump actually loads and takes a dump of the kernel
All of these steps should be relatively easy (if not trivial) for a regular user to do, as any sysadmin needs to do them when they hit a kernel issue and need support. Therefore the automated test should avoid too much fancy logic or tuning - if YaST doesn't pick sane defaults, that's a bug. If kdump doesn't take the dump automatically, that's a bug, etc.
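Steps 3 and 5 can also be checked from inside the test system itself. The following is a minimal Python sketch under stated assumptions, not the openQA test module discussed later in this ticket: it assumes the kernel exposes /sys/kernel/kexec_crash_loaded (which reads 1 once a crash kernel is loaded, complementing systemctl status kdump) and that dumps are saved as vmcore files in per-crash subdirectories of /var/crash, the usual default save directory. Both locations are parameters, so they can be changed if KDUMP_SAVEDIR points elsewhere. Step 4, the panic itself, is deliberately left out.

```python
from pathlib import Path
from typing import List

# Assumed default locations; adjust if KDUMP_SAVEDIR points somewhere else.
CRASH_FLAG = Path("/sys/kernel/kexec_crash_loaded")
SAVE_DIR = Path("/var/crash")


def crash_kernel_loaded(flag: Path = CRASH_FLAG) -> bool:
    """Step 3: True if the kernel reports a loaded crash (capture) kernel."""
    try:
        return flag.read_text().strip() == "1"
    except OSError:
        # Missing file or no permission: treat kdump as not armed.
        return False


def find_vmcores(save_dir: Path = SAVE_DIR) -> List[Path]:
    """Step 5: list dump files found in the per-crash subdirectories of save_dir."""
    if not save_dir.is_dir():
        return []
    return sorted(p for p in save_dir.glob("*/vmcore*") if p.is_file())


if __name__ == "__main__":
    # Run once before the panic to confirm kdump is armed, and again after
    # the reboot to confirm that a dump was actually written.
    print("crash kernel loaded:", crash_kernel_loaded())
    for core in find_vmcores():
        print("dump found:", core)
```

In keeping with the "no fancy logic" requirement, the sketch only reads what kdump itself leaves behind and does not tune anything.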
Updated by okurz@suse.de about 7 years ago
Isn't that covered with the "crash" test?
Updated by okurz about 7 years ago
- Subject changed from kdump tests to [sle][sles][functional] kdump tests
- Category set to New test
Apparently not covered by the "crash" test, as kdump does not work -> https://bugzilla.suse.com/show_bug.cgi?id=957053
Updated by Anonymous about 7 years ago
- Assignee set to Anonymous
- Start date deleted (2017-03-15)
Updated by Anonymous almost 7 years ago
The test is now written and tested. It works as expected: either kdump is enabled and the log files get written (Ref: f146.suse.de/tests/169), or kdump fails to get enabled and the test fails (Ref: f146.suse.de/tests/170). Now the question is where we want to have it. I currently put kdump.pm under tests/console/, where another test, kdump_disabled.pm, expects kdump to be disabled, which means only one of the two tests can succeed. Another question is whether we should put this kdump test under the extratest SLE branch or somewhere else. If you have any suggestions, please let me know.
Updated by Anonymous almost 7 years ago
Richard, kdump is already covered by the crash test under toolchain. If nobody disagrees, I will reject this ticket.
Updated by RBrownSUSE almost 7 years ago
I disagree. I do not see what kdump has to do with the toolchain test, as kdump is a key part of the basic SLE functionality, and not part of the toolchain module.
I'm glad to hear we already have test code written, but please extract it from the toolchain test so we can have it as a discrete scenario - so we can test it without the presence of the toolchain module.
Updated by Anonymous almost 7 years ago
Hi Richard, I did write and test kdump, but later I saw that what it does is already covered by the crash test under toolchain. Maybe you want to have it under console instead of toolchain? By the way, regarding your "5- Test that kdump actually loads and takes a dump of the kernel": does that mean actually calling crash, or do you mean some other tool? Another thing is that the test fails and succeeds randomly, since kdump is not stable and only randomly gets enabled. Do you think the test should fail, or soft fail with a reference to the bug number?
Updated by RBrownSUSE almost 7 years ago
I would like to have it under console, yes - and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can pay very clear attention to this very important feature.
If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail.
Does this make sense?
Updated by okurz almost 7 years ago
yi wrote:
I currently put kdump.pm under tests/console/, where another kdump_disabled.pm expects kdump to be disabled, which means only one of the two tests can succeed
As kdump_disabled is only scheduled for the "jeos" product variant, where we don't trigger extra_tests, that should not be a problem.
RBrownSUSE wrote:
I would like to have it under console, yes
Also makes sense to me. 'toolchain' was initially chosen as a place in gh#os-autoinst/os-autoinst-distri-opensuse#1462, but there is no strong reason for it.
and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can have very clear attention to this very important feature
It might be a good idea to just split out the kdump part and always run it before crash. In any case, I think kdump and crash are closely related, so I don't consider this separation very important. If crash has problems, we should also attend to these with importance. An alternative could be to just rename the module from "crash" to "kdump" or "kdump_and_crash".
If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail
I already discussed this with yi in person last week, and the problem we see is that the module does not fail every time. The review & tagging process is the way to go to find these issues, but if they persist for a longer time, and especially when they don't happen every time, the bugref gets lost, which makes a lot of work for the reviewers to find these issues again. Therefore a record_soft_failure makes sense.
In short, I recommend the following steps. Each of them should definitely take less than a day:
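The trade-off between hard failures and soft failures can be sketched as a small decision function. This is a hypothetical Python illustration, not openQA's API (openQA's own mechanism is record_soft_failure, called from the test module); the Result type and the judge function are invented for the example. It shows the point argued here: a known, intermittent failure does not fail the run, but the bug reference stays attached to the result instead of being lost between reviews.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Result:
    """Hypothetical outcome of one kdump test run."""
    status: str                   # "passed", "softfailed" or "failed"
    bugref: Optional[str] = None  # bug kept with the result, e.g. "bsc#957053"


def judge(kdump_worked: bool, known_bug: Optional[str]) -> Result:
    """Decide how to report a run, following the soft-failure argument."""
    if kdump_worked:
        return Result("passed")
    if known_bug is not None:
        # Known intermittent issue: do not fail the run, but carry the bug
        # reference so reviewers do not have to rediscover it every time.
        return Result("softfailed", known_bug)
    # Unknown failure: fail hard so it gets reviewed and tagged.
    return Result("failed")
```

For example, judge(False, "bsc#957053") yields a soft failure that still points at the bug, while judge(False, None) is reported as a plain failure.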
- Rename "toolchain/crash.pm" to "console/crash.pm"
- Rename console/crash to console/kdump or console/kdump_and_crash (after confirmation by rbrown & mnowak) OR - if not acceptable to rbrown or mnowak - pull out the kdump setup part into console/kdump and trigger console/crash with the rest just afterwards
- Add record_soft_failure-steps in the test module(s) for known issues
- Update the existing open bugs with more information based on the test results to expedite the bug resolving process
Updated by Anonymous almost 7 years ago
Just a quick update: I have now merged kdump and crash, with a soft failure pointing to the bug, and the test module is named kdump_and_crash under console. I'm testing it on openSUSE, Tumbleweed and SLE. I'll send an email to mnowak later, before I make any other changes.
Updated by okurz almost 7 years ago
Or just send a PR with the changes and invite mnowak there.
Updated by michalnowak almost 7 years ago
I am fine with Yi's current approach in PR#2904. I'll review there.
Updated by Anonymous almost 7 years ago
Testrun succeeded on SLE: http://openqa.suse.de/tests/958863
and openSUSE Tumbleweed: https://openqa.opensuse.org/tests/408936
Updated by okurz almost 7 years ago
- Related to action #16436: [sles][functional] test fails in crash with timeout on running script added
Updated by Anonymous almost 7 years ago
I think we are done with the kdump_and_crash test. It now fails randomly because kdump itself is unstable.
Updated by Anonymous almost 7 years ago
- Status changed from In Progress to Resolved
Updated by RBrownSUSE almost 7 years ago
yi wrote:
I think we are done with the kdump_and_crash test. It now fails randomly because kdump itself is unstable.
Do we have a bsc# number for kdump being unstable?
Updated by Anonymous almost 7 years ago
I'm not sure. I think there must be: at least a new one for the segfault, and a long-standing one on which Richard also commented. But my Bugzilla account was not working until about a week ago, so I didn't do much with Bugzilla.
OK, here they are:
https://bugzilla.suse.com/show_bug.cgi?id=1029318
https://bugzilla.suse.com/show_bug.cgi?id=957053
https://bugzilla.opensuse.org/show_bug.cgi?id=1043389