action #17752: [sle][sles][functional] kdump tests - openQA Tests - openSUSE Project Management Tool

action #17752

closed

[sle][sles][functional] kdump tests

Added by RBrownSUSE about 7 years ago. Updated almost 7 years ago.

Status: Resolved
Priority: Normal
Category: New test

Description

Kdump is an important tool for diagnosing broken kernels, but it often seems to be broken in all of our codebases.

Therefore openQA needs good kdump tests for Tumbleweed, Leap and SLE.

These tests would need to

1- Set up kdump on a system (using YaST KDump I guess)
2- Reboot to activate kdump
3- Confirm kdump is running (systemctl status kdump)
4- Trigger a kernel panic (echo c > /proc/sysrq-trigger)
5- Test that kdump actually loads and takes a dump of the kernel

All of these steps should be relatively easy (if not trivial) for a regular user to do, as they need to be done by any sysadmin when they hit a kernel issue for support. Therefore the automated test should avoid too much fancy logic or tuning - if YaST doesn't pick sane defaults, that's a bug. If kdump doesn't take the dump automatically, that's a bug, etc etc.
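
The five steps above can be written down as an ordered plan. The following is only a minimal Python sketch for illustration, not the openQA test module itself. The `Step` class and `render_plan` helper are hypothetical names, and `yast2 kdump`, `systemctl reboot` and the `/var/crash` dump directory are assumptions about how a default setup would look; only the `systemctl status kdump` and sysrq commands come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    description: str
    command: str
    survives: bool  # False when the command is expected to take the system down


# Commands for steps 1, 2 and 5 are assumptions, see the text above.
KDUMP_STEPS = [
    Step("Set up kdump with YaST defaults", "yast2 kdump", True),
    Step("Reboot to activate kdump", "systemctl reboot", False),
    Step("Confirm kdump is running", "systemctl status kdump", True),
    Step("Trigger a kernel panic", "echo c > /proc/sysrq-trigger", False),
    Step("Check that a dump was written", "ls /var/crash", True),
]


def render_plan(steps):
    """Return one numbered line per step, marking steps that end the session."""
    lines = []
    for number, step in enumerate(steps, start=1):
        suffix = "" if step.survives else "  # system goes down here"
        lines.append(f"{number}- {step.description}: {step.command}{suffix}")
    return lines


if __name__ == "__main__":
    print("\n".join(render_plan(KDUMP_STEPS)))
```

Marking the two steps that bring the system down makes it explicit where an automated test has to wait for the machine to come back before continuing.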

Related issues 1 (0 open — 1 closed)

Related to openQA Tests - action #16436: [sles][functional] test fails in crash with timeout on running script

Resolved

2017-02-03

Updated by okurz@suse.de about 7 years ago

Isn't that covered with the "crash" test?

Updated by okurz about 7 years ago

Subject changed from kdump tests to [sle][sles][functional] kdump tests
Category set to New test

apparently not covered by "crash" test as kdump does not work -> https://bugzilla.suse.com/show_bug.cgi?id=957053

Updated by Anonymous about 7 years ago

Status changed from New to In Progress

Updated by Anonymous about 7 years ago

Assignee set to Anonymous
Start date deleted (~~2017-03-15~~)

Updated by Anonymous almost 7 years ago

The test is now written and tested. It works as expected: either kdump is enabled and the log files get written (Ref: f146.suse.de/tests/169), or kdump fails to get enabled and the test fails (Ref: f146.suse.de/tests/170). Now the question is where we want to have it. I currently put kdump.pm under tests/console/, where another test, kdump_disabled.pm, expects kdump to be disabled, which means only one of the two tests can succeed. Another question is whether we should put this kdump test under the extratest sle branch or somewhere else. If you have any suggestions, please let me know.

Updated by Anonymous almost 7 years ago

Richard, kdump is already covered by crash test under toolchain. If nobody disagrees, I will reject this ticket.

Updated by RBrownSUSE almost 7 years ago

I disagree. I do not see what kdump has to do with the toolchain test, as kdump is a key part of the basic SLE functionality, and not part of the toolchain module.

I'm glad to hear we already have test code written, but please extract it from the toolchain test so we can have it as a discrete scenario - so we can test it without the presence of the toolchain module.

Updated by Anonymous almost 7 years ago

Hi Richard, I did write and test kdump, but later I saw that what it does is already covered by the crash test under toolchain. Maybe you want to have it under console instead of toolchain? By the way, regarding your "5- Test that kdump actually loads and takes a dump of the kernel": the test actually calls crash, or do you mean some other tool? Another thing: the test fails and succeeds randomly because kdump is not stable and only gets enabled some of the time. Do you think the test should fail, or soft-fail with a reference to the bug number?

Updated by RBrownSUSE almost 7 years ago

I would like to have it under console, yes - and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can have very clear attention to this very important feature.

If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail.

Does this make sense?
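
Gating the scenario on a variable like KDUMP=1 could look roughly like the sketch below. It is a Python illustration only, since the real openQA scheduling code is Perl. The `schedule` and `is_set` functions, the settings dictionary and the `boot/boot_to_desktop` module name are hypothetical; only `KDUMP` and the `console/kdump` location come from this discussion.

```python
def is_set(settings, name):
    """Treat a missing, empty or "0" variable as unset (job settings are strings)."""
    return settings.get(name, "") not in ("", "0")


def schedule(settings):
    """Return the ordered list of test modules for one job (hypothetical helper)."""
    modules = ["boot/boot_to_desktop"]  # hypothetical placeholder module
    if is_set(settings, "KDUMP"):
        # Dedicated scenario: kdump is only scheduled when explicitly requested.
        modules.append("console/kdump")
    return modules


if __name__ == "__main__":
    print(schedule({"KDUMP": "1"}))
    print(schedule({}))
```

With this shape the kdump scenario shows up as its own job, so a failure is visible on its own rather than hidden among other extra tests.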

#10

Updated by okurz almost 7 years ago

yi wrote:

I currently put kdump.pm under tests/console/, where another kdump_disabled.pm expects kdump being disabled, which means only one of the two tests can succeed

As kdump_disabled is only scheduled for the "jeos" product variant, where we don't trigger extra_tests, that should not be a problem.

RBrownSUSE wrote:

I would like to have it under console, yes

Also makes sense to me. 'toolchain' was initially chosen as a place in gh#os-autoinst/os-autoinst-distri-opensuse#1462, but there is no strong reason for it.

and I think I would like it as a separate scenario (requiring a variable like KDUMP=1) so we can have very clear attention to this very important feature

It might be a good idea to just split out the kdump part and run it always before crash. In any case I think kdump and crash are closely related so I don't consider this separation very important. If crash has problems we should also attend these with importance. An alternative could be to just rename the module from "crash" to "kdump" or "kdump_and_crash".

If it fails randomly, then I think that should be handled by the review & tagging process for that scenario. I don't think we should use a soft_fail

I already discussed this with yi in person last week. The problem we see is that the module does not fail every time. The review & tagging process is the way to find these issues, but when they persist for a longer time, and especially when they do not happen every time, the bugref gets lost, which makes a lot of work for the reviewers to find these issues again. Therefore a record_soft_failure makes sense.

In short I recommend the following steps. Each of them should definitely be less work than a day:

1- Rename "toolchain/crash.pm" to "console/crash.pm"
2- Rename console/crash to console/kdump or console/kdump_and_crash (after confirmation by rbrown & mnowak) OR, if not acceptable to rbrown or mnowak, pull out the kdump setup part into console/kdump and trigger console/crash with the rest just afterwards
3- Add record_soft_failure steps in the test module(s) for known issues
4- Update the existing open bugs with more information based on the test results to expedite the bug resolving process
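
The soft-failure idea from the steps above can be sketched as a small decision function: a failure matching a known issue keeps its bug reference attached, anything else stays a hard failure. This is a Python illustration, not the Perl test module, and it does not call the real record_soft_failure. The `classify` function and the symptom text are hypothetical; bsc#957053 is the bug already linked in this ticket.

```python
# Hypothetical symptom text mapped to a bug reference from this ticket.
KNOWN_ISSUES = {
    "kdump service failed to start": "bsc#957053",
}


def classify(passed, failure_text=""):
    """Return (result, bugref) for one run of the module."""
    if passed:
        return ("ok", None)
    for symptom, bugref in KNOWN_ISSUES.items():
        if symptom in failure_text:
            # Known issue: report a soft failure that carries the bug reference.
            return ("softfail", bugref)
    # Unknown failure: stays a hard failure so reviewers have to look at it.
    return ("fail", None)


if __name__ == "__main__":
    print(classify(False, "kdump service failed to start after reboot"))
    print(classify(False, "timeout waiting for login prompt"))
```

Keeping the bug reference in the result means a sporadic failure does not have to be rediscovered and re-tagged each time it reappears.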

#11

Updated by Anonymous almost 7 years ago

Just a quick update: now I have merged kdump and crash, having soft failure pointing to the bug, and the test module is named kdump_and_crash under console. I'm testing it on openSUSE, Tumbleweed and SLE. I'll send an email later to mnowak before I do any other changes.

#12

Updated by okurz almost 7 years ago

or just send a PR with the changes and invite mnowak there.

#13

Updated by michalnowak almost 7 years ago

I am fine with Yi's current approach in PR#2904. I'll review there.

#14

Updated by Anonymous almost 7 years ago

Testrun succeeded on SLE: http://openqa.suse.de/tests/958863
and openSUSE Tumbleweed: https://openqa.opensuse.org/tests/408936

#15

Updated by okurz almost 7 years ago

Related to action #16436: [sles][functional] test fails in crash with timeout on running script added

#16

Updated by Anonymous almost 7 years ago

I think we are done with kdump_and_crash test. It fails now randomly because of kdump itself being unstable.

#17

Updated by Anonymous almost 7 years ago

Status changed from In Progress to Resolved

#18

Updated by RBrownSUSE almost 7 years ago

yi wrote:

I think we are done with kdump_and_crash test. It fails now randomly because of kdump itself being unstable.

Do we have a bsc# number for kdump being unstable?

#19

Updated by Anonymous almost 7 years ago

I'm not sure. I think there must be at least a new one because of the seg fault, and another long-standing one where Richard also commented. But my bugzilla account was not working until about a week ago, so I didn't do much with bugzilla.
OK, here they are:
https://bugzilla.suse.com/show_bug.cgi?id=1029318
https://bugzilla.suse.com/show_bug.cgi?id=957053
https://bugzilla.opensuse.org/show_bug.cgi?id=1043389
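
For tagging, the URLs above can be turned into short bug references of the bsc# kind asked for earlier. The sketch below is a Python illustration under the assumption that the usual prefixes apply (bsc for bugzilla.suse.com, boo for bugzilla.opensuse.org); the `to_bugref` helper is a hypothetical name, not part of any openQA tooling.

```python
from urllib.parse import parse_qs, urlparse

# Assumed prefix convention for the two Bugzilla instances.
PREFIX_BY_HOST = {
    "bugzilla.suse.com": "bsc",
    "bugzilla.opensuse.org": "boo",
}


def to_bugref(url):
    """Convert a Bugzilla show_bug.cgi URL into a short reference like bsc#123."""
    parts = urlparse(url)
    prefix = PREFIX_BY_HOST.get(parts.hostname)
    bug_ids = parse_qs(parts.query).get("id", [])
    if prefix is None or len(bug_ids) != 1 or not bug_ids[0].isdigit():
        raise ValueError(f"not a recognised Bugzilla bug URL: {url}")
    return f"{prefix}#{bug_ids[0]}"


if __name__ == "__main__":
    for link in (
        "https://bugzilla.suse.com/show_bug.cgi?id=1029318",
        "https://bugzilla.suse.com/show_bug.cgi?id=957053",
        "https://bugzilla.opensuse.org/show_bug.cgi?id=1043389",
    ):
        print(to_bugref(link))
```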
