action #61907

closed

[kernel][ppc64le] Update kdump memory size for ppc64le

Added by michel_mno over 4 years ago. Updated about 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2020-01-08
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Test fails in kdump_and_crash on Leap15.2 ppc64le.

crash reports the following error with the new Leap15.2 ppc64le build 69.1:

crash: invalid kernel virtual address: 9  type: "first vmlist addr"
Errors like the one above typically occur when the kernels and memory source
do not match.  These are the files being used:
KERNEL: /boot/vmlinux-5.3.16-lp152.1-default
DEBUGINFO: /usr/lib/debug/boot/vmlinux-5.3.16-lp152.1-default.debug
DUMPFILE: /var/crash/2020-01-07-18:10/vmcore

build 69.1 (kernel: 5.3.16-lp152.1-default crash: 7.2.6)
https://openqa.opensuse.org/tests/1135914#step/kdump_and_crash/71
This is the first test run since build 50.1, which passed:
build 50.1 (kernel: 5.3.13-lp152.1-default crash: 7.2.6)
https://openqa.opensuse.org/tests/1109602#step/kdump_and_crash/71

Observation

openQA test in scenario opensuse-15.2-DVD-ppc64le-extra_tests_in_textmode@ppc64le fails in
kdump_and_crash

Test suite description

Maintainer: okurz@suse.de

Mainly console extratest.

Reproducible

Fails since (at least) Build 28.1

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Related issues 2 (0 open, 2 closed)

Related to openQA Tests - action #61082: [functional][u] test fails in kdump_and_crash - The test needs adaptions (Resolved, SLindoMansilla, 2019-12-17)
Related to openQA Tests - action #62267: [sle][Migration][SLE15SP2][Regression] test fails in install_service - kdump lead system crash (Resolved, hjluo, 2020-01-19)
Actions #1

Updated by michel_mno over 4 years ago

Another test failed with a different signature; for now I assume it is the same issue:
https://openqa.opensuse.org/tests/1141833#step/kdump_and_crash/65 (TW snapshot 20200111)

crash: invalid kernel virtual address: 7097ab71e7ab670c  type: "list entry"
SCRIPT_FINISHEDuYP6m-1-
at /usr/lib/os-autoinst/testapi.pm line 1091.
Actions #2

Updated by okurz over 4 years ago

  • Subject changed from test fails in kdump_and_crash Leap15.2 ppc64le to [qam][ppc64le] test fails in kdump_and_crash Leap15.2 ppc64le
  • Assignee set to pcervinka

I don't see where the tests are doing anything wrong; to me it looks more like a product issue that should be tracked in bugzilla.opensuse.org.

@pcervinka as test module maintainer, can you comment?

Actions #3

Updated by okurz over 4 years ago

  • Subject changed from [qam][ppc64le] test fails in kdump_and_crash Leap15.2 ppc64le to [kernel][qam][ppc64le] test fails in kdump_and_crash Leap15.2 ppc64le
Actions #4

Updated by pcervinka over 4 years ago

  • Related to action #61082: [functional][u] test fails in kdump_and_crash - The test needs adaptions added
Actions #5

Updated by pcervinka over 4 years ago

  • Status changed from New to In Progress

I believe it is the result of the change that removed the memory increase: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9269

Actions #6

Updated by SLindoMansilla over 4 years ago

pcervinka wrote:

I believe it is the result of the change that removed the memory increase: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/9269

I think this is a new bug.
I had to remove the workaround because, apart from the bug being marked as resolved, each time the workaround was applied the test failed on aarch64 and ppc64le: https://openqa.suse.de/tests/3767851#step/kdump_and_crash/57
Sometimes check_screen failed sporadically, so the test did not apply the workaround, and then it passed: https://openqa.suse.de/tests/3620940#step/kdump_and_crash/36

After removing the workaround, the test worked on the last build: https://progress.opensuse.org/issues/61082#note-6

Actions #7

Updated by pcervinka over 4 years ago

No, you should label your failure with the already existing bug https://bugzilla.suse.com/show_bug.cgi?id=1158540 for aarch64, which describes the whole situation exactly.
Anyway, I will fix the current situation for ppc64le.

Actions #8

Updated by pcervinka over 4 years ago

Here is the ppc64le run with the reverted workaround, https://openqa.suse.de/tests/3811574, which was successful (as expected).
The workaround is not needed for aarch64, but ppc64le still needs more memory. Also, the bug referenced in the previous workaround is no longer valid, so I will find a better bug for it or create a new one that better reflects reality.

Actions #9

Updated by SLindoMansilla over 4 years ago

pcervinka wrote:

Here is the ppc64le run with the reverted workaround, https://openqa.suse.de/tests/3811574, which was successful (as expected).
The workaround is not needed for aarch64, but ppc64le still needs more memory. Also, the bug referenced in the previous workaround is no longer valid, so I will find a better bug for it or create a new one that better reflects reality.

Thanks!
This is what I meant. It is a new bug introduced in the new build. The last build worked only without the workaround: https://openqa.suse.de/tests/3776603#step/kdump_and_crash/69

Actions #10

Updated by pcervinka over 4 years ago

It is not exactly new in this build; the issue was there all along, masked (or solved) by the workaround that set the kdump memory to 640MB. I can't say why your verification job passed. Kdump in the kernel job group started to fail right after the workaround was removed: https://openqa.suse.de/tests/3766514#next_previous. That said, the history of this workaround is unclear: the aarch64 issue is gone and the original bug is closed. I will reintroduce the workaround for ppc64le with a better reference (that will be the fun part).
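A minimal sketch of a per-architecture kdump memory override, assuming the workaround amounts to choosing a `crashkernel=` size based on the architecture. Only the 640MB value for ppc64le comes from this ticket; the table and function names are hypothetical, and the real test code in os-autoinst-distri-opensuse is written in Perl and may be structured differently.

```python
from typing import Optional

# Hypothetical override table. Only the ppc64le value (640 MB) is taken from
# this ticket; aarch64 is deliberately absent because it no longer needs it.
KDUMP_MEMORY_OVERRIDES_MB = {
    "ppc64le": 640,
}


def crashkernel_param(arch: str) -> Optional[str]:
    """Return a crashkernel= kernel parameter for arch, or None to keep the default."""
    size_mb = KDUMP_MEMORY_OVERRIDES_MB.get(arch)
    if size_mb is None:
        # No override: leave the distribution's default reservation alone.
        return None
    return f"crashkernel={size_mb}M"


if __name__ == "__main__":
    for arch in ("ppc64le", "aarch64", "x86_64"):
        print(arch, crashkernel_param(arch))
```

Keeping the decision in an explicit table, keyed by architecture, makes it obvious which platforms still carry the workaround and which bug justifies it.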

Actions #11

Updated by pcervinka over 4 years ago

Also, the needle-based detection of "lower memory" in the workaround was not stable at all and produced many junk needles (each working only for a short time). This must be redone to make the logic more transparent.
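One way to make the "lower memory" detection deterministic instead of needle-based is to read the reservation the running kernel actually made. A minimal sketch, assuming the check can run on the system under test and read `/sys/kernel/kexec_crash_size`, which reports the reserved crash kernel memory in bytes; the function names and the required size passed in are hypothetical.

```python
from pathlib import Path

MIB = 1024 * 1024


def reserved_crash_mb(sysfs_path: str = "/sys/kernel/kexec_crash_size") -> int:
    """Return the crash kernel reservation in MiB as reported by sysfs."""
    size_bytes = int(Path(sysfs_path).read_text().strip())
    return size_bytes // MIB


def needs_more_kdump_memory(
    required_mb: int, sysfs_path: str = "/sys/kernel/kexec_crash_size"
) -> bool:
    """True if the current reservation is smaller than required_mb (hypothetical threshold)."""
    return reserved_crash_mb(sysfs_path) < required_mb
```

For example, `needs_more_kdump_memory(640)` would tell the test whether to raise the reservation, without depending on what is rendered on screen.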

Actions #12

Updated by pcervinka over 4 years ago

Experimental run on spvm with reverted change for ppc64le https://openqa.suse.de/tests/3813095.

Actions #13

Updated by pcervinka over 4 years ago

  • Subject changed from [kernel][qam][ppc64le] test fails in kdump_and_crash Leap15.2 ppc64le to [kernel][ppc64le] Update kdump memory size for ppc64le
Actions #16

Updated by pcervinka about 4 years ago

  • Status changed from In Progress to Feedback

Let's observe a couple more runs on osd.

Actions #17

Updated by hjluo about 4 years ago

  • Related to action #62267: [sle][Migration][SLE15SP2][Regression] test fails in install_service - kdump lead system crash added
Actions #18

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/3868582

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #19

Updated by pcervinka about 4 years ago

  • Status changed from Feedback to Resolved

Seems to be OK now.

Actions #20

Updated by okurz about 4 years ago

  • Status changed from Resolved to Feedback

Hi pcervinka, I see the test still failing with this ticket as label: https://openqa.suse.de/tests/3907124#step/kdump_and_crash/69

Actions #21

Updated by pcervinka about 4 years ago

  • Status changed from Feedback to Resolved

I would say this is incorrect test auto-labeling; it is a completely different issue (failure at a different step):
[2020-02-21T02:50:39.120 UTC] [debug] output not validating at /var/lib/openqa/cache/openqa.suse.de/tests/sle/lib/kdump_utils.pm line 278.

Please, create new poo and put it on me. Thank you!

Actions #22

Updated by okurz about 4 years ago

Sorry if my intentions weren't clear. I just cross-checked the existing reports. You could simply delete the automatically carried-over comment from the referenced test scenario to prevent the false match. I did that now and will leave it to the next test reviewer to investigate in detail what the specific issue is.

Actions #23

Updated by pcervinka about 4 years ago

No problem. Looking further at that kdump failure in toolchain_zypper, it was wrongly labeled with this poo from the very beginning.

Actions #24

Updated by SLindoMansilla about 4 years ago

I created a new ticket for aarch64: #63772

Actions #25

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/3973817

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #27

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: toolchain_zypper
https://openqa.suse.de/tests/4024341

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed