action #33199

closed

[sle][functional][s390x][zkvm][u][hard] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?

Added by okurz almost 7 years ago. Updated over 6 years ago.

Status: Resolved
Priority: High
Assignee: mgriessmeier
Category: Bugs in existing tests
Start date: 2018-03-13
Due date: 2018-04-10
% Done: 0%
Difficulty: hard

Description

Observation

openQA test in scenario sle-12-SP4-Server-DVD-s390x-toolchain_zypper@zkvm fails in kdump_and_crash

Reproducible

Fails since (at least) Build 0234 (current job)

Expected result

Last good: 0233 (or more recent)

Further details

Always latest result in this scenario: latest


Related issues: 4 (0 open, 4 closed)

Related to openQA Tests (public) - action #33616: [sle][functional][sle15][u][hard][xen] test fails in kdump_and_crash - because reboot doesn't work - Rejected, zluo, 2018-03-21

Related to openQA Tests (public) - action #33376: [sle][functional][ppc64le][easy][u] test fails in kdump_and_crash - kdumptool gets killed by OOM - Resolved, zluo, 2018-03-16

Related to openQA Project (public) - action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh - Resolved, coolo, 2018-03-29

Related to openQA Tests (public) - action #33202: [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it) - Resolved, mgriessmeier, 2018-03-13 to 2018-08-14
Actions #1

Updated by nicksinger almost 7 years ago

  • Subject changed from [sle][functional][12sp4][s390x][zkvm]test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? to [sle][functional][s390x][zkvm] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?

Also happens in SLES15 now: https://openqa.suse.de/tests/1549296#step/kdump_and_crash/34
From what I saw, the test is able to start yast2 kdump but then fails to type after YaST closes again. The test failure is then triggered by power_action, but my suspicion is that the actual problem happens one step earlier, in do_kdump.

Actions #2

Updated by mgriessmeier almost 7 years ago

Actions #3

Updated by okurz almost 7 years ago

  • Subject changed from [sle][functional][s390x][zkvm] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? to [sle][functional][s390x][zkvm][u][fast] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?
  • Due date changed from 2018-04-10 to 2018-03-27
  • Priority changed from Normal to High

OK, let's handle it in this sprint, because we assume it is most likely a recent test regression introduced by @mgriessmeier.

Actions #4

Updated by mgriessmeier almost 7 years ago

  • Subject changed from [sle][functional][s390x][zkvm][u][fast] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? to [sle][functional][s390x][zkvm][fast] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?

As it could have been introduced recently, we should already have a look at it in this sprint.

Actions #5

Updated by okurz almost 7 years ago

  • Related to action #33616: [sle][functional][sle15][u][hard][xen] test fails in kdump_and_crash - because reboot doesn't work added
Actions #6

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-03-27 to 2018-04-10
Actions #7

Updated by riafarov over 6 years ago

  • Related to action #33376: [sle][functional][ppc64le][easy][u] test fails in kdump_and_crash - kdumptool gets killed by OOM added
Actions #8

Updated by riafarov over 6 years ago

  • Subject changed from [sle][functional][s390x][zkvm][fast] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? to [sle][functional][s390x][zkvm][fast][u] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?
  • Status changed from New to Workable
Actions #9

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][s390x][zkvm][fast][u] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed? to [sle][functional][s390x][zkvm][u][hard] test fails in kdump_and_crash - system does not shutdown or reboot? what is happening? better output needed?
Actions #10

Updated by cwh over 6 years ago

  • Difficulty set to hard
Actions #11

Updated by okurz over 6 years ago

  • Related to action #34003: [tools] Better logging and error handling in case of remote console connections in consoles or backends, e.g. ssh added
Actions #12

Updated by okurz over 6 years ago

  • Related to action #33202: [sle][functional][s390x][zkvm][u][hard] test fails in boot_to_desktop - still insufficient error reporting, black screen with mouse cursor - we all hate it (was: I hate it) added
Actions #13

Updated by mgriessmeier over 6 years ago

  • Status changed from Workable to In Progress
Actions #14

Updated by mgriessmeier over 6 years ago

AFAICS, the problem was introduced by my change for a more robust reconnect -> PR

I've added the execution of assert_shutdown_and_restore_system to S390-KVM:

-    if (check_var('VIRSH_VMM_FAMILY', 'xen')) {
+    if (check_var('VIRSH_VMM_FAMILY', 'xen') || get_var('S390_ZKVM')) {
         assert_shutdown_and_restore_system($action, $shutdown_timeout);
     }

Here I've put in the logic of the former redefine_svirt_domain, which modifies the XML file:

    if (check_var('ARCH', 's390x') or get_var('NETBOOT')) {
        $svirt->change_domain_element(os => initrd  => undef);
        $svirt->change_domain_element(os => kernel  => undef);
        $svirt->change_domain_element(os => cmdline => undef);
        $svirt->change_domain_element(on_reboot => undef);
        $svirt->define_and_start;
    }

On the initial definition in the XML file, the on_reboot flag is set to destroy, because we need to modify the XML file again; after that, we set it to undef.

Since we are booting an existing image in the toolchain test suite, this parameter is set to reboot by default.

The assert_shutdown function, which is called afterwards, greps for the "shut off" state of the machine. This obviously fails, because the machine simply rebooted normally.

So my guess is that this issue will occur in all tests that reboot again during the test suite execution. As we don't have many of those, and the ones we have failed earlier, this wasn't spotted there (yet).

I'll try to fix this issue by adding another condition to that call, or find a better way to handle those reboots.
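The failure mode described above can be modelled in a few lines. This is a minimal, self-contained Perl sketch, not openQA code: the helpers `state_after_guest_reboot` and `shutdown_detected` are hypothetical, and it assumes libvirt applies its default `restart` action when `<on_reboot>` is not set. It only shows why a check that waits for "shut off" passes with `on_reboot=destroy` but fails when the domain simply reboots.

```perl
use strict;
use warnings;

# Assumed mapping: libvirt domain state after the guest requests a reboot,
# keyed by the <on_reboot> action. Only the two cases discussed are modelled.
my %STATE_AFTER_REBOOT = (
    destroy => 'shut off',    # domain is stopped completely
    restart => 'running',     # domain comes back up with the same config
);

# Hypothetical helper (not openQA API).
sub state_after_guest_reboot {
    my ($on_reboot) = @_;
    # Assumption: an unset <on_reboot> means libvirt's default, 'restart'.
    $on_reboot //= 'restart';
    die "action '$on_reboot' is not modelled in this sketch\n"
      unless exists $STATE_AFTER_REBOOT{$on_reboot};
    return $STATE_AFTER_REBOOT{$on_reboot};
}

# Hypothetical stand-in for a shutdown check: it passes only if the
# domain reports "shut off" after the guest went down.
sub shutdown_detected {
    my ($on_reboot) = @_;
    return state_after_guest_reboot($on_reboot) eq 'shut off';
}

# Freshly defined domain with on_reboot=destroy: the check passes.
printf "on_reboot=destroy -> %s\n", shutdown_detected('destroy') ? 'ok' : 'fails';
# Existing image with the default action: the domain keeps running.
printf "on_reboot=default -> %s\n", shutdown_detected(undef) ? 'ok' : 'fails';
```

In this model, a fix along the lines suggested above means running the shutdown assertion only when the domain was actually configured to stop on reboot.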

Actions #15

Updated by mgriessmeier over 6 years ago

  • Status changed from In Progress to Feedback
  • Assignee set to mgriessmeier

This PR fixes at least this particular issue.

Actions #16

Updated by mgriessmeier over 6 years ago

  • Status changed from Feedback to Resolved