action #36279

closed

[sle][functional][u][s390x][medium] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method

Added by zluo over 6 years ago. Updated about 6 years ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
New test
Target version:
SUSE QA (private) - Milestone 19
Start date:
2018-05-16
Due date:
2018-10-09
% Done:

0%

Estimated time:
5.00 h
Difficulty:
medium

Description

Observation

openQA test in scenario sle-12-SP4-Server-DVD-s390x-lvm-encrypt-separate-boot@zkvm fails in
reboot_gnome

found in autoinst log:

...
[2018-05-16T05:22:47.0172 CEST] [debug] Command executed: ! virsh dominfo openQA-SUT-2 | grep -w 'shut off', ret=0
[2018-05-16T05:22:48.0265 CEST] [debug] Connection to root@s390pb.suse.de established
[2018-05-16T05:22:48.0445 CEST] [debug] Command executed: ! virsh dominfo openQA-SUT-2 | grep -w 'shut off', ret=0
[2018-05-16T05:22:49.0565 CEST] [debug] # Test died: Machine didn't shut down! at /var/lib/openqa/cache/tests/sle/lib/utils.pm line 486.

--
On the openQA side we do not actually know what happened during shutdown on the zkvm host. To me this looks similar to the issue on pkvm, where reboot or shutdown either takes a long time or hangs forever.
We can at least try to allow more time for reboot/shutdown, or try to check the host status.
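
The "give it more time" idea could look roughly like the following. This is only a sketch: the function name `wait_for_shut_off` and the default timeout are assumptions for illustration, not what utils.pm or the svirt backend actually do. It repeats the `virsh dominfo` check from the log above until the domain reports 'shut off' or the timeout expires.

```shell
#!/bin/sh
# Poll libvirt until the domain reports "shut off" or the timeout expires.
# Usage: wait_for_shut_off DOMAIN [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# The interval must be at least 1, otherwise the loop never advances.
wait_for_shut_off() {
    domain=$1
    timeout=${2:-300}   # assumed default, not taken from the ticket
    interval=${3:-1}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # Same check as in the autoinst log, without the leading negation.
        if virsh dominfo "$domain" | grep -qw 'shut off'; then
            return 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    # On timeout, print the last known state to help with debugging.
    virsh domstate "$domain" >&2
    return 1
}
```

On the host this would be called as, for example, `wait_for_shut_off openQA-SUT-2 600 5`. A longer timeout only helps if the domain eventually reaches 'shut off'; it does not help when the shutdown is never triggered.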

Acceptance criteria

  • AC1: Above mentioned scenario on zkvm or suse-kvm shuts down in a stable way

Reproducible

Fails since (at least) Build 0232

Expected result

Reference: SLE 12 SP3 GM

Further details

Always latest result in this scenario: latest, latest zkvm (outdated)


Related issues 2 (0 open, 2 closed)

Related to openQA Tests (public) - action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" | Resolved | okurz | 2018-05-14 | 2018-07-31
Blocked by openQA Tests (public) - action #36268: [sle][functional][u][hard][s390x] test fails in consoletest_finish - screen does not show desktop session | Resolved | oorlov | 2018-05-16 | 2018-09-11

Actions #1

Updated by okurz over 6 years ago

  • Target version set to Milestone 17
Actions #2

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default
https://openqa.suse.de/tests/1706798

Actions #3

Updated by okurz over 6 years ago

  • Related to action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" added
Actions #4

Updated by okurz over 6 years ago

  • Description updated (diff)
  • Due date set to 2018-07-31
  • Status changed from New to Workable
  • Target version changed from Milestone 17 to Milestone 18
Actions #5

Updated by okurz over 6 years ago

  • Target version changed from Milestone 18 to Milestone 18
Actions #6

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: default
https://openqa.suse.de/tests/1795627

Actions #7

Updated by okurz over 6 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: lvm-full-encrypt
https://openqa.suse.de/tests/1825212

Actions #8

Updated by okurz over 6 years ago

recent failure in sle-12-SP4-Server-DVD-s390x-Build0286-lvm-encrypt-separate-boot@s390x-kvm-sle12 -> https://openqa.suse.de/tests/1825231#step/reboot_gnome/22

Actions #9

Updated by okurz over 6 years ago

  • Priority changed from Normal to High
Actions #10

Updated by okurz over 6 years ago

  • Description updated (diff)
Actions #11

Updated by okurz over 6 years ago

  • Description updated (diff)
Actions #12

Updated by okurz over 6 years ago

  • Description updated (diff)
Actions #13

Updated by mgriessmeier over 6 years ago

  • Subject changed from [sle][functional][u] test fails in reboot_gnome - shutdown problem to [sle][functional][u][s390x] test fails in reboot_gnome - shutdown problem
  • Estimated time set to 5.00 h
Actions #14

Updated by oorlov over 6 years ago

  • Assignee set to oorlov
Actions #15

Updated by oorlov over 6 years ago

  • Status changed from Workable to In Progress

I wanted to collect logs from the shutdown locally by setting the 'DEBUG_SHUTDOWN' parameter (the PR is not merged yet, so I cannot do that on production).

I've excluded the 'consoletest_finish' module (which blocks testing of the reboot_gnome module) from the local job, but the test then fails in the 'desktop_runner' module and all further modules are skipped.

So the desktop could not be reached. The test is blocked by https://progress.opensuse.org/issues/36268. Marking the ticket as blocked.

Also, according to mgriessmeier's comment, only 22 out of 100 runs passed this stage.

Actions #16

Updated by oorlov over 6 years ago

  • Blocked by action #36268: [sle][functional][u][hard][s390x] test fails in consoletest_finish - screen does not show desktop session added
Actions #17

Updated by oorlov over 6 years ago

  • Status changed from In Progress to Blocked
Actions #18

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-07-31 to 2018-08-14
Actions #19

Updated by okurz over 6 years ago

Also related is https://bugzilla.suse.com/show_bug.cgi?id=1085181 about "openQA test fails in reboot_gnome - stuck in evaluating the password"

Actions #20

Updated by oorlov over 6 years ago

It is not just related. It is the same issue.

@okurz, is it not the same case as you described here?

Why are we keeping both progress and bugzilla tickets in that case?

Actions #21

Updated by okurz over 6 years ago

  • Due date changed from 2018-08-14 to 2018-08-28

bulk move to next sprint as could not be discussed in SR

Actions #22

Updated by okurz over 6 years ago

  • Subject changed from [sle][functional][u][s390x] test fails in reboot_gnome - shutdown problem to [sle][functional][u][s390x] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method
  • Status changed from Blocked to Workable

yes, the reported issue is the same as the bug. I changed the subject line to note that we can help with investigation on the bug.

Actions #23

Updated by oorlov over 6 years ago

I've investigated the issue with the DEBUG_SHUTDOWN variable and a timeout of 6000.

The test got stuck after the 'Authenticate' button was pressed while entering the password on reboot.

So it seems that the reboot itself is never triggered; therefore /usr/lib/systemd/system-shutdown/debug.sh is not executed and there are no logs on the serial console.
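
For context, a hook in /usr/lib/systemd/system-shutdown/ might look roughly like this. It is a sketch only, not the actual debug.sh from the PR: the function name `dump_shutdown_log`, the marker text and the default output device are assumptions. What is certain is that systemd-shutdown runs each executable in that directory late in the shutdown sequence with a single argument ("halt", "poweroff", "reboot" or "kexec").

```shell
#!/bin/sh
# Sketch of a hook for /usr/lib/systemd/system-shutdown/.
# systemd-shutdown calls each executable there with one argument:
# "halt", "poweroff", "reboot" or "kexec".

# Write a marker line and the kernel log to the given destination.
# The default device is an assumption; adjust it to the SUT's serial console.
dump_shutdown_log() {
    action=${1:-unknown}
    dest=${2:-/dev/console}
    {
        echo "shutdown-debug: reached system-shutdown hook, action=$action"
        dmesg 2>/dev/null || echo "shutdown-debug: dmesg not available"
    } > "$dest"
}

# In the installed hook the last line would be: dump_shutdown_log "$1"
```

If not even the marker line shows up on the serial console, the hook was never run, which fits the observation that the reboot itself was not triggered.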

Actions #24

Updated by oorlov over 6 years ago

I've investigated the issue further and found out that the test fails because of the infrastructure/test design. It is not a product issue.

The following line in bootloader_zkvm.pm:

$svirt->change_domain_element(on_reboot => 'destroy');

was removed from:

if (get_var('ZDUP') or get_var('ONLINE_MIGRATION') or get_var('BOOT_HDD_IMAGE') or get_var('AUTOYAST')) {
    $svirt->change_domain_element(on_reboot => undef);
    $svirt->change_domain_element(on_reboot => 'destroy');    # the removed line
}

As a result, the VM does not change its state from 'running' to 'shut off' while rebooting, but the test in os-autoinst/backend/svirt.pm checks for the 'shut off' state:

$rsp = $self->run_cmd("! virsh dominfo $vmname | grep -w 'shut off'");

I've changed that, but I've got: error: internal error: qemu unexpectedly closed the monitor

[2018-08-15T16:31:25.0836 CEST] [debug] Command executed: virsh  start openQA-SUT-7
[2018-08-15T16:31:26.0113 CEST] [debug] GET "/UfM8yXmYHaqLB1YY/isotovideo/status" (6ec93a98)
[2018-08-15T16:31:26.0114 CEST] [debug] Routing to a callback
[2018-08-15T16:31:26.0114 CEST] [debug] 200 OK (0.000897s, 1114.827/s)
[2018-08-15T16:31:26.0200 CEST] [debug] Command's stderr:
error: Failed to start domain openQA-SUT-7
error: internal error: qemu unexpectedly closed the monitor

  • $ sudo virsh -c qemu:///system version --daemon
Compiled against library: libvirt 3.3.0
Using library: libvirt 3.3.0
Using API: QEMU 3.3.0
Running hypervisor: QEMU 2.9.1
Running against daemon: 3.3.0
  • vim /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"

I guess that the QEMU error might be caused by the outdated version of QEMU itself (e.g. see https://bugzilla.redhat.com/show_bug.cgi?id=1571546)
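
To see which on_reboot action a domain currently has, the domain XML can be inspected on the host. A minimal sketch, assuming the helper names `on_reboot_action` and `explain_on_reboot` (both hypothetical); only the `destroy` and `restart` actions are covered here:

```shell
# Print the <on_reboot> action from a libvirt domain XML read on stdin.
on_reboot_action() {
    sed -n 's:.*<on_reboot>\(.*\)</on_reboot>.*:\1:p'
}

# Explain what a check for 'shut off' can expect for that action.
explain_on_reboot() {
    case "$1" in
        destroy) echo "guest reboot stops the domain; 'shut off' becomes visible" ;;
        restart) echo "guest reboot restarts the domain; it stays 'running'" ;;
        *)       echo "unhandled or missing on_reboot action: '$1'" ;;
    esac
}

# Usage on the host:
#   explain_on_reboot "$(virsh dumpxml openQA-SUT-7 | on_reboot_action)"
```

With `destroy`, a reboot issued inside the guest ends with the domain stopped, which is the state the grep for 'shut off' in svirt.pm waits for.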

Actions #25

Updated by oorlov over 6 years ago

  • Status changed from Workable to In Progress
Actions #26

Updated by SLindoMansilla over 6 years ago

  • Description updated (diff)
Actions #27

Updated by SLindoMansilla over 6 years ago

  • Subject changed from [sle][functional][u][s390x] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method to [sle][functional][u][s390x][medium] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method
  • Difficulty set to medium
Actions #28

Updated by oorlov over 6 years ago

  • Status changed from In Progress to Feedback

An appropriate PR is opened: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5597

I've closed bsc#1085181 as RESOLVED INVALID.

Actions #29

Updated by oorlov over 6 years ago

Still cannot resolve the ticket: in the last build the reboot_gnome module is not reached because modules before it fail. So, waiting for a passing one.

Actions #30

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-08-28 to 2018-09-11
Actions #31

Updated by mgriessmeier over 6 years ago

  • Due date changed from 2018-09-11 to 2018-09-25

let's discuss the state offline

Actions #32

Updated by oorlov over 6 years ago

I've checked the last builds: reboot_gnome still fails, but for a different reason, on wait_serial. My first assumption is that the current timeout is not long enough.

Actions #33

Updated by oorlov over 6 years ago

Currently, I'm not able to test the issue on my local environment due to https://progress.opensuse.org/issues/39503

Actions #34

Updated by oorlov over 6 years ago

  • Blocked by action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988 added
Actions #35

Updated by oorlov over 6 years ago

  • Status changed from Feedback to Blocked
Actions #36

Updated by coolo over 6 years ago

  • Blocked by deleted (action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988)
Actions #37

Updated by oorlov about 6 years ago

  • Status changed from Blocked to In Progress

As the blocking ticket is resolved and the fix has appeared in the 'devel-openQA' repo, I'm continuing to work on this.

Actions #38

Updated by oorlov about 6 years ago

  • Status changed from In Progress to Blocked

Unfortunately I still have the issue with encoding.

Actions #39

Updated by okurz about 6 years ago

  • Status changed from Blocked to Workable

Haven't we mentioned that today in the daily meeting? Are you sure you have the package with the fix installed? E.g. check rpm -q --changelog os-autoinst. OTOH coolo has removed the "blocker" ticket and you should not reference this ticket as being blocked. I assume every openQA test ticket needing local verification would be blocked – but only for you or whoever else has these impediments. coolo gave you the right hint in #39503#note-23
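
The changelog check suggested above could be scripted roughly like this. A minimal sketch: the function name `changelog_mentions` and the search pattern in the example are hypothetical, since the actual changelog wording of the fix is not given in this ticket.

```shell
# Return 0 if the installed package's changelog mentions PATTERN (case-insensitive).
changelog_mentions() {
    pkg=$1
    pattern=$2
    rpm -q --changelog "$pkg" | grep -qi -- "$pattern"
}

# Example (hypothetical pattern):
#   changelog_mentions os-autoinst 'VNC' && echo "changelog mentions VNC"
```

A match only shows that the changelog contains the pattern; whether that entry is the fix in question still has to be checked by reading it.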

Actions #40

Updated by oorlov about 6 years ago

okurz wrote:

Haven't we mentioned that today in the daily meeting?

This is why I wanted to proceed further with the issue. I've updated all the packages, including openQA and os-autoinst, from the devel-openQA repo (which I mentioned in 39503#note-22), but os-autoinst does not contain the fix.

coolo has removed the "blocker" ticket

Why was it removed when the fix is not in the repo yet? My ticket is still blocked by the encoding issue. I will try the workaround mentioned by coolo.

Actions #41

Updated by SLindoMansilla about 6 years ago

  • Due date changed from 2018-09-25 to 2018-10-09
  • Status changed from Workable to In Progress

Moving to sprint 27.

Actions #42

Updated by okurz about 6 years ago

oorlov wrote:

[…] the fix is not in the repo yet?

It is. See #39503#note-23 mentioning that you used an old version. #39503#note-21 provides more details. Are you sure you use the fix from devel:openQA? If you use the package from Tumbleweed oss repos you need to wait for the openQA-in-openQA tests to be fixed or provide help yourself: #41465

Actions #43

Updated by okurz about 6 years ago

  • Target version changed from Milestone 18 to Milestone 19
Actions #44

Updated by okurz about 6 years ago

  • Description updated (diff)
  • Category changed from Bugs in existing tests to New test

As we found out, that scenario never really worked properly, so it is actually a "New test" and we confused ourselves the whole time!!!1 As discussed with the test suite maintainer riafarov, we set INSTALLONLY=1 on the test suite.

@oorlov please manually schedule verification runs with that setting on osd, as we expect that no further code changes are needed.

Actions #45

Updated by oorlov about 6 years ago

Executed: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=_poo36279_install_only&groupid=132
5 of 5 passed.

@okurz, should we close that ticket? Or we'll continue working on it as on 'New Test'?

Actions #46

Updated by okurz about 6 years ago

oorlov wrote:

Executed: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=_poo36279_install_only&groupid=132
5 of 5 passed.

Well, please also check other scenarios for the same test suite, as we changed a setting in the test suite itself, e.g. SLE15 and x86_64 as well. By now I guess new builds including these scenarios have already been triggered, at least for SLE15.

@okurz, should we close that ticket? Or we'll continue working on it as on 'New Test'?

After you have checked the above, set it to "Resolved". It is acceptable to have reduced test coverage in this case.

Actions #47

Updated by okurz about 6 years ago

  • Status changed from In Progress to Resolved

15-SP1 tests currently fail in earlier step labeled with another ticket, 12-SP4 is fine in all four archs: https://openqa.suse.de/tests/overview?distri=sle&flavor=Server-DVD&version=12-SP4&test=lvm-encrypt-separate-boot e.g. x86_64: https://openqa.suse.de/tests/2124943
