action #36279
Status: closed
[sle][functional][u][s390x][medium] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method
Added by zluo over 6 years ago. Updated about 6 years ago.
Description
Observation
openQA test in scenario sle-12-SP4-Server-DVD-s390x-lvm-encrypt-separate-boot@zkvm fails in reboot_gnome.
Found in the autoinst log:
...
[2018-05-16T05:22:47.0172 CEST] [debug] Command executed: ! virsh dominfo openQA-SUT-2 | grep -w 'shut off', ret=0
[2018-05-16T05:22:48.0265 CEST] [debug] Connection to root@s390pb.suse.de established
[2018-05-16T05:22:48.0445 CEST] [debug] Command executed: ! virsh dominfo openQA-SUT-2 | grep -w 'shut off', ret=0
[2018-05-16T05:22:49.0565 CEST] [debug] # Test died: Machine didn't shut down! at /var/lib/openqa/cache/tests/sle/lib/utils.pm line 486.
--
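When reading the log above, note that the check is a negated pipeline. grep exits with 0 when it finds a match and 1 when it does not, and the leading `!` inverts the pipeline's exit status (that of its last command, grep), so `ret=0` means 'shut off' was not found, i.e. the domain had not shut down yet. A minimal Python sketch of that exit-status logic, assuming `virsh dominfo` prints the state on a line such as `State: shut off`; the function name is hypothetical:

```python
import re

# Whole-word match approximating `grep -w 'shut off'`: the match must not be
# preceded or followed by a word character (letter, digit or underscore).
SHUT_OFF = re.compile(r"(?<!\w)shut off(?!\w)")


def negated_shut_off_status(dominfo_output: str) -> int:
    """Exit status of `! virsh dominfo <vm> | grep -w 'shut off'` for the given output."""
    grep_status = 0 if SHUT_OFF.search(dominfo_output) else 1  # grep: 0 = match, 1 = no match
    return 1 if grep_status == 0 else 0  # `!` inverts the pipeline status


running = "Name:           openQA-SUT-2\nState:          running\n"
stopped = "Name:           openQA-SUT-2\nState:          shut off\n"
print(negated_shut_off_status(running))  # 0: not shut off yet, as in the log
print(negated_shut_off_status(stopped))  # 1: the domain is shut off
```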
On the openQA side we don't actually know what exactly happened during the shutdown on the zkvm host. To me this looks similar to the reboot/shutdown issue on pkvm: it takes a long time or it hangs forever.
At the very least, we can try to give reboot/shutdown more time, or try to check the host status.
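Giving the shutdown more time amounts to polling the domain state against a longer deadline. A minimal sketch, assuming a callable `get_state()` that returns the state string reported by virsh (for example the output of `virsh domstate <vm>`); `wait_for_shut_off` and `FakeClock` are hypothetical names and not part of the os-autoinst API:

```python
import time
from typing import Callable


def wait_for_shut_off(
    get_state: Callable[[], str],
    timeout: float,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_state() until it returns 'shut off' or `timeout` seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if get_state() == "shut off":
            return True
        if clock() >= deadline:
            # Give up; the caller decides whether to fail ("Machine didn't shut down!").
            return False
        sleep(interval)


class FakeClock:
    """Deterministic clock for trying the loop out: sleep() advances time instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
states = iter(["running", "running", "in shutdown", "shut off"])
print(wait_for_shut_off(lambda: next(states), timeout=10, clock=fake.time, sleep=fake.sleep))  # True
```

Raising `timeout` only helps if the domain does eventually reach 'shut off'; the clock and sleep functions are injectable so the loop can be exercised without a real VM.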
Acceptance criteria
- AC1: The above-mentioned scenario on zkvm or suse-kvm shuts down in a stable way
Reproducible
Fails since (at least) Build 0232
Expected result
Reference: SLE 12 SP3 GM
Further details
Always latest result in this scenario: latest, latest zkvm (outdated)
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default
https://openqa.suse.de/tests/1706798
Updated by okurz over 6 years ago
- Related to action #36150: [opensuse][functional][u][sporadic] test fails in shutdown on unsafe code looking for "sddm_shutdown_option_btn" added
Updated by okurz over 6 years ago
- Description updated (diff)
- Due date set to 2018-07-31
- Status changed from New to Workable
- Target version changed from Milestone 17 to Milestone 18
Updated by okurz over 6 years ago
- Target version changed from Milestone 18 to Milestone 18
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: default
https://openqa.suse.de/tests/1795627
Updated by okurz over 6 years ago
This is an autogenerated message for openQA integration by the openqa_review script:
This bug is still referenced in a failing openQA test: lvm-full-encrypt
https://openqa.suse.de/tests/1825212
Updated by okurz over 6 years ago
recent failure in sle-12-SP4-Server-DVD-s390x-Build0286-lvm-encrypt-separate-boot@s390x-kvm-sle12 -> https://openqa.suse.de/tests/1825231#step/reboot_gnome/22
Updated by mgriessmeier over 6 years ago
- Subject changed from [sle][functional][u] test fails in reboot_gnome - shutdown problem to [sle][functional][u][s390x] test fails in reboot_gnome - shutdown problem
- Estimated time set to 5.00 h
Updated by oorlov over 6 years ago
- Status changed from Workable to In Progress
I wanted to collect shutdown logs locally by setting the 'DEBUG_SHUTDOWN' parameter (the PR is not merged yet, so I cannot do that on production).
I excluded the 'consoletest_finish' module (which blocks testing of the reboot_gnome module) from the local job, but the test then fails in the 'desktop_runner' module and all subsequent modules are skipped.
So the desktop cannot be reached. The test is blocked by https://progress.opensuse.org/issues/36268, so I'm marking the ticket as blocked.
Also, according to mgriessmeier's comment, only 22 of 100 runs passed this stage.
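With only 22 of 100 runs passing, a few green verification runs could also happen by chance. A back-of-the-envelope sketch, assuming runs are independent and each passes with the observed rate p = 0.22 (real failures may well be correlated): n consecutive passes by luck occur with probability p^n, so pushing that chance below a threshold alpha needs n = ceil(ln alpha / ln p) runs. The function names are hypothetical:

```python
import math


def chance_all_pass(p: float, n: int) -> float:
    """Probability that n independent runs all pass when each passes with probability p."""
    return p ** n


def runs_needed(p: float, alpha: float) -> int:
    """Smallest n with p**n <= alpha, i.e. all passing by luck is at most alpha likely."""
    return math.ceil(math.log(alpha) / math.log(p))


p = 22 / 100                           # observed pass rate of this stage
print(f"{chance_all_pass(p, 5):.6f}")  # 0.000515
print(runs_needed(p, 0.01))            # 4
```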
Updated by oorlov over 6 years ago
- Blocked by action #36268: [sle][functional][u][hard][s390x] test fails in consoletest_finish - screen does not show desktop session added
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-07-31 to 2018-08-14
Updated by okurz over 6 years ago
Also related is https://bugzilla.suse.com/show_bug.cgi?id=1085181 about "openQA test fails in reboot_gnome - stuck in evaluating the password"
Updated by okurz over 6 years ago
- Due date changed from 2018-08-14 to 2018-08-28
Bulk move to the next sprint, as this could not be discussed in SR
Updated by okurz over 6 years ago
- Subject changed from [sle][functional][u][s390x] test fails in reboot_gnome - shutdown problem to [sle][functional][u][s390x] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method
- Status changed from Blocked to Workable
Yes, the reported issue is the same as the bug. I changed the subject line to note that we can help with the investigation of the bug.
Updated by oorlov over 6 years ago
I've investigated the issue with the DEBUG_SHUTDOWN variable and a 6000 timeout.
The test gets stuck after pressing the 'Authenticate' button while entering the password on reboot.
So it seems the reboot itself is never called, hence /usr/lib/systemd/system-shutdown/debug.sh is not executed and there are no logs on the serial console.
Updated by oorlov over 6 years ago
I investigated the issue further and found out that the test fails due to the infrastructure/test design. It is not a product issue.
The following line in bootloader_zkvm.pm:
    $svirt->change_domain_element(on_reboot => 'destroy');
was removed from this block (lines 71-73):
    if (get_var('ZDUP') or get_var('ONLINE_MIGRATION') or get_var('BOOT_HDD_IMAGE') or get_var('AUTOYAST')) {
        $svirt->change_domain_element(on_reboot => undef);
        $svirt->change_domain_element(on_reboot => 'destroy');    # the removed line
    }
As a result, the VM does not change its state from 'running' to 'shut off' while rebooting, but the test checks for the 'shut off' state in os-autoinst/backend/svirt.pm (line 143):
    $rsp = $self->run_cmd("! virsh dominfo $vmname | grep -w 'shut off'");
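In libvirt's domain XML, the `<on_reboot>` element selects what happens when the guest requests a reboot: with `destroy` the domain is terminated, so virsh reports 'shut off' as the check above expects, while with `restart` (libvirt's default) it comes straight back up as 'running'. A minimal sketch of the equivalent XML edit using Python's standard library; it only illustrates the effect and is not how `change_domain_element` is implemented, and treating `None` (Perl `undef`) as "remove the element" is an assumption. `set_on_reboot` is a hypothetical name:

```python
import xml.etree.ElementTree as ET
from typing import Optional


def set_on_reboot(domain_xml: str, action: Optional[str]) -> str:
    """Return domain_xml with <on_reboot> set to action, or removed if action is None."""
    root = ET.fromstring(domain_xml)
    elem = root.find("on_reboot")
    if action is None:
        if elem is not None:
            root.remove(elem)  # fall back to libvirt's default behaviour
    else:
        if elem is None:
            elem = ET.SubElement(root, "on_reboot")
        elem.text = action  # e.g. 'destroy' so a guest reboot ends in 'shut off'
    return ET.tostring(root, encoding="unicode")


xml = "<domain type='kvm'><name>openQA-SUT-7</name><on_reboot>restart</on_reboot></domain>"
print(set_on_reboot(xml, "destroy"))
```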
I changed that, but then got: error: internal error: qemu unexpectedly closed the monitor
[2018-08-15T16:31:25.0836 CEST] [debug] Command executed: virsh start openQA-SUT-7
[2018-08-15T16:31:26.0113 CEST] [debug] GET "/UfM8yXmYHaqLB1YY/isotovideo/status" (6ec93a98)
[2018-08-15T16:31:26.0114 CEST] [debug] Routing to a callback
[2018-08-15T16:31:26.0114 CEST] [debug] 200 OK (0.000897s, 1114.827/s)
[2018-08-15T16:31:26.0200 CEST] [debug] Command's stderr:
error: Failed to start domain openQA-SUT-7
error: internal error: qemu unexpectedly closed the monitor
- $ sudo virsh -c qemu:///system version --daemon
Compiled against library: libvirt 3.3.0
Using library: libvirt 3.3.0
Using API: QEMU 3.3.0
Running hypervisor: QEMU 2.9.1
Running against daemon: 3.3.0
- vim /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
I guess the QEMU error might be caused by an outdated version of QEMU itself (e.g. see https://bugzilla.redhat.com/show_bug.cgi?id=1571546)
Updated by SLindoMansilla over 6 years ago
- Subject changed from [sle][functional][u][s390x] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method to [sle][functional][u][s390x][medium] test fails in reboot_gnome on bsc#1085181 - help with investigation using new shutdown debug method
- Difficulty set to medium
Updated by oorlov over 6 years ago
- Status changed from In Progress to Feedback
The corresponding PR has been opened: https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/5597
I've closed bsc#1085181 as RESOLVED INVALID.
Updated by oorlov over 6 years ago
I still cannot resolve the ticket: in the latest build, the reboot_gnome module is not reached because the modules before it fail. So I'm waiting for a build in which it passes.
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-08-28 to 2018-09-11
Updated by mgriessmeier over 6 years ago
- Due date changed from 2018-09-11 to 2018-09-25
let's discuss the state offline
Updated by oorlov over 6 years ago
I've checked the latest builds: reboot_gnome fails, but for a different reason, in wait_serial. My first assumption is that the current timeout is too short.
Updated by oorlov over 6 years ago
Currently, I'm not able to test the issue in my local environment due to https://progress.opensuse.org/issues/39503
Updated by oorlov over 6 years ago
- Blocked by action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988 added
Updated by coolo over 6 years ago
- Blocked by deleted (action #39503: svirt tests fail with unsupported update encoding -1733194013 at /.../consoles/VNC.pm line 988)
Updated by oorlov about 6 years ago
- Status changed from Blocked to In Progress
As the blocking ticket is resolved and the fix has appeared in the 'devel-openQA' repo, I'm continuing to work on this.
Updated by oorlov about 6 years ago
- Status changed from In Progress to Blocked
Unfortunately I still have the issue with encoding.
Updated by okurz about 6 years ago
- Status changed from Blocked to Workable
Haven't we mentioned that today in the daily meeting? Are you sure you have the package with the fix installed? E.g. check rpm -q --changelog os-autoinst. OTOH, coolo has removed the "blocker" ticket, and you should not mark this ticket as blocked. Otherwise I assume every openQA test ticket needing local verification would be blocked, but only for you or whoever else has these impediments. coolo gave you the right hint in #39503#note-23
Updated by oorlov about 6 years ago
okurz wrote:
Haven't we mentioned that today in the daily meeting?
This is why I wanted to proceed further with the issue. I've updated all packages, including openQA and os-autoinst, from the devel-openQA repo (that I mentioned in 39503#note-22), but os-autoinst does not contain the fix.
coolo has removed the "blocker" ticket
Why was it removed, though the fix is not in the repo yet? My ticket is still blocked by the encoding issue. I will try the workaround mentioned by coolo.
Updated by SLindoMansilla about 6 years ago
- Due date changed from 2018-09-25 to 2018-10-09
- Status changed from Workable to In Progress
Moving to sprint 27.
Updated by okurz about 6 years ago
oorlov wrote:
[…] the fix is not in the repo yet?
It is. See #39503#note-23, which mentions that you used an old version; #39503#note-21 provides more details. Are you sure you use the fix from devel:openQA? If you use the package from the Tumbleweed oss repo, you need to wait for the openQA-in-openQA tests to be fixed or help out yourself: #41465
Updated by okurz about 6 years ago
- Target version changed from Milestone 18 to Milestone 19
Updated by okurz about 6 years ago
- Description updated (diff)
- Category changed from Bugs in existing tests to New test
As we found out, that scenario never really worked properly, so it is actually a "New test" and we confused ourselves the whole time! As discussed with the test suite maintainer riafarov, we set INSTALLONLY=1 on the test suite.
@oorlov please manually schedule verification runs with that setting on osd, as we expect that no more code changes are needed.
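Setting INSTALLONLY=1 sidesteps the problem rather than fixing it: presumably the post-installation modules, reboot_gnome among them, are then not scheduled at all, which reduces coverage. A heavily simplified sketch of that kind of conditional scheduling; the module lists and the `schedule` function are hypothetical and are not the actual os-autoinst-distri-opensuse scheduler:

```python
from typing import Dict, List

# Hypothetical, heavily simplified module lists for illustration only.
INSTALL_MODULES = ["bootloader_zkvm", "installation", "first_boot"]
POST_INSTALL_MODULES = ["consoletest_finish", "desktop_runner", "reboot_gnome"]


def schedule(settings: Dict[str, str]) -> List[str]:
    """Return the module list for a job; INSTALLONLY=1 stops after installation."""
    modules = list(INSTALL_MODULES)
    if settings.get("INSTALLONLY") != "1":
        modules += POST_INSTALL_MODULES
    return modules


print("reboot_gnome" in schedule({}))                    # True
print("reboot_gnome" in schedule({"INSTALLONLY": "1"}))  # False
```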
Updated by oorlov about 6 years ago
Executed: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=_poo36279_install_only&groupid=132
5 of 5 passed.
@okurz, should we close this ticket, or will we continue working on it as a 'New Test'?
Updated by okurz about 6 years ago
oorlov wrote:
Executed: https://openqa.suse.de/tests/overview?distri=sle&version=12-SP4&build=_poo36279_install_only&groupid=132
5 of 5 passed.
Well, please also check other scenarios of the same test suite, since we changed a setting on the test suite itself, e.g. SLE15 and x86_64. By now I guess new builds including these scenarios have already been triggered, at least for SLE15.
@okurz, should we close this ticket, or will we continue working on it as a 'New Test'?
After you have checked the above, set it to "Resolved". It is acceptable to have reduced test coverage in this case.
Updated by okurz about 6 years ago
- Status changed from In Progress to Resolved
15-SP1 tests currently fail in an earlier step that is labeled with another ticket; 12-SP4 is fine on all four archs: https://openqa.suse.de/tests/overview?distri=sle&flavor=Server-DVD&version=12-SP4&test=lvm-encrypt-separate-boot e.g. x86_64: https://openqa.suse.de/tests/2124943