action #25638

closed

[sles][functional][s390x] test fails in shutdown: VNC stall detected, needs to be investigated

Added by mgriessmeier over 6 years ago. Updated over 6 years ago.

Status:
Resolved
Priority:
Urgent
Assignee:
Category:
Bugs in existing tests
Start date:
2017-09-28
Due date:
2017-10-25
% Done:

0%

Estimated time:
Difficulty:

Description

Observation

openQA test in scenario sle-15-Leanos-DVD-s390x-create_hdd_gnome_s390x@s390x-kvm fails in shutdown (originally: shutdown)

Shows symptoms similar to #20022, at least with regard to VNC stalls.

Reproducible

Fails since (at least) Build 278.1

Steps to reproduce

  • clone test "create_hdd_textmode@s390x-kvm"

Expected result

We never executed shutdown successfully in any s390x tests but we try it now.

Further details

Always latest result in this scenario: latest (old), now latest


Related issues 6 (0 open, 6 closed)

Related to openQA Tests - action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died | Resolved | mgriessmeier | 2017-06-23 | 2017-10-25
Related to openQA Tests - action #18936: [tools][sles][functional] Enable 3 stress acceptance on s390x | Resolved | riafarov | 2017-05-04 | 2017-11-08
Related to openQA Tests - action #13216: [sles][functional][s390x] Run extratest on s390x | Resolved | riafarov | 2017-03-02
Related to openQA Project - action #26886: [tools][s390x-kvm] investigate and improve 'assert_shutdown' function in testapi | Resolved | okurz | 2017-10-19
Blocks openQA Tests - action #23406: [sle][functional]Use single test suite for create_hdd_gnome on all architectures (and downstream jobs) | Resolved | riafarov | 2017-08-16 | 2017-10-25
Copied to openQA Tests - action #26914: [sle][functional][s390x][s390x-kvm] s390x-kvm does never exit from "assert_shutdown" but zkvm works -> investigate how the machines differ, maybe problem on s390p8? | Resolved | okurz | 2017-09-28

Actions #1

Updated by mgriessmeier over 6 years ago

  • Related to action #20022: [sle][functional][zkvm][s390] incomplete test due to socket does not exist. Probably your backend instance could not start or died added
Actions #2

Updated by mgriessmeier over 6 years ago

  • Blocks action #23406: [sle][functional]Use single test suite for create_hdd_gnome on all architectures (and downstream jobs) added
Actions #3

Updated by mgriessmeier over 6 years ago

  • Due date set to 2017-10-11
Actions #4

Updated by riafarov over 6 years ago

VNC stalls in other test suites too:
https://openqa.suse.de/tests/1193329
https://openqa.suse.de/tests/1193329

EDIT (okurz): These two should be ignored for the case of this ticket, they do not fail in shutdown

Actions #5

Updated by mgriessmeier over 6 years ago

I have not worked on this one in particular, but https://github.com/os-autoinst/os-autoinst/pull/862 might also help here if anyone wants to investigate further.

Actions #6

Updated by riafarov over 6 years ago

PR addressing the review comment so that it can be merged: https://github.com/os-autoinst/os-autoinst/pull/864

Actions #7

Updated by okurz over 6 years ago

  • Description updated (diff)
Actions #8

Updated by okurz over 6 years ago

Did not complete in sprint 1. Main reason: a spontaneous packaging training that we were not aware of beforehand. We have the PR, which should improve user feedback a lot, and finishing this is definitely doable in sprint 2.

Actions #9

Updated by okurz over 6 years ago

  • Due date changed from 2017-10-11 to 2017-10-25
Actions #10

Updated by okurz over 6 years ago

  • Target version set to Milestone 11
Actions #11

Updated by okurz over 6 years ago

  • Assignee deleted (mgriessmeier)

mgriessmeier is on vacation, unassigning for now.

Actions #12

Updated by zluo over 6 years ago

  • Assignee set to zluo

will try to fix this issue for now.

Actions #13

Updated by zluo over 6 years ago

Shutdown reached its target, but the job stops there and does not continue with further tests:

http://e13.suse.de/tests/4411

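However, I fixed an issue in shutdown.pm for textmode:

# s390x on SLE15 does not have an X11/VNC server, so power off in textmode there
if (is_sle && sle_version_at_least('15') && check_var('ARCH', 's390x')) {
    power_action('poweroff', textmode => 1);
}
else {
    # everywhere else use the default (graphical) poweroff; without this
    # "else" the s390x branch would call power_action a second time
    power_action('poweroff');
}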

Actions #14

Updated by riafarov over 6 years ago

  • Related to action #18936: [tools][sles][functional] Enable 3 stress acceptance on s390x added
Actions #15

Updated by riafarov over 6 years ago

  • Related to action #13216: [sles][functional][s390x] Run extratest on s390x added
Actions #16

Updated by zluo over 6 years ago

The problem is that after shutdown the worker still keeps the X11 session alive. We need to find a way to terminate it.
I will discuss this next week with @okurz and others.

Actions #17

Updated by zluo over 6 years ago

Blocked at the moment: I have no idea how to handle the running X11 session, and s390x-kvm is not available right now.

Actions #18

Updated by okurz over 6 years ago

  • Status changed from New to In Progress

The ticket is not "new" anymore -> "in progress". Please discuss with mgriessmeier how to make sure we do not conflict with each other over instances.

Actions #19

Updated by zluo over 6 years ago

At the moment I cannot work on this ticket because s390x-kvm is not ready.
I have an idea for working around the issue with the running X11 session:
create a needle, call select_console, and return.

Actions #20

Updated by mgriessmeier over 6 years ago

  • Subject changed from [sles][functional][s390x] test fails in shutdown: VNC stall detected, needs to be investigated to [sles][functional][s390x][s390x-kvm] test fails in shutdown: VNC stall detected, needs to be investigated
  • Status changed from In Progress to Rejected
  • Assignee changed from zluo to mgriessmeier

We don't know if we ever saw this on production - linked job urls are a different problem

Actions #21

Updated by riafarov over 6 years ago

  • Status changed from Rejected to New

Now failed in production on s390x-kvm. See https://openqa.suse.de/tests/1223349#

Actions #22

Updated by okurz over 6 years ago

  • Priority changed from Normal to Urgent

Fails in many more jobs in build 305.1

Actions #23

Updated by mgriessmeier over 6 years ago

  • Status changed from New to In Progress
Actions #24

Updated by mgriessmeier over 6 years ago

  • Subject changed from [sles][functional][s390x][s390x-kvm] test fails in shutdown: VNC stall detected, needs to be investigated to [sles][functional][s390x] test fails in shutdown: VNC stall detected, needs to be investigated
  • Status changed from In Progress to New
Actions #25

Updated by mgriessmeier over 6 years ago

  • Status changed from New to In Progress
Actions #26

Updated by mgriessmeier over 6 years ago

Was investigating this with riafarov.

We came to the conclusion that it's a weird backend behaviour in assert_shutdown in backend/testapi.pm
and therefore created a ticket for the tools team as a blocker for this: https://progress.opensuse.org/issues/26886

07:51:42.7571 4947 <<< testapi::type_string(string='poweroff
', max_interval=250, wait_screen_changes=0, wait_still_screen=0)
[  OK  ] Stopped target Timers.
[  OK  ] Stopped Daily Cleanup of Temporary Directories.
[  OK  ] Stopped Early Kernel Boot Messages.
[  OK  ] Stopped target Multi-User System.
         Stopping OpenSSH Daemon...
[  OK  ] Stopped target Network is Online.
         Stopping Command Scheduler...
         Stopping Session 1 of user root.
[  OK  ] Removed slice system-systemd\x2dhibernate\x2dresume.slice.
         Stopping Load kdump kernel and initrd...
         Stopping User Manager for UID 0...
07:51:42.9778 Debug: /var/lib/openqa/cache/tests/sle/tests/shutdown/shutdown.pm:32 called utils::power_action
[  OK  ] Removed slice system-getty.slice.
07:51:42.9780 4947 <<< testapi::assert_shutdown(timeout=60)
[  OK  ] Stopped /etc/init.d/after.local Compatibility.
07:51:43.0602 4949 Connection to root@s390p8.suse.de established
07:51:43.1630 4949 Command executed: ! virsh dominfo openQA-SUT-2 | grep -w 'shut off', ret=0
[  OK  ] Stopped target Login Prompts.
[  OK  ] Stopped Discard unused blocks once a week.
         Stopping System Logging Service...
         Stopping Restore /run/initramfs on shutdown...
         Stopping Load kdump kernel early on startup...
         Stopping Serial Getty on ttysclp0...
[  OK  ] Stopped System Logging Service.
[  OK  ] Stopped Serial Getty on ttysclp0.
[  OK  ] Stopped OpenSSH Daemon.
[  OK  ] Stopped Command Scheduler.
         Stopping Postfix Mail Transport Agent...
[  OK  ] Stopped /etc/init.d/boot.local Compatibility.
[  OK  ] Removed slice system-serial\x2dgetty.slice.
[  OK  ] Stopped Session 1 of user root.
[  OK  ] Stopped Restore /run/initramfs on shutdown.
[  OK  ] Stopped Load kdump kernel early on startup.
[  OK  ] Stopped Load kdump kernel and initrd.
[  OK  ] Stopped Postfix Mail Transport Agent.
[  OK  ] Stopped target Host and Network Name Lookups.
[  OK  ] Stopped User Manager for UID 0.
[  OK  ] Removed slice User Slice of root.
         Stopping Login Service...
         Stopping Permit User Sessions...
[  OK  ] Stopped Login Service.
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Stopped target User and Group Name Lookups.
         Stopping Name Service Cache Daemon...
[  OK  ] Stopped target Network.
         Stopping wicked managed network interfaces...
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
[  OK  ] Stopped Name Service Cache Daemon.
[  OK  ] Stopped wicked managed network interfaces.
         Stopping wicked network nanny service...
[  OK  ] Stopped wicked network nanny service.
         Stopping wicked network management service daemon...
[  OK  ] Stopped wicked network management service daemon.
         Stopping wicked DHCPv4 supplicant service...
         Stopping wicked AutoIPv4 supplicant service...
         Stopping wicked DHCPv6 supplicant service...
[  OK  ] Stopped wicked DHCPv4 supplicant service.
[  OK  ] Stopped wicked DHCPv6 supplicant service.
[  OK  ] Stopped wicked AutoIPv4 supplicant service.
         Stopping D-Bus System Message Bus...
[  OK  ] Stopped D-Bus System Message Bus.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed Syslog Socket.
[  OK  ] Stopped target Slices.
[  OK  ] Removed slice User and Session Slice.
[  OK  ] Stopped target Paths.
[  OK  ] Closed D-Bus System Message Bus Socket.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped Update is Completed.
[  OK  ] Stopped target Swap.
         Deactivating swap /dev/disk/by-uuid…7d3-20c2-47b0-a965-339f4851c325...
[  OK  ] Stopped target Encrypted Volumes.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped Rebuild Journal Catalog.
[  OK  ] Stopped Apply Kernel Variables.
         Stopping Load/Save Random Seed...
[  OK  ] Stopped Rebuild Hardware Database.
[  OK  ] Stopped Commit a transient machine-id on disk.
         Stopping Update UTMP about System Boot/Shutdown...
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Stopped Update UTMP about System Boot/Shutdown.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped Flush Journal to Persistent Storage.
[  OK  ] Stopped target Local File Systems.
         Unmounting /var/lib/mariadb...
         Unmounting /var/lib/pgsql...
         Unmounting /var/crash...
         Unmounting /boot/grub2/s390x-emu...
         Unmounting /opt...
         Unmounting /var/lib/libvirt/images...
         Unmounting /var/cache...
         Unmounting /var/lib/named...
         Unmounting /run/user/0...
         Unmounting /.snapshots...
         Unmounting /boot/zipl...
         Unmounting /usr/local...
         Unmounting /var/opt...
         Unmounting /var/tmp...
         Unmounting /srv...
         Unmounting /var/lib/mysql...
         Unmounting /tmp...
         Unmounting /var/lib/machines...
         Unmounting /var/spool...
         Unmounting /var/lib/mailman...
         Unmounting /var/log...
[  OK  ] Stopped Load/Save Random Seed.
[  OK  ] Unmounted /usr/local.
[  OK  ] Unmounted /var/tmp.
[  OK  ] Unmounted /tmp.
[  OK  ] Unmounted /var/lib/mailman.
[  OK  ] Deactivated swap /dev/disk/by-path/ccw-0.0.0000-part3.
[  OK  ] Deactivated swap /dev/disk/by-partu…6a711-92d2-4a69-8bab-84e4628c909e.
[  OK  ] Deactivated swap /dev/vda3.
[  OK  ] Deactivated swap /dev/disk/by-uuid/…027d3-20c2-47b0-a965-339f4851c325.
[  OK  ] Unmounted /var/log.
[  OK  ] Unmounted /.snapshots.
[  OK  ] Unmounted /var/cache.
[  OK  ] Unmounted /var/spool.
[  OK  ] Unmounted /var/lib/pgsql.
[  OK  ] Unmounted /var/crash.
[  OK  ] Unmounted /boot/grub2/s390x-emu.
[  OK  ] Unmounted /opt.
[  OK  ] Unmounted /var/lib/libvirt/images.
[  OK  ] Unmounted /run/user/0.
[  OK  ] Unmounted /var/opt.
[  OK  ] Unmounted /srv.
[  OK  ] Unmounted /var/lib/machines.
[  OK  ] Unmounted /var/lib/named.
[  OK  ] Unmounted /var/lib/mysql.
[  OK  ] Unmounted /var/lib/mariadb.
[  OK  ] Unmounted /boot/zipl.
[  OK  ] Reached target Unmount All Filesystems.
[  OK  ] Stopped target Local File Systems (Pre).
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create System Users.
[  OK  ] Stopped Remount Root and Kernel File Systems.
[  OK  ] Reached target Shutdown.
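
To see how often the backend actually checked the domain state in an excerpt like this, the "Command executed" lines can be pulled out of autoinst-log.txt. Below is a minimal Python sketch, assuming the line layout seen above (timestamp, PID, command, `ret=`). The function names are made up for illustration and are not part of os-autoinst. In a shell, a leading `!` inverts the exit status of the pipeline, so `ret=0` for this command means grep did not find "shut off".

```python
import re

# Layout assumed from the excerpt above:
#   "<HH:MM:SS.ffff> <pid> Command executed: <command>, ret=<n>"
COMMAND_LINE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d+) \d+ "
    r"Command executed: (?P<cmd>.*), ret=(?P<ret>-?\d+)$"
)


def parse_command_results(log_text):
    """Return (timestamp, command, ret) for every 'Command executed' line."""
    results = []
    for line in log_text.splitlines():
        match = COMMAND_LINE.match(line.strip())
        if match:
            results.append((match["time"], match["cmd"], int(match["ret"])))
    return results


def shut_off_not_reported(cmd, ret):
    """True if a negated grep for 'shut off' found nothing.

    The leading "!" inverts the pipeline's exit status, so ret == 0 means
    grep exited non-zero, i.e. the domain was not reported as shut off.
    """
    return cmd.lstrip().startswith("!") and ret == 0
```

Applied to the excerpt above, this yields a single check at 07:51:43.1630, taken while systemd was still stopping services.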
Actions #27

Updated by mgriessmeier over 6 years ago

  • Blocked by action #26886: [tools][s390x-kvm] investigate and improve 'assert_shutdown' function in testapi added
Actions #28

Updated by mgriessmeier over 6 years ago

  • Description updated (diff)
Actions #29

Updated by okurz over 6 years ago

  • Description updated (diff)

We should keep one thing in mind. Unless I am mistaken we only have successfully executed "shutdown" on zkvm. So we shouldn't hunt for a regression on s390x-kvm and z/VM.
https://openqa.suse.de/tests/overview?distri=sle&version=12-SP3&build=0473&arch=s390x are all SLE 12 SP3 GM s390x tests for reference. IIUC for s390x zVM we do not have an implementation for "is_shutdown" so there should be the message "Backend does not implement is_shutdown - just sleeping". For s390x-kvm that should be svirt calling ! virsh dominfo $vmname | grep -w 'shut off' which we see in autoinst-log.txt
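
As a rough illustration of what such a check amounts to (this is not the actual os-autoinst code), the Python sketch below polls `virsh dominfo` output until it reports "shut off" or a timeout expires. The helper names `is_shut_off`, `wait_for_shutdown` and `virsh_dominfo` are hypothetical. The 60 second default mirrors the `assert_shutdown(timeout=60)` call in the log. Running virsh locally is also an assumption: in the log, the command runs over an SSH connection to the host.

```python
import re
import subprocess
import time


def is_shut_off(dominfo_output):
    """True if the dominfo text contains 'shut off' as whole words,
    similar in spirit to `grep -w 'shut off'`."""
    return re.search(r"\bshut off\b", dominfo_output) is not None


def virsh_dominfo(domain):
    """Fetch `virsh dominfo <domain>` output (assumes virsh is available locally)."""
    result = subprocess.run(
        ["virsh", "dominfo", domain], capture_output=True, text=True, check=False
    )
    return result.stdout


def wait_for_shutdown(get_dominfo, timeout=60.0, interval=1.0, sleep=time.sleep):
    """Poll get_dominfo() until it reports 'shut off' or the timeout expires.

    Returns True if the domain was seen shut off, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_shut_off(get_dominfo()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller could use `wait_for_shutdown(lambda: virsh_dominfo("openQA-SUT-2"))`. The point of the sketch is that the state has to be checked repeatedly: a single check right after typing `poweroff` will still see the domain running.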

I see an easy way out: We just skip everything that does not work on s390x (s390x-kvm and z/VM).

I tried to reproduce the problem on zVM locally with a simplified test plan by trying to shutdown but failed to show the problem. It just works fine: http://lord.arch/tests/7741/file/autoinst-log.txt

We have seen the problem only on s390x-kvm and now on zVM as well but not on zkvm, correct?

Actions #30

Updated by okurz over 6 years ago

  • Description updated (diff)
Actions #31

Updated by mgriessmeier over 6 years ago

for now we just skip assert_shutdown on s390x-kvm and z/VM:
https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3752

Actions #32

Updated by okurz over 6 years ago

  • Copied to action #26914: [sle][functional][s390x][s390x-kvm] s390x-kvm does never exit from "assert_shutdown" but zkvm works -> investigate how the machines differ, maybe problem on s390p8? added
Actions #33

Updated by okurz over 6 years ago

  • Status changed from In Progress to Feedback

mgriessmeier and I opted for the "easy way out" -> https://github.com/os-autoinst/os-autoinst-distri-opensuse/pull/3752 , merged.

Once jobs no longer end up incomplete, we should create another ticket for a deeper investigation into why s390x-kvm and zkvm behave differently here. I think the z/VM part is covered by #26886 because the backend implementation is basically only a sleep, so it has nothing to do with actual execution on z/VM.

Rest handled in #26914

skip_registration on s390x-kvm passes shutdown now.

Cannot close the ticket? Set to Feedback for now. Is something blocking here?

Actions #34

Updated by okurz over 6 years ago

  • Blocked by deleted (action #26886: [tools][s390x-kvm] investigate and improve 'assert_shutdown' function in testapi)
Actions #35

Updated by okurz over 6 years ago

  • Related to action #26886: [tools][s390x-kvm] investigate and improve 'assert_shutdown' function in testapi added
Actions #36

Updated by okurz over 6 years ago

  • Status changed from Feedback to Resolved

Not blocked by #26886 anymore, closing.
