action #63763

closed

[SLE][Migration][SLE15SP2]test fails in bootloader_zkvm#1 - Backend process died lead job incomplete

Added by coolgw about 4 years ago. Updated about 4 years ago.

Status: Rejected
Priority: Normal
Assignee:
Category: Bugs in existing tests
Target version: -
Start date: 2020-02-24
Due date:
% Done: 0%
Estimated time: 12.00 h
Difficulty:

Description

Observation

openQA test in scenario sle-15-SP2-Migration-from-SLE12-SP5-to-SLE15-SP2-Milestone-s390x-autoupgrade_sles12sp5_scc_all_full@s390x-kvm-sle15 fails in
bootloader_zkvm#1

Test suite description

Reproducible

Fails since (at least) Build 101.1

Expected result

Last good: (unknown) (or more recent)

Further details

Always latest result in this scenario: latest


Files

Screenshot from 2020-03-17 14-57-59.png (277 KB): the last screen of the video (leli, 2020-03-17 07:38)
Actions #1

Updated by leli about 4 years ago

  • Estimated time set to 12.00 h
Actions #2

Updated by openqa_review about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: online_sles15_pscc_basesys-srv_def_full_zdup
https://openqa.suse.de/tests/3984024

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #3

Updated by okurz about 4 years ago

This is an autogenerated message for openQA integration by the openqa_review script:

This bug is still referenced in a failing openQA test: online_sles15_pscc_basesys-srv_def_full_zdup
https://openqa.suse.de/tests/3984024

To prevent further reminder comments one of the following options should be followed:

  1. The test scenario is fixed by applying the bug fix to the tested product or the test is adjusted
  2. The openQA job group is moved to "Released"
  3. The label in the openQA scenario is removed
Actions #4

Updated by leli about 4 years ago

  • Assignee set to leli

The log shows that the backend process died, accompanied by 'Domain openQA-SUT-3 destroyed'.

#######################################################################################
[2020-02-24T08:46:09.590 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Bizarre copy of ARRAY in list assignment at /usr/lib/perl5/vendor_perl/5.26.1/Devel/StackTrace.pm line 61.

[2020-02-24T08:46:09.590 CET] [debug] Closing SSH serial connection with s390p7.suse.de
[2020-02-24T08:46:09.591 CET] [debug] Destroying openQA-SUT-3 virtual machine
[2020-02-24T08:46:09.591 CET] [debug] <<< backend::baseclass::run_ssh_cmd(cmd="virsh destroy openQA-SUT-3", wantarray=0, keep_open=1)
[2020-02-24T08:46:09.591 CET] [debug] <<< backend::baseclass::run_ssh(cmd="virsh destroy openQA-SUT-3", keep_open=1, wantarray=0)
[2020-02-24T08:46:09.591 CET] [debug] <<< backend::baseclass::new_ssh_connection(keep_open=1, blocking=1, wantarray=0)
[2020-02-24T08:46:10.308 CET] [debug] [run_ssh_cmd(virsh destroy openQA-SUT-3)] stdout:
Domain openQA-SUT-3 destroyed
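
When triaging logs like the one above, the interesting part is the untimestamped text that follows the "Backend process died" marker. A minimal Python sketch for pulling it out is shown below. It is not part of openQA or os-autoinst; the function name `backend_error` and the timestamp pattern are assumptions based only on the excerpt in this comment.

```python
import re
from typing import Optional

# Marker text copied from the log excerpt above.
DIED_MARKER = "Backend process died, backend errors are reported below"
# Timestamp prefix as seen in the excerpt, e.g. "[2020-02-24T08:46:09.590 CET] ".
# Assumption: every regular log line starts with such a prefix.
TIMESTAMP_PREFIX = re.compile(r"^\[\d{4}-\d{2}-\d{2}T[\d:.]+ \w+\] ")


def backend_error(log_text: str) -> Optional[str]:
    """Return the untimestamped lines following the 'died' marker, or None.

    Returns None when the marker is absent and an empty string when the
    marker is present but no error text follows it.
    """
    lines = log_text.splitlines()
    for index, line in enumerate(lines):
        if DIED_MARKER in line:
            collected = []
            for follow in lines[index + 1:]:
                # The next timestamped line ends the error report.
                if TIMESTAMP_PREFIX.match(follow):
                    break
                if follow.strip():
                    collected.append(follow.strip())
            return "\n".join(collected)
    return None
```

Applied to the excerpt above, this would return the "Bizarre copy of ARRAY in list assignment" line, which is the actual backend error rather than the later `virsh destroy` output.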

Actions #5

Updated by mkittler about 4 years ago

I don't think that the domain being destroyed is really specific to this issue. If I remember correctly, that's just a cleanup mechanism within the svirt backend.

We also see

XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":42945"
      after 1621 requests (1621 known processed) with 0 events remaining.
XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":42285"
      after 1626 requests (1626 known processed) with 0 events remaining.
[2020-03-09T17:32:50.527 CET] [debug] backend process exited: 0

being logged, which indicates problems with the Xvnc server. But that might be just a symptom and not the cause of the issue. It is annoying that there are no Xvnc logs because it is started with -inetd.

Actions #6

Updated by leli about 4 years ago

I think this is a problem with reconnecting to the VNC server after reboot on s390x. I viewed the video: the installation finished, but it hung at the reboot dialog. I think the VNC connection was lost at that point.

[2020-03-16T10:15:05.598 CET] [debug] Backend process died, backend errors are reported below in the following lines:
Error connecting to VNC server 10.161.145.30:5901: IO::Socket::INET: connect: Connection timed out
[2020-03-16T10:15:05.598 CET] [debug] Closing SSH serial connection with s390p7.suse.de
[2020-03-16T10:15:05.599 CET] [debug] Destroying openQA-SUT-1 virtual machine
[2020-03-16T10:15:05.599 CET] [debug] <<< backend::baseclass::run_ssh_cmd(cmd="virsh destroy openQA-SUT-1", wantarray=0, keep_open=1)
[2020-03-16T10:15:05.599 CET] [debug] <<< backend::baseclass::run_ssh(cmd="virsh destroy openQA-SUT-1", keep_open=1, wantarray=0)
[2020-03-16T10:15:05.599 CET] [debug] <<< backend::baseclass::new_ssh_connection(blocking=1, keep_open=1, wantarray=0)
[2020-03-16T10:15:05.679 CET] [debug] [run_ssh_cmd(virsh destroy openQA-SUT-1)] stdout:

[2020-03-16T10:15:05.679 CET] [debug] [run_ssh_cmd(virsh destroy openQA-SUT-1)] stderr:
error: Failed to destroy domain openQA-SUT-1
error: Requested operation is not valid: domain is not running
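
To tell a VNC server that is briefly unavailable during reboot from one that never comes back, it can help to probe the port with a bounded number of retries. The following is a rough Python sketch of that idea only. It is not the os-autoinst reconnect logic (which is Perl), and the names `vnc_reachable` and `wait_for_vnc` as well as the default timeout, attempt count, and delay are hypothetical.

```python
import socket
import time


def vnc_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_vnc(host: str, port: int, attempts: int = 10, delay: float = 3.0) -> bool:
    """Probe the port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if vnc_reachable(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the log above, one would call `wait_for_vnc("10.161.145.30", 5901)`. A `False` result after all attempts would suggest the SUT never brought VNC back up, rather than a short reconnect race.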

Actions #7

Updated by leli about 4 years ago

  • Status changed from New to Rejected

From autoinstlog, we can see

############################################
Loading Linux 5.3.18-8-default ...

Loading initial ramdisk ...

Performing 'kexec -la /boot/image-5.3.18-8-default

--initrd=/boot/initrd-5.3.18-8-default

--command-line=root=UUID=12e8d280-40cf-461e-a1b2-79e57e804cb8

hvc_iucv=8 TERM=dumb resume=/dev/disk/by-path/ccw-0.0.0000-part4

crashkernel=163M'

kexec_file_load failed: Function not implemented
[2020-03-12T18:52:30.561 CET] [debug] tests/boot/boot_to_desktop.pm:42 called opensusebasetest::wait_boot -> lib/opensusebasetest.pm:967 called opensusebasetest::reconnect_s390 -> lib/opensusebasetest.pm:684 called utils::type_line_svirt -> lib/utils.pm:142 called testapi::wait_serial
[2020-03-12T18:52:30.562 CET] [debug] <<< testapi::wait_serial(quiet=undef, record_output=undef, expect_not_found=0, no_regex=0, timeout=400, regexp=qr/Welcome to SUSE Linux Enterprise .*(s390x)/u, buffer_size=undef)
GNU GRUB version 2.04

  Use the ^ and v keys to select which entry is highlighted.

  Press enter to boot the selected OS, `e' to edit the commands

  before booting or `c' for a command-line.

 *SLES 15-SP2

###############################################

So this is bsc#1166550: the system hangs in GRUB.
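
The pattern in the excerpt above (a kexec failure, then the GRUB menu again, and no welcome banner) can be checked mechanically. Below is a small Python sketch under stated assumptions: the function name `boot_state` and the returned labels are made up for illustration, the welcome pattern is copied from the `wait_serial` call in the log, and a real serial log may need stricter ordering checks.

```python
import re

# Strings taken from the serial output quoted above.
KEXEC_FAILURE = "kexec_file_load failed"
GRUB_BANNER = "GNU GRUB"
# Pattern copied from the wait_serial call in the log excerpt.
WELCOME = re.compile(r"Welcome to SUSE Linux Enterprise .*(s390x)")


def boot_state(serial_text: str) -> str:
    """Classify serial output as 'booted', 'grub-after-kexec-failure', or 'unknown'."""
    # Simplification: any welcome banner counts as a successful boot.
    if WELCOME.search(serial_text):
        return "booted"
    kexec_at = serial_text.find(KEXEC_FAILURE)
    # GRUB showing up again after the kexec failure matches the hang seen here.
    if kexec_at != -1 and GRUB_BANNER in serial_text[kexec_at:]:
        return "grub-after-kexec-failure"
    return "unknown"
```

Run against the excerpt in this comment, the function would report `grub-after-kexec-failure`, which matches the symptom attributed to bsc#1166550.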
