action #155758

closed

coordination #151816: [epic] Handle openQA fixes and job group setup

[sporadic] Reboot takes too long for ppc64le

Added by syrianidou_sofia 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version: -
Start date: 2024-02-21
Due date:
% Done: 0%
Estimated time:
Description

For some ppc64le tests, we see sporadic reboot failures at different points of the test. Since the failure does not always occur at the same point, it could be an infrastructure issue that might be fixed by increasing the timeout.

Failure examples:
https://openqa.suse.de/tests/13549614#next_previous
https://openqa.suse.de/tests/13554127#next_previous

Acceptance criteria

AC1: Check whether the failures go away when the reboot timeout is increased for ppc64le.
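
One way to picture AC1 is an architecture-dependent timeout. The following is only a minimal sketch in Python: the base timeout, the scaling factor, and the function name are hypothetical values chosen for illustration, and none of them come from this ticket or from the real test code.

```python
# Hypothetical values for illustration only; not taken from the ticket.
BASE_REBOOT_TIMEOUT_S = 300
ARCH_TIMEOUT_FACTOR = {"ppc64le": 2.0}


def reboot_timeout(arch: str, base: int = BASE_REBOOT_TIMEOUT_S) -> int:
    """Return the reboot timeout in seconds, scaled up for slower architectures."""
    return int(base * ARCH_TIMEOUT_FACTOR.get(arch, 1.0))


if __name__ == "__main__":
    for arch in ("x86_64", "ppc64le"):
        print(arch, reboot_timeout(arch))
```

Keeping the factor in a per-architecture table means other architectures keep their current timeout while ppc64le gets more headroom.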

Additional information

  • In case the issue persists, check if it could be a bug.
  • In case there are no indications of a bug, ask the tools team for more information.

Note: not sure how the two suggestions above could go, as power KVM is not officially supported and the tools squad does not take care of exotic architectures. Most likely, if the scenario is unstable, it is better to consider disabling it; we'll see...

Actions #1

Updated by syrianidou_sofia 2 months ago

  • Project changed from openQA Tests to qe-yam
  • Category deleted (Bugs in existing tests)
Actions #2

Updated by JERiveraMoya 2 months ago

  • Tags set to qe-yam-feb-sprint
  • Description updated (diff)
  • Status changed from New to Workable
  • Parent task set to #151816
Actions #3

Updated by leli 2 months ago

  • Status changed from Workable to In Progress
  • Assignee set to leli
Actions #4

Updated by leli 2 months ago · Edited

From the failed job's serial0.txt:

[ 44.022735][ T3924] block vda: the capability attribute has been deprecated.
[ 46.704481][ T5157] NET: Registered PF_ALG protocol family
[ 51.041195] wickedd-dhcp6[1094]: eth0: DHCPv6 is disabled by IPv6 router RA
[ 52.480523] wickedd-dhcp6[1094]: eth0: DHCPv6 is disabled by IPv6 router RA
[ 52.960305] wickedd-dhcp6[1094]: eth0: DHCPv6 is disabled by IPv6 router RA
[ 56.211725] load.sh[2018]: Starting kdump kernel load; kexec cmdline: /sbin/kexec -p /var/lib/kdump/kernel --append=" plymouth.ignore-serial-consoles console=hvc0 console=tty fadump= mitigations=auto sysrq=yes reset_devices acpi_no_memhotplug cgroup_disable=memory nokaslr numa=off irqpoll maxcpus=1 root=kdump rootflags=bind rd.udev.children-max=8 panic=1" --initrd=/var/lib/kdump/initrd -a
[ 56.242270][ T5413] memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=5413 'kexec'
[ 56.980248] load.sh[2018]: Loaded kdump kernel.

It seems the disk has some issue and kdump was loaded, which may have made the reboot very slow. I will run the test with the new support image to check whether the issue can be reproduced.
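
To see where time is spent during a single boot, the timestamps in the serial log can be compared. The following is a minimal sketch, assuming the log keeps the prefix format shown in the excerpt above. The function names and the 5-second threshold are hypothetical, and the sample reuses three lines from that excerpt.

```python
import re

# Matches kernel-style prefixes such as "[ 56.242270][ T5413] message"
# or "[ 51.041195] message"; the thread tag is optional.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\](?:\[\s*\w+\])?\s*(.*)$")


def parse_serial_log(text):
    """Return (timestamp_seconds, message) pairs for lines with a timestamp prefix."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def find_gaps(entries, threshold=5.0):
    """Return consecutive entry pairs whose timestamps differ by more than threshold seconds."""
    return [
        (prev, cur)
        for prev, cur in zip(entries, entries[1:])
        if cur[0] - prev[0] > threshold
    ]


SAMPLE = """\
[ 44.022735][ T3924] block vda: the capability attribute has been deprecated.
[ 46.704481][ T5157] NET: Registered PF_ALG protocol family
[ 56.980248] load.sh[2018]: Loaded kdump kernel.
"""

if __name__ == "__main__":
    entries = parse_serial_log(SAMPLE)
    for prev, cur in find_gaps(entries):
        print(f"{cur[0] - prev[0]:.1f}s gap before: {cur[1]}")
```

The timestamps are relative to the start of the current boot, so this only compares lines within one boot and says nothing about how long the reboot itself took.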

Actions #5

Updated by JERiveraMoya about 2 months ago

  • Tags changed from qe-yam-feb-sprint to qe-yam-mar-sprint
Actions #6

Updated by JERiveraMoya about 2 months ago

Did you find any remediation?
Still seen in https://openqa.suse.de/tests/13636406#step/first_boot/4

Actions #7

Updated by leli about 2 months ago

JERiveraMoya wrote in #note-6:

Did you find any remediation?
Still seen in https://openqa.suse.de/tests/13636406#step/first_boot/4

After some investigation, I think it is a GNOME bug: after migration, something goes wrong with gdm during the reboot process and causes the system to hang. I filed a bug for it: https://bugzilla.suse.com/show_bug.cgi?id=1220723

Actions #8

Updated by JERiveraMoya about 2 months ago · Edited

leli wrote in #note-7:

After some investigation, I think it is a GNOME bug: after migration, something goes wrong with gdm during the reboot process and causes the system to hang. I filed a bug for it: https://bugzilla.suse.com/show_bug.cgi?id=1220723

The developer is waiting for feedback: https://bugzilla.suse.com/show_bug.cgi?id=1220723#c7

Actions #9

Updated by leli about 2 months ago

JERiveraMoya wrote in #note-8:

The developer is waiting for feedback: https://bugzilla.suse.com/show_bug.cgi?id=1220723#c7

It is very hard to provide logs when the issue happens, since the system hangs and this is mixed with a root login issue. As we discussed in the stand-up meeting, we will create a new ticket to run the test in textmode or to remove some modules to make the test work.

Actions #10

Updated by leli about 2 months ago

Created a new ticket to change the test to textmode: https://progress.opensuse.org/issues/156970

Actions #11

Updated by JERiveraMoya about 2 months ago

  • Status changed from In Progress to Resolved

Thanks for the investigation, feel free to pick the follow-up ticket or leave it for anyone else. Let's resolve this one.
