action #81020

QA-Power8-4-kvm start failed since reboot on 2020-12-13

Added by Xiaojing_liu 7 months ago. Updated 7 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date: 2020-12-14
Due date:
% Done: 0%
Estimated time:

Description

Observation

The host responds to ping:

liuxiaojing@linux-90dc:~/Downloads> ping qa-power8-4-kvm.qa.suse.de 
PING QA-Power8-4-kvm.qa.suse.de (10.162.6.201) 56(84) bytes of data.
64 bytes from 10.162.6.201 (10.162.6.201): icmp_seq=1 ttl=62 time=153 ms
64 bytes from 10.162.6.201 (10.162.6.201): icmp_seq=2 ttl=62 time=153 ms

Running ipmitool sol activate shows:
exiting the petitboot. Type 'exit' to rerun
Then I typed exit, and this menu was shown:

Petitboot (dev.20160310)                     8348-21C         684F75A
 ──────────────────────────────────────────────────────────────────────────────
    sled11-sp2-gm
    sles11-sp1-vmware
    sles10
    sles10-sp1
    sles10-sp2
    sles10-sp3
    sles11
    sles11-sp1
    sles11-sp2-vmware
    sles10-sp4
    sles-12-gm
    harddisk

  System information
 *System configuration                      
  Language
  Rescan devices
  Retrieve config from URL                  
  Exit to shell
 ──────────────────────────────────────────────────────────────────────────────
 Enter=accept, e=edit, n=new, x=exit, l=language, h=help

It seems to be booting from the network? I don't know which entry to choose, or whether it should boot from the network at all.

Then I ran sol deactivate to close the ipmitool connection (quitting with ~. failed).
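
On the failed ~. exit: when the SoL session is opened from inside an ssh session, ssh uses ~ as its own escape character, which can conflict with ipmitool's default. ipmitool accepts -e to choose a different SoL escape character. A minimal sketch of assembling such a command follows; the function name sol_command, the lanplus interface and the choice of & as escape character are assumptions, not what was used on this host. It only prints the command.

```shell
#!/bin/sh
# Sketch: print an ipmitool SoL command that uses '&' instead of '~'
# as the escape character, so it does not collide with ssh's '~'.
# sol_command is a hypothetical helper; BMC host and user are placeholders.
sol_command() {
    action=$1   # "activate" or "deactivate"
    bmc=$2      # BMC host name or address (placeholder)
    user=$3     # IPMI user (placeholder); ipmitool prompts for the password
    printf "ipmitool -I lanplus -H %s -U %s -e '&' sol %s\n" \
        "$bmc" "$user" "$action"
}
```

With this escape character the session would be closed with &. instead of ~. (untested on this machine).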


Related issues

Related to openQA Infrastructure - action #81058: [tracker-ticket] Power machines can't find installed OS. Automatic reboots disabled for now (Resolved, 2020-12-15, 2021-04-16)

History

#1 Updated by nicksinger 7 months ago

  • Assignee set to nicksinger

#2 Updated by nicksinger 7 months ago

  • Status changed from New to In Progress

#3 Updated by nicksinger 7 months ago

I can successfully recover the machine with:

kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux-5.3.18-lp152.57-default --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd-5.3.18-lp152.57-default --command-line="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M"
kexec -e

So petitboot can find the disk but refuses to load the bootloader entries from it. We're once again at the point where we would need to understand petitboot.
Petitboot on that machine is from 2016. I haven't yet figured out how to update it. Most likely with a "firmware upgrade" from IBM.
I will try to rewrite the grub configs now (although nothing changed there according to the zypper logs) and see if that helps.
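
As a sketch of the recovery above, the kexec load command could be assembled from its parts so that the mount point and kernel version can be swapped for another host. The function name build_kexec_load is hypothetical. It assumes the petitboot shell provides printf and that kernel and initrd follow the vmlinux-<version> / initrd-<version> naming seen on this machine. It only prints the command and does not run it.

```shell
#!/bin/sh
# Sketch: print the "kexec -l" command for a kernel found under a
# petitboot mount point. build_kexec_load is a hypothetical helper.
build_kexec_load() {
    mnt=$1      # mount point of the root partition, e.g. /var/petitboot/mnt/dev/sdb2
    kver=$2     # kernel version suffix, e.g. 5.3.18-lp152.57-default
    cmdline=$3  # kernel command line, e.g. "root=UUID=..."
    printf 'kexec -l %s/boot/vmlinux-%s --initrd=%s/boot/initrd-%s --command-line="%s"\n' \
        "$mnt" "$kver" "$mnt" "$kver" "$cmdline"
}
```

Called with /var/petitboot/mnt/dev/sdb2, 5.3.18-lp152.57-default and the command line from above, this prints the same load command for review. After running it, kexec -e boots the loaded kernel.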

If we can't figure out what is causing all the power machines to lose their bootloader entries, we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts

#4 Updated by okurz 7 months ago

nicksinger wrote:

If we can't figure out what is causing all the power machines to lose their bootloader entries, we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts

Agreed. Feel welcome to just call systemctl mask auto-update.service on all affected machines for now (and revert before closing this ticket or any other that still mentions it). I could neither test.ping the machine nor log in over ssh to qa-power8-4-kvm.qa right now to do it myself, and I did not want to kick anyone out of SoL.

#5 Updated by nicksinger 7 months ago

  • Assignee changed from nicksinger to okurz

okurz wrote:

nicksinger wrote:

If we can't figure out what is causing all the power machines to lose their bootloader entries, we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts

Agreed. Feel welcome to just call systemctl mask auto-update.service on all affected machines for now (and revert before closing this ticket or any other that still mentions it). I could neither test.ping the machine nor log in over ssh to qa-power8-4-kvm.qa right now to do it myself, and I did not want to kick anyone out of SoL.

Yeah, I tried to regenerate the grub config as this helped last time. But since Power8-5 showed exactly the same symptoms (stuck in petitboot, detecting network boot but not the installed OS), I am starting to think we are facing a product bug. Unfortunately I can't seem to figure out in which component :/
The auto-update.service seems to be masked already on Power8-4 as well as Power8-5. Isn't rebootmgr the service causing the reboots?
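
To check on each host which of the two services could still trigger a reboot, something like the following could be used. This is only a sketch: report_units is a hypothetical helper, and rebootmgr.service is assumed to be the unit name of rebootmgr on these machines.

```shell
#!/bin/sh
# Sketch: print the enablement state ("enabled", "disabled", "masked", ...)
# of each given systemd unit. report_units is a hypothetical helper.
report_units() {
    for unit in "$@"; do
        # is-enabled exits non-zero for masked/disabled units, so ignore
        # the exit status and rely on the printed state instead.
        state=$(systemctl is-enabled "$unit" 2>/dev/null) || true
        printf '%s: %s\n' "$unit" "${state:-unknown}"
    done
}
```

Running report_units auto-update.service rebootmgr.service prints one line per unit. If rebootmgr turns out to be the trigger, rebootmgrctl set-strategy off should stop it from rebooting the host without masking the unit; this was not verified on these machines.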

#6 Updated by okurz 7 months ago

  • Assignee changed from okurz to nicksinger
  • Target version set to Ready

nicksinger wrote:

[…]
The auto-update.service seems to be masked already on Power8-4 as well as Power8-5. Isn't rebootmgr the service causing the reboots?

Yes, of course. Stupid me. Assuming you assigned this to me just to answer the question, I am assigning it back to you. If you want me to do something else, you can assign it back, but please tell me what I should do :)

#7 Updated by nicksinger 7 months ago

  • Related to action #81058: [tracker-ticket] Power machines can't find installed OS. Automatic reboots disabled for now added

#8 Updated by nicksinger 7 months ago

  • Status changed from In Progress to Resolved

Resolving this in favor of the bigger tracking ticket regarding ppc boot problems. Also I think this covers more the immediate actions taken to get the host back up and running. Feel free to reopen if you disagree.
