action #81020
closed
QA-Power8-4-kvm start failed since reboot on 2020-12-13
0%
Description
Observation
The host responds to ping:
liuxiaojing@linux-90dc:~/Downloads> ping qa-power8-4-kvm.qa.suse.de
PING QA-Power8-4-kvm.qa.suse.de (10.162.6.201) 56(84) bytes of data.
64 bytes from 10.162.6.201 (10.162.6.201): icmp_seq=1 ttl=62 time=153 ms
64 bytes from 10.162.6.201 (10.162.6.201): icmp_seq=2 ttl=62 time=153 ms
Running ipmitool sol activate shows:
exiting the petitboot. Type 'exit' to rerun
Then I typed exit, and this menu appeared:
Petitboot (dev.20160310) 8348-21C 684F75A
──────────────────────────────────────────────────────────────────────────────
sled11-sp2-gm
sles11-sp1-vmware
sles10
sles10-sp1
sles10-sp2
sles10-sp3
sles11
sles11-sp1
sles11-sp2-vmware
sles10-sp4
sles-12-gm
harddisk
System information
*System configuration
Language
Rescan devices
Retrieve config from URL
Exit to shell
──────────────────────────────────────────────────────────────────────────────
Enter=accept, e=edit, n=new, x=exit, l=language, h=help
It seems to start from the network? I don't know which entry to choose, or whether it should boot from the network at all.
Then I ran sol deactivate to close the ipmitool connection (quitting with ~. failed).
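A minimal sketch of the SoL session handling described above, as a dry run that only prints the command lines. The BMC address and user name are placeholders and not taken from this ticket. The -e option changes ipmitool's SoL escape character; one common reason for ~. not working is that an outer ssh session intercepts the ~ escape first, though whether that applies here is an assumption.

```shell
#!/bin/sh
# Hypothetical values: the BMC address and user are NOT from the ticket.
BMC_HOST="${BMC_HOST:-bmc.example.invalid}"
BMC_USER="${BMC_USER:-ADMIN}"

# Print (do not run) an ipmitool SoL command line for the given subcommand.
#   -I lanplus  IPMI v2.0 interface, required for SoL over the network
#   -a          prompt for the BMC password
#   -e "&"      use "&" instead of the default "~" as SoL escape character,
#               so the sequence to leave the session becomes "&."
sol_cmd() {
    printf 'ipmitool -I lanplus -H %s -U %s -a -e "&" sol %s\n' \
        "$BMC_HOST" "$BMC_USER" "$1"
}

sol_cmd activate     # open the serial console
sol_cmd deactivate   # close a session that is still held open
```

With the default escape character inside a nested ssh session, typing ~~. instead of ~. is usually enough to reach ipmitool rather than ssh.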
Updated by nicksinger almost 4 years ago
I can successfully recover the machine with:
kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux-5.3.18-lp152.57-default --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd-5.3.18-lp152.57-default --command-line="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M"
kexec -e
So petitboot can find the disk but refuses to load the bootloader entries from there. We're once again at the point where we would need to understand petitboot.
Petitboot on that machine is from 2016. I haven't figured out yet how to update it. Most likely with a "firmware upgrade" from IBM.
I will try to rewrite grub (configs) now (although nothing changed there according to the zypper logs) and see if that helps.
If we can't figure out what is causing all the power machines to lose their bootloader entries we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts
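For reference, the kexec recovery from the petitboot shell shown earlier in this comment could be parameterised roughly as follows. This is only a dry-run sketch: it prints the two commands instead of executing them, the variable names are made up for illustration, and the values are the ones quoted above for this host.

```shell
#!/bin/sh
# Values copied from the comment above; they will differ on other hosts.
BOOT_DIR="/var/petitboot/mnt/dev/sdb2/boot"
KVER="5.3.18-lp152.57-default"
ROOT_UUID="eebe647f-e867-416e-a0fa-7a6732bfcf9d"
EXTRA_ARGS="nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M"

# Print the two kexec steps (dry run, nothing is loaded or executed).
kexec_cmds() {
    # Step 1: load kernel and initrd from the disk that petitboot mounted.
    printf 'kexec -l %s/vmlinux-%s --initrd=%s/initrd-%s --command-line="root=UUID=%s %s"\n' \
        "$BOOT_DIR" "$KVER" "$BOOT_DIR" "$KVER" "$ROOT_UUID" "$EXTRA_ARGS"
    # Step 2: jump into the loaded kernel.
    printf 'kexec -e\n'
}

kexec_cmds
```

The printed lines can be pasted into the petitboot shell after checking that the kernel and initrd paths exist there.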
Updated by okurz almost 4 years ago
nicksinger wrote:
If we can't figure out what is causing all the power machines to lose their bootloader entries we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts
Agreed. Feel welcome to just call systemctl mask auto-update.service on all affected machines for now (and revert before closing this ticket or any that still mentions that). I could neither test.ping the machine nor log in over ssh to qa-power8-4-kvm.qa right now to do it myself and did not want to kick anyone out of SoL.
Updated by nicksinger almost 4 years ago
- Assignee changed from nicksinger to okurz
okurz wrote:
nicksinger wrote:
If we can't figure out what is causing all the power machines to lose their bootloader entries we might need to consider removing auto reboots for now… Currently we're keeping ourselves busy with recovering these hosts
Agreed. Feel welcome to just call systemctl mask auto-update.service on all affected machines for now (and revert before closing this ticket or any that still mentions that). I could neither test.ping the machine nor log in over ssh to qa-power8-4-kvm.qa right now to do it myself and did not want to kick anyone out of SoL.
Yeah, I tried to regenerate the grub config as this helped last time. But since Power8-5 showed exactly the same symptoms (stuck in petitboot, detecting network boot but not the installed OS), I am starting to think we are facing a product bug. Unfortunately I can't figure out in which component :/
The auto-update.service seems to be masked already on Power8-4 as well as Power8-5. Isn't rebootmgr the service causing the reboots?
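A rough sketch of how the state of both services could be checked and automatic reboots switched off, assuming rebootmgr and its rebootmgrctl client are installed on the host. By default it only prints the commands; the run helper and the RUN variable are made up for this illustration.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set RUN=1 to really execute (needs root and an installed rebootmgr).
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

disable_auto_reboot() {
    # Reports "masked" when the update service is already masked.
    run systemctl is-enabled auto-update.service
    # Show which reboot strategy rebootmgr currently uses.
    run rebootmgrctl get-strategy
    # Tell rebootmgr to ignore reboot requests until this is reverted.
    run rebootmgrctl set-strategy off
}

disable_auto_reboot
```

Any such change would have to be reverted before the related tickets are closed, in the same way as the masking mentioned above.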
Updated by okurz almost 4 years ago
- Assignee changed from okurz to nicksinger
- Target version set to Ready
nicksinger wrote:
[…]
The auto-update.service seems to be masked already on Power8-4 as well as Power8-5. Isn't rebootmgr the service causing the reboots?
yes, of course. stupid me. Assuming you just assigned to me to answer the question I am assigning back to you. If you want me to do something else then you can assign it back but please tell me then what I should do :)
Updated by nicksinger almost 4 years ago
- Related to action #81058: [tracker-ticket] Power machines can't find installed OS. Automatic reboots disabled for now added
Updated by nicksinger almost 4 years ago
- Status changed from In Progress to Resolved
Resolving this in favor of the bigger tracking ticket regarding ppc boot problems. Also, I think this ticket mostly covers the immediate actions taken to get the host back up and running. Feel free to reopen if you disagree.