action #62102
aarch64.o.o did not come up after nightly upgrade due to grub2-arm64-efi upgrade (boo#1162320)
Description
After the nightly upgrade I received the email alert from my ping test. Power cycling does not help as the kernel cannot boot. The error is something about an unreferenced symbol. I booted an older snapshot, selected from the grub menu over IPMI SOL.
Updated by okurz almost 5 years ago
Trying with "transactional-update dup && reboot" ends up in the same error.
Message when trying to load the kernel:
Loading Linux 4.12.14-lp151.28.36-default ...
error: symbol `grub_efi_allocate_any_pages' not found.
Loading initial ramdisk ...
error: symbol `grub_efi_allocate_any_pages' not found.
and then returns to grub.
Booted into a snapshot once again and trying with an older kernel: "transactional-update pkg install kernel-default-4.12.14-lp151.28.32.1", with the version taken from "zypper search --details kernel-default". In the newly booted session I ran "transactional-update shell" and in there "zypper al kernel-default" and "zypper dup".
Still failed to boot, so the problem is not in the kernel. Retrying with "zypper al grub2 grub2-arm64-efi dracut". Also did a "zypper patch" and reboot as well as "zypper dup" and reboot. This worked so far. Removed the lock from dracut and tried the upgrade: "dracut 044.2-lp151.2.3.1 -> 044.2-lp151.2.9.1". dracut was fine, so I tried grub2 next. grub2 was also fine, so it's most likely grub2-arm64-efi, or it was a transient problem fixed by the multiple reinstalls and reboots. I will let the system be for now as it's pretty busy with testing.
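The lock-based narrowing described above can be sketched roughly as follows. This is only an illustration: the helper name print_unlock_plan is hypothetical, it prints the commands instead of running them, and it assumes all suspect packages were locked beforehand with "zypper al".

```shell
#!/bin/bash
# Hypothetical helper: print (do NOT run) a plan that unlocks one suspect
# package at a time, so that a failed boot points at the package that was
# unlocked last. Assumes every suspect is currently locked via "zypper al".
print_unlock_plan() {
    for pkg in "$@"; do
        # remove the lock for exactly one package
        echo "zypper rl $pkg"
        # upgrade into a new snapshot and reboot into it
        echo "transactional-update dup reboot"
    done
}

# Example: print_unlock_plan dracut grub2 grub2-arm64-efi
```

If a reboot fails after one of the steps, booting the previous snapshot from the grub menu and adding the lock back with "zypper al" for that package restores the last working state.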
Updated by okurz almost 5 years ago
- Status changed from In Progress to Blocked
Currently no tests running, executing "zypper rl grub2-arm64-efi && transactional-update dup reboot". Failed again.
The upgrade grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1 triggers the problem. I added the lock back.
Reported bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320
Updated by okurz almost 5 years ago
- Status changed from Blocked to Workable
We can apply the workaround as mentioned in https://bugzilla.opensuse.org/show_bug.cgi?id=1122591#c28 :
cd /boot/grub2
mv arm64-efi arm64-efi.bk
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk/* arm64-efi/
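Before applying the workaround it may help to check whether /boot/grub2/arm64-efi already is a subvolume. A minimal sketch, assuming GNU stat and relying on the btrfs convention that the root directory of a subvolume has inode number 256; the function name is_btrfs_subvolume is made up for this example:

```shell
#!/bin/bash
# Hypothetical check: succeed (exit 0) if the given path is the root of a
# btrfs subvolume. Assumes GNU stat; relies on subvolume roots on btrfs
# having inode number 256.
is_btrfs_subvolume() {
    path=$1
    [ -d "$path" ] || return 1
    # filesystem type in human readable form, e.g. "btrfs"
    fstype=$(stat -f -c %T "$path" 2>/dev/null) || return 1
    # inode number of the directory itself
    inode=$(stat -c %i "$path" 2>/dev/null) || return 1
    [ "$fstype" = "btrfs" ] && [ "$inode" -eq 256 ]
}

# Example:
#   is_btrfs_subvolume /boot/grub2/arm64-efi && echo "already a subvolume"
```

"btrfs subvolume show" on the same path should give the same answer with more detail.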
Updated by okurz over 4 years ago
- Status changed from Workable to In Progress
No action in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320 yet. I asked in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320#c3 what ETA to expect. In the meantime I will try the workaround. The aarch64 worker is right now not doing any openQA tests so I guess now is a good time.
cd /boot/grub2/
mv arm64-efi/ arm64-efi.bk_poo#62102
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk_poo#62102/* arm64-efi/
zypper rl grub2-arm64-efi
zypper in grub2-arm64-efi
this upgraded "grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1".
And it did not work, same error as before. I will update the bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320
Updated by okurz over 4 years ago
- Status changed from In Progress to Blocked
Updated the bug stating that the problem is still there, the intended bugfix is not effective or not present and the workaround does not work.
Updated by michael-chang over 4 years ago
okurz wrote:
No action in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320 yet. I asked in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320#c3 what ETA to expect. In the meantime I will try the workaround. The aarch64 worker is right now not doing any openQA tests so I guess now is a good time.
cd /boot/grub2/
mv arm64-efi/ arm64-efi.bk_poo#62102
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk_poo#62102/* arm64-efi/
zypper rl grub2-arm64-efi
zypper in grub2-arm64-efi
this upgraded "grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1".
And it did not work, same error as before. I will update the bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320
Would you please be more clear about when the "same error" happened? Is it
1. right after reboot
2. right after you boot to the snapshot
3. right after boot to the snapshot, rollback and reboot
4. something else?
Because this update seems not to have version changes, you shouldn't be hit by case #1, but case #2 and case #3 are likely. I just want to make sure we're on the same page.
Updated by okurz over 4 years ago
michael-chang wrote:
[…]
Would you please be more clear about when the "same error" happened ? Is it 1. right after reboot 2. right after you boot to the snapshot 3. right after boot to the snapshot, rollback and reboot 4. else ?....Because this update seems to not have version changes, therefore you shouldnt hit by case#1, but case#2 and case#3 is likely .. I just want to make sure we're on the same page.
It's 1., right after reboot. But please keep in mind that this is a "transactional server", i.e. what we do is that from the running system we create a new btrfs r/w snapshot, install updated packages into that snapshot and in grub we try to boot into that snapshot. When grub tries to load kernel+initrd and boot this is the time the error message appears and we are back to grub.
Updated by michael-chang over 4 years ago
okurz wrote:
michael-chang wrote:
It's 1., right after reboot. But please keep in mind that this is a "transactional server", i.e. what we do is that from the running system we create a new btrfs r/w snapshot, install updated packages into that snapshot and in grub we try to boot into that snapshot. When grub tries to load kernel+initrd and boot this is the time the error message appears and we are back to grub.
Thanks for the detailed information. I am confused because it was "zypper in grub2-arm64-efi" in comment#6 to install grub. But in transactional server "zypper install" seems to have been replaced by "transactional-update pkg install ..". Hence I need to double check.
I will install a transactional server and work on a test script. That could also provide a better common ground to diagnose the problem.
Updated by okurz over 4 years ago
michael-chang wrote:
[…] I am confused because it was "zypper in grub2-arm64-efi" in comment#6 to install grub. But in transactional server "zypper install" seems to have been replaced by "transactional-update pkg install ..". Hence I need to double check.
Right. What I did was "transactional-update shell" and then "zypper" in there. This is basically the same as "transactional-update pkg in …" for the basic cases.
I will install a transaction server and work on test script. Also that could provide a better common ground to diagnose the problem.
Sounds great!
I had the good opportunity to talk with rwill yesterday and he explained to me how shim-install in the secureboot case is preventing the problem differently. We also checked some other cases. E.g. clean install cases like https://openqa.opensuse.org/tests/1200018 on leap 15.2 transactional server aarch64 look fine. Logs like https://openqa.opensuse.org/tests/1200018/file/logs_from_installation_system-y2logs.tar.bz2 show in file YaST2/storage-inst/05-commited.yml that the btrfs subvol @/boot/grub2/arm64-efi is created. So the problem is the maintenance update that brings in a new version requiring a btrfs subvol which isn't present. I will also comment in the bug again in time.
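To see whether an installed system has that subvol, the output of "btrfs subvolume list /" can be searched for the path. A rough sketch only: the function name has_subvolume is hypothetical, and it assumes each output line ends in " path <subvolume path>".

```shell
#!/bin/bash
# Hypothetical helper: read "btrfs subvolume list" output on stdin and
# succeed (exit 0) if a subvolume with exactly the given path is listed.
# Assumes each line ends in " path <subvolume path>".
has_subvolume() {
    want=$1
    while IFS= read -r line; do
        case $line in
            *" path $want") return 0 ;;
        esac
    done
    return 1
}

# Example (needs root):
#   btrfs subvolume list / | has_subvolume '@/boot/grub2/arm64-efi' \
#       || echo "subvolume missing"
```

On the affected machine this would be expected to report the subvolume as missing, in line with the explanation above.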
Updated by michael-chang over 4 years ago
Thank you for letting me know. Given that it was identified as a problem of a missing subvolume, do you think the test script is still necessary? The purpose of that script was to learn more about what was happening in case we were going in the wrong direction, but it seems we have a better idea now. Thanks.
Updated by okurz over 4 years ago
- Subject changed from aarch64.o.o did not come up after nightly upgrade to aarch64.o.o did not come up after nightly upgrade due to grub2-arm64-efi upgrade (boo#1162320)
Updated by okurz over 4 years ago
- Related to action #65073: aarch64.o.o fails to delete snapshots in snapper-cleanup since 2020-03-20 repeatedly added
Updated by okurz over 4 years ago
- Status changed from Blocked to Resolved
By now the issue was "resolved" for the specific machine aarch64.o.o by reinstalling after we had a filesystem corruption.