action #62102

closed

aarch64.o.o did not come up after nightly upgrade due to grub2-arm64-efi upgrade (boo#1162320)

Added by okurz over 4 years ago. Updated over 4 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date: 2020-01-14
Due date:
% Done: 0%
Estimated time:

Description

After the nightly upgrade I received the email alert from my ping test. Power cycling does not help as the kernel cannot boot; grub reports something about an unreferenced symbol. I am booting an older snapshot, selected from the grub menu over IPMI SOL.


Related issues 1 (0 open1 closed)

Related to openQA Infrastructure - action #65073: aarch64.o.o fails to delete snapshots in snapper-cleanup since 2020-03-20 repeatedly (Resolved, okurz, 2020-03-31)

Actions #1

Updated by okurz over 4 years ago

Trying with transactional-update dup && reboot ends up in the same state.

Message when trying to load the kernel:

Loading Linux 4.12.14-lp151.28.36-default ...
error: symbol `grub_efi_allocate_any_pages' not found.
Loading initial ramdisk ...                           
error: symbol `grub_efi_allocate_any_pages' not found.

and then returns to grub.

Booted into a snapshot once again and tried an older kernel: transactional-update pkg install kernel-default-4.12.14-lp151.28.32.1, with the version taken from zypper search --details kernel-default. In the newly booted session I ran transactional-update shell and, in there, zypper al kernel-default and zypper dup.

Still failed to boot, so the problem is not in the kernel. Retried with zypper al grub2 grub2-arm64-efi dracut, then did a zypper patch and reboot as well as a zypper dup and reboot. This worked. Removed the lock from dracut and tried the upgrade: "dracut 044.2-lp151.2.3.1 -> 044.2-lp151.2.9.1". dracut was fine, so I tried grub2 next. grub2 was also fine, so it's most likely grub2-arm64-efi, or it was a transient problem fixed by the multiple reinstalls and reboots. I will let the system be for now as it's pretty busy with testing.
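
The lock-based narrowing described above could be sketched roughly as follows. This is only an illustration, not what was literally run: the helper name bisect_plan is made up, it only prints commands instead of executing them, and it assumes that locks set with zypper al on the running system are honoured by a following transactional-update dup. The first package whose unlock leads to a failed boot is the likely culprit.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands for narrowing down which
# package upgrade breaks boot. Nothing is executed; run each step by hand and
# check after every reboot whether the machine comes up.
bisect_plan() {
    # First lock all suspects so that a full upgrade leaves them untouched.
    echo "zypper al $*"
    echo "transactional-update dup reboot"
    # Then release one lock at a time and upgrade again.
    for pkg in "$@"; do
        echo "zypper rl $pkg"
        echo "transactional-update dup reboot"
    done
}

bisect_plan dracut grub2 grub2-arm64-efi
```

If a step fails to boot, boot the previous snapshot from the grub menu and add the lock back before continuing.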

Actions #2

Updated by okurz over 4 years ago

  • Status changed from In Progress to Blocked

Currently no tests are running, so I executed zypper rl grub2-arm64-efi && transactional-update dup reboot. It failed again.
The upgrade grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1 triggers the problem. I added the lock back.

Reported bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320

Actions #3

Updated by okurz over 4 years ago

  • Status changed from Blocked to Workable

We can apply the workaround as mentioned in https://bugzilla.opensuse.org/show_bug.cgi?id=1122591#c28 :

cd /boot/grub2
mv arm64-efi arm64-efi.bk
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk/* arm64-efi/

Actions #4

Updated by okurz over 4 years ago

  • Status changed from Workable to In Progress

No action in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320 yet. I asked in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320#c3 what ETA to expect. In the meantime I will try the workaround. The aarch64 worker is right now not doing any openQA tests so I guess now is a good time.

cd /boot/grub2/
mv arm64-efi/ arm64-efi.bk_poo#62102
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk_poo#62102/* arm64-efi/
zypper rl grub2-arm64-efi
zypper in grub2-arm64-efi

This upgraded "grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1".

It did not work, same error as before. I will update the bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320

Actions #5

Updated by okurz over 4 years ago

  • Status changed from In Progress to Blocked

Updated the bug stating that the problem is still there, the intended bugfix is not effective or not present and the workaround does not work.

Actions #6

Updated by michael-chang over 4 years ago

okurz wrote:

No action in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320 yet. I asked in https://bugzilla.opensuse.org/show_bug.cgi?id=1162320#c3 what ETA to expect. In the meantime I will try the workaround. The aarch64 worker is right now not doing any openQA tests so I guess now is a good time.

cd /boot/grub2/
mv arm64-efi/ arm64-efi.bk_poo#62102
btrfs subvolume create /boot/grub2/arm64-efi
cp -r arm64-efi.bk_poo#62102/* arm64-efi/
zypper rl grub2-arm64-efi
zypper in grub2-arm64-efi

This upgraded "grub2-arm64-efi 2.02-lp151.21.6.1 -> 2.02-lp151.21.9.1".

It did not work, same error as before. I will update the bug https://bugzilla.opensuse.org/show_bug.cgi?id=1162320

Would you please clarify when the "same error" happened? Was it 1. right after reboot, 2. right after you booted into the snapshot, 3. right after booting into the snapshot, rollback and reboot, or 4. something else?

Because this update does not seem to have version changes, you shouldn't be hit by case #1, but case #2 and case #3 are likely. I just want to make sure we're on the same page.

Actions #7

Updated by okurz over 4 years ago

michael-chang wrote:

[…]
Would you please clarify when the "same error" happened? Was it 1. right after reboot, 2. right after you booted into the snapshot, 3. right after booting into the snapshot, rollback and reboot, or 4. something else?

Because this update does not seem to have version changes, you shouldn't be hit by case #1, but case #2 and case #3 are likely. I just want to make sure we're on the same page.

It's 1., right after reboot. But please keep in mind that this is a "transactional server", i.e. what we do is that from the running system we create a new btrfs r/w snapshot, install updated packages into that snapshot and in grub we try to boot into that snapshot. When grub tries to load kernel+initrd and boot this is the time the error message appears and we are back to grub.

Actions #8

Updated by michael-chang over 4 years ago

okurz wrote:

michael-chang wrote:

It's 1., right after reboot. But please keep in mind that this is a "transactional server", i.e. what we do is that from the running system we create a new btrfs r/w snapshot, install updated packages into that snapshot and in grub we try to boot into that snapshot. When grub tries to load kernel+initrd and boot this is the time the error message appears and we are back to grub.

Thanks for the detailed information. I am confused because it was zypper in grub2-arm64-efi in comment#6 to install grub. But in transactional server zypper install seems to have been replaced by transactional-update pkg install ... Hence I need to double check.

I will install a transactional server and work on a test script. That could also provide a better common ground to diagnose the problem.

Actions #9

Updated by okurz over 4 years ago

michael-chang wrote:

[…] I am confused because it was zypper in grub2-arm64-efi in comment#6 to install grub. But in transactional server zypper install seems to have been replaced by transactional-update pkg install ... Hence I need to double check.

Right. What I did was transactional-update shell and then zypper in there. This is basically the same as transactional-update pkg in … for the basic cases.

I will install a transactional server and work on a test script. That could also provide a better common ground to diagnose the problem.

Sounds great!

I had the opportunity to talk with rwill yesterday and he explained to me how shim-install prevents the problem in a different way in the secureboot case. We also checked some other cases. For example, clean installs like https://openqa.opensuse.org/tests/1200018 on Leap 15.2 transactional server aarch64 look fine. Logs like https://openqa.opensuse.org/tests/1200018/file/logs_from_installation_system-y2logs.tar.bz2 show in the file YaST2/storage-inst/05-commited.yml that the btrfs subvol @/boot/grub2/arm64-efi is created. So the problem is the maintenance update, which brings in a new version requiring a btrfs subvol that isn't present on the already installed system. I will also comment in the bug again in time.
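
One way to check whether an installed system has that subvolume is to look for it in the output of btrfs subvolume list. The following is a minimal sketch and was not run on aarch64.o.o: the helper name has_subvolume is made up for illustration, and it assumes that each output line ends in "path <path>" and that the path contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of btrfs-progs): reads `btrfs subvolume list`
# output on stdin and succeeds if the path on some line equals "$1".
# Assumption: each line ends in "path <path>" and the path has no spaces.
has_subvolume() {
    awk -v want="$1" '
        $(NF - 1) == "path" && $NF == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Example use on a live system (needs root):
#   btrfs subvolume list / | has_subvolume @/boot/grub2/arm64-efi \
#       || echo "subvolume missing"
```

Reading from stdin keeps the check independent of the machine, so saved output from another host can be fed in as well.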

Actions #10

Updated by michael-chang over 4 years ago

Thank you for letting me know. Given that it was identified as a problem of a missing subvolume, do you think the test script is still necessary? The purpose of that script was to learn more about what was happening in case we were going in the wrong direction, but it seems we have a better idea now. Thanks.

Actions #11

Updated by okurz over 4 years ago

  • Subject changed from aarch64.o.o did not come up after nightly upgrade to aarch64.o.o did not come up after nightly upgrade due to grub2-arm64-efi upgrade (boo#1162320)

Actions #12

Updated by okurz over 4 years ago

  • Related to action #65073: aarch64.o.o fails to delete snapshots in snapper-cleanup since 2020-03-20 repeatedly added

Actions #13

Updated by okurz over 4 years ago

  • Status changed from Blocked to Resolved

By now the issue was "resolved" for the specific machine aarch64.o.o by reinstalling after we had a filesystem corruption.
