action #118024
closed
openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4
Ensure all PPC workers are upgraded after kernel regression resolved size:M
Description
Motivation
After #114565 is resolved we should ensure all PPC workers are upgraded while keeping https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 in mind.
Acceptance criteria
- AC1: All our OSD+O3 PPC workers run an upgraded current Leap (but still on a downgraded kernel if necessary)
- AC2: Stable over reboots
Suggestions
- Based on status in https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 decide if we can upgrade to Leap 15.4 normally or need to pin a certain kernel version
- After https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 is resolved remove kernel-default and util-linux zypper package locks on qa-power8-4, qa-power8-5, power8.openqanet.opensuse.org
- Upgrade kernel+OS on qa-power8-4, qa-power8-5, power8.openqanet.opensuse.org
Updated by okurz about 2 years ago
- Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added
Updated by okurz almost 2 years ago
- Project changed from openQA Project to openQA Infrastructure
- Description updated (diff)
- Category deleted (Organisational)
- Status changed from Blocked to New
- Assignee deleted (okurz)
Updated by mkittler almost 2 years ago
- Subject changed from Ensure all PPC workers are upgraded after kernel regression resolved to Ensure all PPC workers are upgraded after kernel regression resolved size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler almost 2 years ago
- Status changed from Workable to In Progress
Updated by mkittler almost 2 years ago
I'm currently upgrading qa-power8-4 (qa-power8-5 is already at Leap 15.4).
The ticket description also mentions power8.openqanet.opensuse.org but that machine is yet to be recovered. So once I'm otherwise done I'm going to block this ticket on #116078.
Updated by mkittler almost 2 years ago
- Status changed from In Progress to Blocked
qa-power8-4 is now "up-to-date", that means Leap 15.4 but using the kernel/util-linux packages from Leap 15.3¹. There are no failed services and the worker appears normally on the web UI. I uninstalled the Leap 15.4 kernel (as was also done on qa-power8-5) to avoid it being selected in petitboot by default. I have also rebooted twice.
So this is now only blocked by #116078.
¹ via
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/kernel-default-5.3.18-57.3.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-2.36.2-2.29.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-systemd-2.36.2-2.1.ppc64le.rpm
sudo zypper install --oldpackage kernel-default-5.3.18-57.3.ppc64le.rpm util-linux-2.36.2-2.29.ppc64le.rpm util-linux-systemd-2.36.2-2.1.ppc64le.rpm
# chose to uninstall util-linux-lang
sudo zypper al kernel-default
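A minimal sketch for checking after a reboot that the pinned kernel is actually the one running. The helper name `kernel_matches` and the sample release strings in the comments are hypothetical; `zypper ll` can additionally be used to list the active package locks.

```shell
#!/bin/sh
# kernel_matches RELEASE PREFIX
# Succeeds if the kernel release string starts with the given prefix.
# (Hypothetical helper, not part of the original comment.)
kernel_matches() {
    case "$1" in
        "$2"*) return 0 ;;   # quoted prefix is matched literally, then anything
        *)     return 1 ;;
    esac
}

# Usage on the worker itself, comparing the running kernel against
# the Leap 15.3 kernel series (5.3.18):
#   kernel_matches "$(uname -r)" 5.3.18- && echo "pinned kernel is running"
```

Taking the release as an argument instead of calling `uname -r` inside the function keeps the check usable on output collected from other hosts.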
Updated by mkittler almost 2 years ago
- Blocked by action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M added
Updated by okurz almost 2 years ago
- Status changed from Blocked to Feedback
I think you overlooked something :)
$ sudo salt -C 'G@osarch:ppc64le' grains.get osrelease
powerqaworker-qam-1.qa.suse.de:
    15.4
QA-Power8-5-kvm.qa.suse.de:
    15.4
QA-Power8-4-kvm.qa.suse.de:
    15.4
malbec.arch.suse.de:
    15.3
grenache-1.qa.suse.de:
    15.4
I just ran into the problem that I could not find packages for the security sensor on malbec, well, now I know why :)
Upgraded malbec.arch.suse.de
By the way I wouldn't block on #116078, not sure if we will ever have that machine back. Just comment there that it needs to be upgraded as well.
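A rough sketch for spotting such stragglers without reading the list by eye. It assumes salt's default text output (a `minion-id:` line followed by the value on the next line); the function name `outdated_hosts` is hypothetical.

```shell
#!/bin/sh
# outdated_hosts TARGET
# Reads `salt ... grains.get osrelease` output on stdin and prints every
# minion whose reported release differs from TARGET.
# Assumption: each minion is printed as "minion-id:" followed by one value line.
outdated_hosts() {
    awk -v target="$1" '
        /:[ \t]*$/ {                      # minion header line, e.g. "host.example:"
            host = $0
            sub(/^[ \t]+/, "", host)
            sub(/:[ \t]*$/, "", host)
            next
        }
        NF && host != "" {                # first non-empty line after the header
            if ($1 != target) print host
            host = ""
        }
    '
}

# Usage:
#   sudo salt -C 'G@osarch:ppc64le' grains.get osrelease | outdated_hosts 15.4
```

With the output shown above this would print only malbec.arch.suse.de.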
Updated by mkittler almost 2 years ago
I thought this was only about the hosts that were mentioned explicitly in the ticket description. Thanks for upgrading malbec. I'll also check the kernel version again on all those hosts.
Updated by mkittler almost 2 years ago
sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -a'
shows that all workers run on a downgraded kernel version (5.3.18). It is not 100 % consistent because QA-Power8-4-kvm.qa.suse.de uses an older build of that kernel version than the others. Maybe I can unify that (although I'm not sure where I'd get that newer build now).
The only exception is grenache-1.qa.suse.de which runs on the normal kernel provided by Leap 15.4. I don't think it makes sense to downgrade that host for the sake of consistency considering it runs without crashes.
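To make differing kernel builds easier to see, the salt output can be turned into sorted "release host" pairs. This is only a sketch: it assumes `uname -r` is queried instead of `uname -a` so that each minion reports just the release string, and the function name `kernels_by_host` is hypothetical.

```shell
#!/bin/sh
# kernels_by_host
# Reads `salt ... cmd.run 'uname -r'` output on stdin and prints one
# "RELEASE HOST" line per minion, sorted by release.
# Assumption: salt's default output, "minion-id:" followed by the value line.
kernels_by_host() {
    awk '
        /:[ \t]*$/ {                      # minion header line
            host = $0
            sub(/^[ \t]+/, "", host)
            sub(/:[ \t]*$/, "", host)
            next
        }
        NF && host != "" {                # the release reported by that minion
            print $1, host
            host = ""
        }
    ' | LC_ALL=C sort
}

# Usage:
#   sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -r' | kernels_by_host
# Count hosts per kernel build:
#   ... | kernels_by_host | cut -d' ' -f1 | uniq -c
```

Sorting by release first puts hosts on the same build next to each other, so an odd one out such as an older 5.3.18 build stands out immediately.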
Updated by mkittler almost 2 years ago
- Status changed from Feedback to Blocked
I have now installed http://download.opensuse.org/update/leap/15.3/sle/ppc64le/kernel-default-5.3.18-150300.59.93.1.ppc64le.rpm on QA-Power8-4-kvm.qa.suse.de (and uninstalled all other kernel versions), so all hosts with a downgraded kernel are now downgraded consistently. After rebooting the worker everything still looks good and uname -a now shows a consistent version across all downgraded machines. So I'm setting this ticket back to blocked.
Updated by okurz almost 2 years ago
mkittler wrote:
So I'm setting this ticket back to blocked.
Do you still want to block on #116078? I wouldn't do that as it's not even clear if we will ever have that machine back and if we do then we need to ensure it's properly upgraded and added to salt anyway. I suggest you resolve this ticket here.
Updated by mkittler almost 2 years ago
- Blocked by deleted (action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M)