action #118024
openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4
Ensure all PPC workers are upgraded after kernel regression resolved size:M
0%
Description
Motivation
After #114565 is resolved we should ensure all PPC workers are upgraded while keeping https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 in mind.
Acceptance criteria
- AC1: All our OSD+O3 PPC workers run an upgraded current Leap (but still on a downgraded kernel if necessary)
- AC2: Stable over reboots
Suggestions
- Based on status in https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 decide if we can upgrade to Leap 15.4 normally or need to pin a certain kernel version
- After https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 is resolved remove kernel-default and util-linux zypper package locks on qa-power8-4, qa-power8-5, power8.openqanet.opensuse.org
- Upgrade kernel+OS on qa-power8-4, qa-power8-5, power8.openqanet.opensuse.org
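The last two suggestions could be sketched roughly as follows. This is only a sketch under assumptions, not the procedure actually used: the `run` dry-run helper and the `unlock_and_upgrade` function are hypothetical names, and it assumes the host's repositories already point at Leap 15.4, so that a plain `zypper dup` pulls in the current kernel once the locks are gone.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints each command unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Hypothetical helper: drop the package locks, then upgrade the host.
# Assumes the repositories already point at Leap 15.4.
unlock_and_upgrade() {
    run sudo zypper removelock kernel-default util-linux
    run sudo zypper refresh
    run sudo zypper dup
}

# Usage: review the printed commands first, then execute them for real:
#   unlock_and_upgrade
#   RUN=1 unlock_and_upgrade
```

Printing the commands by default seems prudent here, because a wrong kernel on these machines previously required a recovery (#114565).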
Related issues
History
#1
Updated by okurz 4 months ago
- Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added
#2
Updated by okurz about 2 months ago
- Tags set to infra
#8
Updated by mkittler 17 days ago
- Status changed from In Progress to Blocked
qa-power8-4 is now "up-to-date", which means Leap 15.4 but using the kernel/util-linux packages from Leap 15.3¹. There are no failed services and the worker appears normally on the web UI. I uninstalled the Leap 15.4 kernel (as was also done on qa-power8-5) to avoid it being selected in petitboot by default. I have also rebooted twice.
So this is now only blocked by #116078.
¹ via
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/kernel-default-5.3.18-57.3.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-2.36.2-2.29.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-systemd-2.36.2-2.1.ppc64le.rpm
sudo zypper install --oldpackage kernel-default-5.3.18-57.3.ppc64le.rpm util-linux-2.36.2-2.29.ppc64le.rpm util-linux-systemd-2.36.2-2.1.ppc64le.rpm # chose to uninstall util-linux-lang
sudo zypper al kernel-default
#9
Updated by mkittler 17 days ago
- Blocked by action #116078: Recover o3 worker power8, restore IPMI access size:M added
#10
Updated by okurz 17 days ago
- Status changed from Blocked to Feedback
I think you overlooked something :)
$ sudo salt -C 'G@osarch:ppc64le' grains.get osrelease
powerqaworker-qam-1.qa.suse.de: 15.4
QA-Power8-5-kvm.qa.suse.de: 15.4
QA-Power8-4-kvm.qa.suse.de: 15.4
malbec.arch.suse.de: 15.3
grenache-1.qa.suse.de: 15.4
I just ran into the problem that I could not find packages for the security sensor on malbec, well, now I know why :)
Upgraded malbec.arch.suse.de
By the way I wouldn't block on #116078, not sure if we will ever have that machine back. Just comment there that it needs to be upgraded as well.
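To make an overlooked host like malbec stand out, the salt query above could be piped through a small filter. This is a minimal sketch, assuming salt's `txt` outputter, which prints one `host: value` line per minion; the function name `outdated_hosts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every minion whose osrelease differs from
# the expected one. Reads "host: version" lines on stdin.
outdated_hosts() {
    expected="$1"
    # Appending "" forces a string comparison instead of a numeric one.
    awk -v want="$expected" -F': *' \
        'NF >= 2 && ($2 "") != (want "") { print $1 }'
}

# Usage (needs access to the salt master):
#   sudo salt --out=txt -C 'G@osarch:ppc64le' grains.get osrelease | outdated_hosts 15.4
```

An empty result would mean that every PPC worker known to salt reports the expected release.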
#12
Updated by mkittler 16 days ago
sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -a'
shows that all workers run on a downgraded kernel version (5.3.18). It is not 100 % consistent because QA-Power8-4-kvm.qa.suse.de uses an older build of that kernel version than the others. Maybe I can unify that (although I'm not sure where I'd get that newer build now).
The only exception is grenache-1.qa.suse.de which runs on the normal kernel provided by Leap 15.4. I don't think it makes sense to downgrade that host for the sake of consistency considering it runs without crashes.
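The consistency check described above could also be done mechanically by collapsing the reported kernel releases into a unique list. This is a sketch under the assumption that `uname -r` is queried through salt's `txt` outputter (one `host: release` line per minion); `distinct_kernels` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct kernel releases found in
# "host: release" lines read from stdin.
distinct_kernels() {
    awk -F': *' 'NF >= 2 { print $2 }' | sort -u
}

# Usage (needs access to the salt master):
#   sudo salt --out=txt -C 'G@osarch:ppc64le' cmd.run 'uname -r' | distinct_kernels
```

More than one line of output indicates that the workers are not all on the same kernel build. A host that intentionally stays on the regular Leap 15.4 kernel, such as grenache-1.qa.suse.de, will naturally show up as a separate entry.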
#13
Updated by mkittler 16 days ago
- Status changed from Feedback to Blocked
I have now installed http://download.opensuse.org/update/leap/15.3/sle/ppc64le/kernel-default-5.3.18-150300.59.93.1.ppc64le.rpm on QA-Power8-4-kvm.qa.suse.de (and uninstalled all other kernel versions), so all hosts with a downgraded kernel are now downgraded consistently. After rebooting the worker everything still looks good and uname -a now shows a consistent version across all downgraded machines. So I'm setting this ticket back to blocked.
#14
Updated by okurz 16 days ago
mkittler wrote:
So I'm setting this ticket back to blocked.
Do you still want to block on #116078? I wouldn't do that as it's not even clear if we will ever have that machine back and if we do then we need to ensure it's properly upgraded and added to salt anyway. I suggest you resolve this ticket here.
#15
Updated by mkittler 13 days ago
- Blocked by deleted (action #116078: Recover o3 worker power8, restore IPMI access size:M)