action #118024

openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

Ensure all PPC workers are upgraded after kernel regression resolved size:M

Added by okurz 4 months ago. Updated 13 days ago.

Status:
Resolved
Priority:
Normal
Assignee:
Target version:
Start date:
2022-10-11
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Motivation

After #114565 is resolved we should ensure all PPC workers are upgraded while keeping https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 in mind.

Acceptance criteria

  • AC1: All our OSD+O3 PPC workers run an upgraded current Leap (but still on a downgraded kernel if necessary)
  • AC2: Stable over reboots

Suggestions


Related issues

Related to openQA Infrastructure - action #114565: recover qa-power8-4+qa-power8-5 size:M Resolved 2022-12-19

History

#1 Updated by okurz 4 months ago

  • Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added

#2 Updated by okurz about 2 months ago

  • Tags set to infra

#3 Updated by okurz 18 days ago

  • Project changed from openQA Project to openQA Infrastructure
  • Description updated (diff)
  • Category deleted (Organisational)
  • Status changed from Blocked to New
  • Assignee deleted (okurz)

#4 Updated by mkittler 17 days ago

  • Subject changed from Ensure all PPC workers are upgraded after kernel regression resolved to Ensure all PPC workers are upgraded after kernel regression resolved size:M
  • Description updated (diff)
  • Status changed from New to Workable

#5 Updated by mkittler 17 days ago

  • Assignee set to mkittler

#6 Updated by mkittler 17 days ago

  • Status changed from Workable to In Progress

#7 Updated by mkittler 17 days ago

I'm currently upgrading qa-power8-4 (qa-power8-5 is already at Leap 15.4).

The ticket description also mentions power8.openqanet.opensuse.org but that machine is yet to be recovered. So once I'm otherwise done I'm going to block this ticket on #116078.

#8 Updated by mkittler 17 days ago

  • Status changed from In Progress to Blocked

qa-power8-4 is now "up-to-date", that is, on Leap 15.4 but using the kernel/util-linux packages from Leap 15.3¹. There are no failed services and the worker appears normally on the web UI. I uninstalled the Leap 15.4 kernel (as was also done on -5) to avoid it being selected in petitboot by default. I have also rebooted it twice.

So this is now only blocked by #116078.


¹ via

wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/kernel-default-5.3.18-57.3.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-2.36.2-2.29.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-systemd-2.36.2-2.1.ppc64le.rpm
sudo zypper install --oldpackage kernel-default-5.3.18-57.3.ppc64le.rpm util-linux-2.36.2-2.29.ppc64le.rpm util-linux-systemd-2.36.2-2.1.ppc64le.rpm
# chose to uninstall util-linux-lang
sudo zypper al kernel-default
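
A minimal sketch for re-checking after a reboot that only the downgraded kernel series is installed. It assumes `rpm -q kernel-default` prints one `name-version-release.arch` line per installed kernel; the function name `only_kernel_series` is hypothetical and not part of the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): succeed only if every
# kernel-default package listed on stdin belongs to the expected series.
# Input: output of `rpm -q kernel-default`, one package per line.
only_kernel_series() {
    awk -v want="kernel-default-$1-" '
        NF == 0 { next }                  # skip blank lines
        { seen = 1 }                      # at least one package line was read
        index($0, want) != 1 { bad = 1 }  # line does not start with the wanted series
        END { if (seen && !bad) exit 0; exit 1 }
    '
}

# Usage on a worker (not executed here):
#   rpm -q kernel-default | only_kernel_series 5.3.18 && echo "only 5.3.18 kernels installed"
```

An empty input or a "not installed" message from rpm counts as a failure, so the check does not pass silently on a host without the package.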

#9 Updated by mkittler 17 days ago

  • Blocked by action #116078: Recover o3 worker power8, restore IPMI access size:M added

#10 Updated by okurz 17 days ago

  • Status changed from Blocked to Feedback

I think you overlooked something :)

$ sudo salt -C 'G@osarch:ppc64le' grains.get osrelease
powerqaworker-qam-1.qa.suse.de:
    15.4
QA-Power8-5-kvm.qa.suse.de:
    15.4
QA-Power8-4-kvm.qa.suse.de:
    15.4
malbec.arch.suse.de:
    15.3
grenache-1.qa.suse.de:
    15.4

I just ran into the problem that I could not find packages for the security sensor on malbec, well, now I know why :)

Upgraded malbec.arch.suse.de

By the way I wouldn't block on #116078, not sure if we will ever have that machine back. Just comment there that it needs to be upgraded as well.
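
The check in the salt output above can be sketched as a small filter that prints only the hosts still on a different release. This assumes the two-line shape shown in the output (a `host:` line followed by an indented value); the function name `hosts_not_on_release` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): read
# `salt ... grains.get osrelease` output on stdin and print every host
# whose reported release differs from the expected one given as $1.
hosts_not_on_release() {
    awk -v want="$1" '
        # An unindented line ending in ":" names the host.
        /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
        # The next non-empty line carries the value for that host.
        NF > 0 && host != "" {
            if (($1 "") != (want "")) print host   # force string comparison
            host = ""
        }
    '
}

# Usage on the salt master (not executed here):
#   sudo salt -C 'G@osarch:ppc64le' grains.get osrelease | hosts_not_on_release 15.4
```

Run against the output quoted in this comment, the filter would list only malbec.arch.suse.de.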

#11 Updated by mkittler 16 days ago

I thought this was only about the hosts that were mentioned explicitly in the ticket description. Thanks for upgrading malbec. I'll also check the kernel version again on all those hosts.

#12 Updated by mkittler 16 days ago

sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -a' shows that all workers run on a downgraded kernel version (5.3.18). It is not 100 % consistent because QA-Power8-4-kvm.qa.suse.de uses an older build of that kernel version than the others. Maybe I can unify that (although I'm not sure where I'd get that newer build now).

The only exception is grenache-1.qa.suse.de which runs on the normal kernel provided by Leap 15.4. I don't think it makes sense to downgrade that host for the sake of consistency considering it runs without crashes.
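
To make such inconsistencies easier to spot, a sketch like the following could list hosts next to their kernel release. It assumes `uname -r` is used instead of `uname -a` so each host reports a single token, and that salt prints a `host:` line followed by an indented value; `hosts_by_kernel` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): read
# `salt ... cmd.run 'uname -r'` output on stdin and print
# "<kernel release> <host>" pairs sorted by release, so hosts running
# the same build end up on adjacent lines.
hosts_by_kernel() {
    awk '
        /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
        NF > 0 && host != "" { print $1, host; host = "" }
    ' | LC_ALL=C sort
}

# Usage on the salt master (not executed here):
#   sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -r' | hosts_by_kernel
```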

#13 Updated by mkittler 16 days ago

  • Status changed from Feedback to Blocked

I have now installed http://download.opensuse.org/update/leap/15.3/sle/ppc64le/kernel-default-5.3.18-150300.59.93.1.ppc64le.rpm on QA-Power8-4-kvm.qa.suse.de (and uninstalled all other kernel versions), so all hosts with a downgraded kernel are now downgraded consistently. After rebooting the worker everything still looks good and uname -a now shows a consistent version across all downgraded machines. So I'm setting this ticket back to blocked.

#14 Updated by okurz 16 days ago

mkittler wrote:

So I'm setting this ticket back to blocked.

Do you still want to block on #116078? I wouldn't do that as it's not even clear if we will ever have that machine back and if we do then we need to ensure it's properly upgraded and added to salt anyway. I suggest you resolve this ticket here.

#15 Updated by mkittler 13 days ago

  • Blocked by deleted (action #116078: Recover o3 worker power8, restore IPMI access size:M)

#16 Updated by mkittler 13 days ago

  • Status changed from Blocked to Resolved
