action #118024

closed

openQA Project (public) - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

Ensure all PPC workers are upgraded after kernel regression resolved size:M

Added by okurz about 2 years ago. Updated almost 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Start date:
2022-10-11
Due date:
% Done:

0%

Estimated time:
Tags:

Description

Motivation

After #114565 is resolved we should ensure all PPC workers are upgraded while keeping https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 in mind.

Acceptance criteria

  • AC1: All our OSD+O3 PPC workers run an upgraded current Leap (but still on a downgraded kernel if necessary)
  • AC2: Stable over reboots

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #114565: recover qa-power8-4+qa-power8-5 size:M (Resolved, okurz, 2022-12-19)

Actions #1

Updated by okurz about 2 years ago

  • Related to action #114565: recover qa-power8-4+qa-power8-5 size:M added
Actions #2

Updated by okurz about 2 years ago

  • Tags set to infra
Actions #3

Updated by okurz almost 2 years ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Description updated (diff)
  • Category deleted (Organisational)
  • Status changed from Blocked to New
  • Assignee deleted (okurz)
Actions #4

Updated by mkittler almost 2 years ago

  • Subject changed from Ensure all PPC workers are upgraded after kernel regression resolved to Ensure all PPC workers are upgraded after kernel regression resolved size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #5

Updated by mkittler almost 2 years ago

  • Assignee set to mkittler
Actions #6

Updated by mkittler almost 2 years ago

  • Status changed from Workable to In Progress
Actions #7

Updated by mkittler almost 2 years ago

I'm currently upgrading qa-power8-4 (qa-power8-5 is already at Leap 15.4).

The ticket description also mentions power8.openqanet.opensuse.org but that machine is yet to be recovered. So once I'm otherwise done I'm going to block this ticket on #116078.

Actions #8

Updated by mkittler almost 2 years ago

  • Status changed from In Progress to Blocked

qa-power8-4 is now "up-to-date", meaning it runs Leap 15.4 but with the kernel/util-linux packages from Leap 15.3¹. There are no failed services and the worker shows up normally in the web UI. I uninstalled the Leap 15.4 kernel (as was also done on qa-power8-5) so that petitboot does not select it by default. I have also rebooted twice.

So this is now only blocked by #116078.


¹ via

wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/kernel-default-5.3.18-57.3.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-2.36.2-2.29.ppc64le.rpm
wget http://download.opensuse.org/ports/ppc/distribution/leap/15.3/repo/oss/ppc64le/util-linux-systemd-2.36.2-2.1.ppc64le.rpm
sudo zypper install --oldpackage kernel-default-5.3.18-57.3.ppc64le.rpm util-linux-2.36.2-2.29.ppc64le.rpm util-linux-systemd-2.36.2-2.1.ppc64le.rpm
# chose to uninstall util-linux-lang
# add a package lock ("al" is short for "addlock") on kernel-default
sudo zypper al kernel-default
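
One way to double-check the result of the steps above is to confirm that every installed kernel-default package belongs to the downgraded 5.3.18 series. This is only a minimal sketch and not something that was run in this ticket: the function name only_kernel_series is hypothetical, and it assumes the usual one-package-per-line output of rpm -q kernel-default.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original ticket).
# Reads `rpm -q kernel-default` output on stdin and succeeds only if at least
# one package is listed and every listed package name starts with
# "kernel-default-<series>-".
only_kernel_series() {
    awk -v prefix="kernel-default-$1-" '
        NF == 0 { next }                      # skip empty lines
        { seen = 1 }                          # at least one package line was read
        index($0, prefix) != 1 { bad = 1 }    # line does not start with the prefix
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Possible usage on a worker:
#   rpm -q kernel-default | only_kernel_series 5.3.18 && echo "only 5.3.18 kernels installed"
```

A message such as "package kernel-default is not installed" does not match the prefix either, so the check fails in that case as well.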
Actions #9

Updated by mkittler almost 2 years ago

  • Blocked by action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M added
Actions #10

Updated by okurz almost 2 years ago

  • Status changed from Blocked to Feedback

I think you overlooked something :)

$ sudo salt -C 'G@osarch:ppc64le' grains.get osrelease
powerqaworker-qam-1.qa.suse.de:
    15.4
QA-Power8-5-kvm.qa.suse.de:
    15.4
QA-Power8-4-kvm.qa.suse.de:
    15.4
malbec.arch.suse.de:
    15.3
grenache-1.qa.suse.de:
    15.4

I just ran into the problem that I could not find packages for the security sensor on malbec. Well, now I know why :)

I upgraded malbec.arch.suse.de.

By the way, I wouldn't block on #116078; I'm not sure we will ever have that machine back. Just comment there that it needs to be upgraded as well.
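
The osrelease check from the salt output above could be filtered so that only the hosts lagging behind are shown. This is a rough sketch under assumptions: the helper name hosts_not_on_release is made up for illustration, and it relies on salt's default output layout as quoted above (minion id ending in a colon, value indented on the next line).

```shell
#!/bin/sh
# Hypothetical helper (not part of the original ticket).
# Reads `salt ... grains.get osrelease` output on stdin and prints every
# minion whose reported release differs from the expected one given as $1.
hosts_not_on_release() {
    awk -v want="$1" '
        # Unindented line ending in ":" is a minion id; remember it without the colon.
        /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
        # First non-empty line after a minion id is its value.
        NF > 0 && host != "" {
            if (($1 "") != (want "")) print host   # force string comparison
            host = ""
        }
    '
}

# Possible usage on the salt master:
#   sudo salt -C 'G@osarch:ppc64le' grains.get osrelease | hosts_not_on_release 15.4
```

Fed with the output quoted above, this would print only malbec.arch.suse.de.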

Actions #11

Updated by mkittler almost 2 years ago

I thought this was only about the hosts that were mentioned explicitly in the ticket description. Thanks for upgrading malbec. I'll also check the kernel version again on all those hosts.

Actions #12

Updated by mkittler almost 2 years ago

sudo salt -C 'G@osarch:ppc64le' cmd.run 'uname -a' shows that all workers run a downgraded kernel version (5.3.18). This is not 100% consistent, because QA-Power8-4-kvm.qa.suse.de uses an older build of that kernel version than the others. Maybe I can unify that (although I'm not sure where I'd get the newer build now).

The only exception is grenache-1.qa.suse.de which runs on the normal kernel provided by Leap 15.4. I don't think it makes sense to downgrade that host for the sake of consistency considering it runs without crashes.

Actions #13

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to Blocked

I have now installed http://download.opensuse.org/update/leap/15.3/sle/ppc64le/kernel-default-5.3.18-150300.59.93.1.ppc64le.rpm on QA-Power8-4-kvm.qa.suse.de (and uninstalled all other kernel versions), so all hosts with a downgraded kernel are now downgraded consistently. After rebooting the worker everything still looks good, and uname -a now shows a consistent version across all downgraded machines. So I'm setting this ticket back to blocked.

Actions #14

Updated by okurz almost 2 years ago

mkittler wrote:

So I'm setting this ticket back to blocked.

Do you still want to block on #116078? I wouldn't do that as it's not even clear if we will ever have that machine back and if we do then we need to ensure it's properly upgraded and added to salt anyway. I suggest you resolve this ticket here.

Actions #15

Updated by mkittler almost 2 years ago

  • Blocked by deleted (action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M)
Actions #16

Updated by mkittler almost 2 years ago

  • Status changed from Blocked to Resolved