action #163469

closed

openQA Project (public) - coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

Upgrade a single o3 worker to openSUSE Leap 15.6

Added by okurz 5 months ago. Updated about 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Start date: 2024-07-08
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

  • Need to upgrade the workers before the EOL of Leap 15.5 and to have a consistent environment

Acceptance criteria

  • AC1: a single o3 worker machine runs a cleanly upgraded openSUSE Leap 15.6 (no failed systemd services, no leftover .rpmnew files, etc.)
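The two checks named in AC1 can be scripted roughly like this. It is a minimal sketch, not a tool the team uses: the function names are hypothetical, only `*.rpmnew` and `*.rpmsave` files below a given directory (default `/etc`) are searched, and systemd is asked via `systemctl --failed`.

```shell
# Hypothetical helpers for the AC1 checks; source this file, then call them.

# Print config files that rpm could not merge during the upgrade:
# *.rpmnew (new packaged default) and *.rpmsave (saved local version).
# Usage: find_config_leftovers [DIR]   (DIR defaults to /etc)
find_config_leftovers() {
    find "${1:-/etc}" \( -name '*.rpmnew' -o -name '*.rpmsave' \) -print 2>/dev/null
}

# Print failed systemd units, one per line (empty output means none).
list_failed_units() {
    systemctl --failed --no-legend --plain 2>/dev/null
}

# Report both kinds of leftovers; return 0 only if neither check found any.
check_clean_upgrade() {
    local leftovers failed
    leftovers=$(find_config_leftovers "$@")
    failed=$(list_failed_units)
    [ -n "$leftovers" ] && printf 'leftover config files:\n%s\n' "$leftovers"
    [ -n "$failed" ] && printf 'failed systemd units:\n%s\n' "$failed"
    [ -z "$leftovers" ] && [ -z "$failed" ]
}
```

Run it as root on the upgraded worker, e.g. `check_clean_upgrade && echo clean`; any reported `.rpmnew` file still has to be merged by hand.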

Suggestions

  • Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
  • Keep the IPMI interface ready and test that Serial-over-LAN works for potential recovery
  • Apply the workaround for #162296, i.e. `zypper al -m "boo#1227616" '*firewall*'`
  • Follow the upgrade instructions from the wiki page linked above
  • After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with `transactional-update rollback`
  • Monitor the effect on special test scenarios, e.g. iSCSI, which showed problems in the past
  • Record important details in the "upgrade all other" ticket #157972
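The suggested steps could be bundled roughly as below. This is a sketch under assumptions, not a copy of the wiki procedure, which remains authoritative: it assumes all repository URLs use `$releasever` so that `zypper --releasever=15.6` switches them, and the function names `run` and `upgrade_worker` are hypothetical. With `DRY_RUN=1` the commands are only printed.

```shell
# Hypothetical runbook for upgrading one o3 worker; source it, then call
# upgrade_worker as root. With DRY_RUN=1 the commands are only printed.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: upgrade_worker [TARGET_VERSION]   (defaults to 15.6)
upgrade_worker() {
    local target=${1:-15.6}
    # Workaround for #162296 (boo#1227616): lock the matching packages; the
    # -m comment records why the lock exists. The glob is quoted so the
    # shell passes it to zypper unexpanded.
    run zypper al -m "boo#1227616" '*firewall*' || return
    # Assumes every repository URL contains $releasever, so this switches
    # them to the target release without editing the .repo files.
    run zypper --releasever="$target" refresh || return
    # Left interactive on purpose so the solver proposal can be reviewed.
    run zypper --releasever="$target" dup || return
    # Afterwards check that everything works; if not, roll back.
    run reboot
}
```

For example, `DRY_RUN=1 upgrade_worker 15.6` previews the commands; without `DRY_RUN` the last step reboots the machine, so keep the IPMI Serial-over-LAN session open.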

Further details

  • Don't worry, everything can be repaired :) If the worker does get misconfigured, there are btrfs snapshots to recover from and the IPMI Serial-over-LAN; a reinstall is possible and not hard; there is no important data on the host (it's only an openQA worker); and other machines can run jobs while one host is down for a little longer. And okurz can hold your hand :)

Related issues: 4 (1 open, 3 closed)

Related to openQA Project (public) - action #162683: s390x libvirt started kvm machines on Leap 15.6 fail with "unsupported configuration: machine type 's390-ccw-virtio-8.2' does not support ACPI" size:M · Resolved · mkittler · 2024-05-08

Related to openQA Project (public) - action #162296: openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S · In Progress · dheidler · 2024-06-14 · 2024-12-26

Related to openQA Infrastructure (public) - action #168454: `openqaworker21` fails with no `Qemu/KVM found` · Resolved · okurz · 2024-10-18

Copied from openQA Project (public) - action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S · Resolved · gpathak

#1

Updated by okurz 5 months ago

  • Copied from action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added

#2

Updated by okurz 5 months ago

  • Related to action #162683: s390x libvirt started kvm machines on Leap 15.6 fail with "unsupported configuration: machine type 's390-ccw-virtio-8.2' does not support ACPI" size:M added

#3

Updated by okurz 5 months ago

Blocked by #162683.

#4

Updated by okurz 3 months ago

  • Related to action #162296: openQA workers crash with Linux 6.4 after upgrade openSUSE Leap 15.6 size:S added

#5

Updated by okurz 3 months ago

Blocked on #162296

#6

Updated by okurz 2 months ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Description updated (diff)
  • Category changed from Organisational to Feature requests
  • Status changed from Blocked to New
  • Assignee deleted (okurz)
  • Target version changed from Tools - Next to Ready

#7

Updated by gpathak 2 months ago

  • Assignee set to gpathak

#8

Updated by gpathak 2 months ago · Edited

Where can we find the list of worker details and credentials for o3 workers?
Is this the correct file https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L82?
For example, how do I find the details and login credentials of this worker: https://openqa.opensuse.org/admin/workers/1152

#9

Updated by okurz 2 months ago

gpathak wrote in #note-8:

Where can we find the list of worker details and credentials for o3 workers?
Is this the correct file https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls?ref_type=heads#L82?

Yes, but only for the IPMI interfaces.

For example, how do I find the details and login credentials of this worker: https://openqa.opensuse.org/admin/workers/1152

https://progress.opensuse.org/projects/openqav3/wiki/#Infrastructure-setup-for-o3-openqaopensuseorg-and-osd-openqasusede

#10

Updated by gpathak 2 months ago

It seems this is still blocked on #162296.

#11

Updated by okurz 2 months ago

No, we shouldn't block on this any longer. That's why I updated this ticket in #163469-6 and explained that we need to apply a workaround with a package lock.

#12

Updated by ybonatakis 2 months ago

  • Status changed from New to In Progress

#13

Updated by gpathak 2 months ago · Edited

Upgrading worker openqaworker21 following the steps from https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades

#16

Updated by gpathak 2 months ago

The worker is upgraded and a reboot is pending. As I don't have access to IPMI via jumpy, I need some help with this.

#17

Updated by livdywan 2 months ago

  • Status changed from In Progress to Blocked

gpathak wrote in #note-16:

The worker is upgraded and a reboot is pending. As I don't have access to IPMI via jumpy, I need some help with this.

Let's block on SD-170670

#18

Updated by gpathak 2 months ago · Edited

After connecting to the IPMI of openqaworker21, I found out that the worker had lost its network connection.
Neither of the physical interfaces eth0 and eth1 has an IP address assigned.
Do we use static IP addresses or dynamic ones assigned via an internal DHCP server?
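In a situation like this, `ip -brief address show` lists each interface with its state and addresses, and a short filter can name the interfaces that lack an IPv4 address. A sketch with a hypothetical function name; it skips the loopback interface and only counts IPv4 addresses, so an interface with nothing but an IPv6 link-local address is still reported.

```shell
# Hypothetical helper: read `ip -brief address show` output on stdin and
# print the interfaces that have no IPv4 address. Each input line holds the
# interface name, its state and then zero or more addresses.
interfaces_without_ipv4() {
    awk '
        NF == 0 || $1 == "lo" { next }   # skip blank lines and loopback
        {
            has_ipv4 = 0
            for (i = 3; i <= NF; i++)
                if ($i ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+\//)
                    has_ipv4 = 1
            if (!has_ipv4)
                print $1
        }
    '
}
```

On the IPMI console this would be used as `ip -brief address show | interfaces_without_ipv4`.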

#19

Updated by okurz 2 months ago

Dynamic, with o3 aka ariel running dnsmasq for DHCP. Are you sure you applied the workaround mentioned in the description? The network might also recover with a reboot. If not, consider a rollback with snapper.

#20

Updated by gpathak 2 months ago

  • Status changed from Blocked to In Progress

#21

Updated by gpathak 2 months ago

@okurz I missed the workaround `zypper al -m "boo#1227616" *firewall*`.
I performed a rollback to 15.5, executed the command to lock the firewall packages against upgrades and am now doing the dist upgrade again.
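The recovery described here (roll back to 15.5, add the lock, upgrade again) could be sketched with snapper as follows. Assumptions: the pre-upgrade snapshot number is looked up manually with `snapper list`, the helper names are hypothetical, and `DRY_RUN=1` only prints the commands.

```shell
# Hypothetical recovery helper: roll back to the pre-upgrade snapshot and
# reboot into it. With DRY_RUN=1 the commands are only printed.

# Execute a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: rollback_to_snapshot NUMBER
# NUMBER is the pre-upgrade snapshot as shown by `snapper list`.
rollback_to_snapshot() {
    if [ -z "${1:-}" ]; then
        echo "usage: rollback_to_snapshot NUMBER" >&2
        return 2
    fi
    # snapper makes a writable copy of snapshot NUMBER the new default
    # subvolume, which becomes the running system after the next boot.
    run snapper rollback "$1" || return
    run reboot
}
# After the reboot, confirm that /etc/os-release reports 15.5 again, then
# add the lock with `zypper al -m "boo#1227616" '*firewall*'` and repeat
# the dist upgrade.
```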

#22

Updated by gpathak 2 months ago

openqaworker21.openqanet.opensuse.org is upgraded to:

openqaworker21:~ # cat /etc/os-release 
NAME="openSUSE Leap"
VERSION="15.6"
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.6"
PRETTY_NAME="openSUSE Leap 15.6"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.6"
BUG_REPORT_URL="https://bugs.opensuse.org"
HOME_URL="https://www.opensuse.org/"
DOCUMENTATION_URL="https://en.opensuse.org/Portal:Leap"
LOGO="distributor-logo-Leap"
openqaworker21:~ #
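To confirm the result without reading the whole file, checking `VERSION_ID` is enough. A minimal sketch with a hypothetical function name; it relies on `/etc/os-release` being a list of shell-compatible variable assignments, as the os-release format specifies, and sources it in a subshell.

```shell
# Hypothetical helper: succeed if an os-release file reports the expected
# VERSION_ID. Usage: has_version_id [FILE] [VERSION]
# The file is sourced in a subshell so that its variables do not leak into
# the caller; only do this with a trusted, root-owned file.
has_version_id() {
    local file=${1:-/etc/os-release} expected=${2:-15.6}
    local actual
    actual=$(unset VERSION_ID; . "$file" && printf '%s' "$VERSION_ID") || return
    [ "$actual" = "$expected" ]
}
```

For example: `has_version_id /etc/os-release 15.6 && echo upgraded`.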

#23

Updated by gpathak 2 months ago

  • Status changed from In Progress to Feedback

Need to run tests on o3 worker21

#24

Updated by gpathak 2 months ago

Test execution on worker21: https://openqa.opensuse.org/tests/4574794

#26

Updated by okurz 2 months ago

  • Related to action #168454: `openqaworker21` fails with no `Qemu/KVM found` added

#27

Updated by gpathak about 2 months ago

It seems the worker is fine: https://openqa.opensuse.org/admin/workers/754
@okurz Can we close this task, or should we keep it open a bit longer for observation?

#28

Updated by okurz about 2 months ago

  • Status changed from Feedback to Resolved

I also checked the state and found passed jobs like https://openqa.opensuse.org/tests/4580343 after the upgrade, so we are good.
