action #157972

coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

Upgrade o3 workers to openSUSE Leap 15.6

Added by okurz 3 months ago. Updated 2 days ago.

Status: Blocked
Priority: Normal
Assignee: okurz
Category: Organisational
Target version: Ready
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

  • We need to upgrade the workers before the EOL of Leap 15.5 and to have a consistent environment

Acceptance criteria

  • AC1: All o3 worker machines run a cleanly upgraded openSUSE Leap 15.6 (no failed systemd services, no leftover .rpmnew files, etc.)
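AC1 can be checked mechanically on each worker after the upgrade and reboot. The following is a minimal Python sketch, not an existing openQA or o3 tool. It assumes that leftover config files show up as .rpmnew or .rpmsave files below /etc, and that "systemctl list-units --failed --plain --no-legend" prints one failed unit per line with the unit name in the first column. All function names are hypothetical.

```python
import os
import subprocess

# Assumption: RPM leaves these suffixes next to config files whose packaged
# version conflicted with a locally modified copy during the upgrade.
LEFTOVER_SUFFIXES = (".rpmnew", ".rpmsave")


def is_leftover_rpm_file(name: str) -> bool:
    """Return True if the file name ends in one of the leftover RPM suffixes."""
    return name.endswith(LEFTOVER_SUFFIXES)


def find_leftover_rpm_files(root: str = "/etc") -> list[str]:
    """Walk root and return the sorted paths of leftover RPM config files.

    os.walk silently yields nothing for a non-existent root.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_leftover_rpm_file(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def parse_failed_units(output: str) -> list[str]:
    """Extract unit names from 'systemctl list-units --failed --plain --no-legend'.

    With --plain and --no-legend there is no bullet, header or footer, so the
    first whitespace-separated field of each non-empty line is the unit name.
    """
    return [line.split()[0] for line in output.splitlines() if line.strip()]


def failed_units() -> list[str]:
    """Query systemd on the local host for failed units."""
    result = subprocess.run(
        ["systemctl", "list-units", "--failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)


def main() -> int:
    """Print AC1 violations and return a non-zero exit code if any were found."""
    leftovers = find_leftover_rpm_files("/etc")
    failed = failed_units()
    for path in leftovers:
        print(f"leftover config file: {path}")
    for unit in failed:
        print(f"failed unit: {unit}")
    return 1 if leftovers or failed else 0
```

On a worker, the check would be run after the reboot with something like raise SystemExit(main()); a non-zero exit status means AC1 is not yet met for that host. The pure helpers (is_leftover_rpm_file, parse_failed_units) are kept separate from the parts that touch the live system so they can be tested in isolation.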

Suggestions

  • Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
  • Schedule the upgrade for a time when the workers are executing few or no openQA test jobs
  • Keep the IPMI interface ready and test beforehand that Serial-over-LAN works, for potential recovery
  • Follow the instructions from the wiki page above, but run them within transactional-update shell on workers that use transactional updates
  • After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with transactional-update rollback

Further details

  • Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured, there are btrfs snapshots to recover from, IPMI Serial-over-LAN is available, a reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker), and there are other machines that can run jobs while one host is down for a bit longer. And okurz can hold your hand :)

Related issues: 4 (2 open, 2 closed)

Related to openQA Tests - action #162239: [s390x] test fails in bootloader_start due to slow response from z/VM hypervisor and/or changed response on "cp i cms" command (Blocked, okurz, 2024-06-13)

Related to openQA Project - action #162320: multi-machine test failures 2024-06-14+, auto_review:"ping with packet size 100 failed.*can be GRE tunnel setup issue":retry (Resolved, okurz, 2024-06-15)

Related to openQA Project - action #162683: s390x libvirt started kvm machines on Leap 15.6 fail with "unsupported configuration: machine type 's390-ccw-virtio-8.2' does not support ACPI" size:M (New, 2024-05-08)

Copied from openQA Project - action #130585: Upgrade o3 workers to openSUSE Leap 15.5 (Resolved, okurz)

#1

Updated by okurz 3 months ago

  • Copied from action #130585: Upgrade o3 workers to openSUSE Leap 15.5 added
#2

Updated by okurz 3 months ago

  • Subject changed from Upgrade o3 workers to openSUSE Leap 15.5 to Upgrade o3 workers to openSUSE Leap 15.6
  • Description updated (diff)
  • Assignee deleted (okurz)
  • Target version changed from Ready to future
#3

Updated by okurz about 2 months ago

  • Target version changed from future to Tools - Next
#4

Updated by okurz about 2 months ago

  • Target version changed from Tools - Next to Ready
#5

Updated by okurz about 1 month ago

  • Target version changed from Ready to Tools - Next
#6

Updated by okurz 16 days ago

  • Target version changed from Tools - Next to Ready
#7

Updated by okurz 16 days ago

  • Target version changed from Ready to Tools - Next
#8

Updated by okurz 16 days ago

  • Related to action #162239: [s390x] test fails in bootloader_start due to slow response from z/VM hypervisor and/or changed response on "cp i cms" command added
#9

Updated by okurz 16 days ago

  • Status changed from New to In Progress
  • Assignee set to okurz
  • Target version changed from Tools - Next to Ready
#10

Updated by okurz 16 days ago · Edited

Same as on the OSD workers: the system network does not come up. I connected over IPMI SoL and found the following in the output of systemctl status wickedd-nanny:

Jun 14 11:53:04 openqaworker21 wickedd-nanny[2755]: device br0: call to org.opensuse.Network.Firewall.firewallUp() failed: DBus method call timed out
…

The same happens for the other network devices. Rolling back.

From fvogt:
org.opensuse.Network.Firewall is not firewalld, it's wicked. The fact that the call timed out means that the service exists and something is running. wicked provides this interface, and firewallUp() just calls /etc/wicked/extensions/firewall up, which basically just calls firewall-cmd to assign the interfaces. So wickedd-nanny is waiting for wicked, which is waiting for firewalld.

#11

Updated by openqa_review 15 days ago

  • Due date set to 2024-06-29

Setting due date based on mean cycle time of SUSE QE Tools

#12

Updated by okurz 15 days ago

  • Related to action #162320: multi-machine test failures 2024-06-14+, auto_review:"ping with packet size 100 failed.*can be GRE tunnel setup issue":retry added
#13

Updated by okurz 13 days ago

  • Due date deleted (2024-06-29)
  • Status changed from In Progress to New
  • Assignee deleted (okurz)
  • Target version changed from Ready to Tools - Next
#14

Updated by livdywan 4 days ago

  • Target version changed from Tools - Next to Ready

Moving this to the backlog because it's blocking #162239, which is already on the backlog. Alternatively, of course, you can reconsider the blocked ticket.

#15

Updated by okurz 2 days ago

  • Related to action #162683: s390x libvirt started kvm machines on Leap 15.6 fail with "unsupported configuration: machine type 's390-ccw-virtio-8.2' does not support ACPI" size:M added
#16

Updated by okurz 2 days ago

  • Status changed from New to Blocked
  • Assignee set to okurz