action #111866

closed

openQA Project (public) - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4

Added by okurz over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Motivation

  • Need to upgrade workers before EOL of Leap 15.3 and have a consistent environment

Acceptance criteria

  • AC1: all osd worker machines run a clean upgraded openSUSE Leap 15.4 (no failed systemd services, no left over .rpm-new files, etc.)
  • AC2: openqa-monitor runs openSUSE Leap 15.4
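
A minimal sketch of how AC1 could be checked on a single worker. The function names are made up for illustration, and it assumes that the "left over .rpm-new files" are the usual rpm leftovers ending in .rpmnew, .rpmsave or .rpmorig:

```shell
# List leftover config files from package updates below a directory
# (defaults to /etc). Suffixes are the ones rpm itself uses.
leftover_rpm_files() {
    find "${1:-/etc}" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) 2>/dev/null
}

# Print the number of failed systemd units (0 if none are reported).
failed_unit_count() {
    systemctl --failed --no-legend --plain 2>/dev/null | grep -c . || true
}
```

A host would look clean when `failed_unit_count` prints 0 and `leftover_rpm_files` prints nothing. Both could be run over all workers with salt `cmd.run`, but that part is not shown here.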

Suggestions

Further details

  • Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured there are btrfs snapshots to recover, there is the IPMI Serial-over-LAN, a reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker) and there are also other machines that can run jobs while one host might be down for a little bit longer. And okurz can hold your hand :)

Related issues 6 (1 open, 5 closed)

  • Related to openQA Project (public) - action #111992: Deal with QEMU and OVMF default resolution being 1280x800, affecting (at least) qxl size:M (Resolved, tinita, 2022-06-03)
  • Related to openQA Tests (public) - action #108548: [sle][security][backlog]automation: Integrate 'secure-boot' on Power into openQA (Blocked, 2022-03-17)
  • Related to openQA Tests (public) - action #114493: [qe-core][aarch64][installation]test fails in bootloader_start, needle mismatch on installer boot memu (Resolved, okurz, 2022-07-22)
  • Copied from openQA Infrastructure (public) - action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M (Resolved, livdywan)
  • Copied to openQA Infrastructure (public) - action #114526: recover openqaworker14 (Resolved, mkittler)
  • Copied to openQA Project (public) - action #130588: Upgrade osd workers to openSUSE Leap 15.5 (Resolved, okurz)

Actions #1

Updated by okurz over 2 years ago

  • Copied from action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
Actions #2

Updated by okurz over 2 years ago

  • Subject changed from Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 size:M
  • Description updated (diff)
  • Assignee deleted (livdywan)
  • Priority changed from High to Normal
  • Target version changed from Ready to future
Actions #3

Updated by okurz over 2 years ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Subject changed from Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 size:M to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4
Actions #4

Updated by okurz over 2 years ago

  • Related to action #111992: Deal with QEMU and OVMF default resolution being 1280x800, affecting (at least) qxl size:M added
Actions #5

Updated by okurz over 2 years ago

  • Status changed from New to Blocked
  • Assignee set to okurz
  • Target version changed from future to Ready
Actions #6

Updated by punkioudi over 2 years ago

  • Related to action #108548: [sle][security][backlog]automation: Integrate 'secure-boot' on Power into openQA added
Actions #7

Updated by okurz over 2 years ago

@punkioudi I wonder, where do you see a relation to #108548? What's your expectation?

Actions #8

Updated by okurz over 2 years ago

  • Status changed from Blocked to In Progress
Actions #9

Updated by okurz over 2 years ago

  • Due date set to 2022-08-04

Upgrading openqaworker11 and openqaworker12 manually as they are currently not in salt. openqaworker12 is still on Leap 15.2 so I will do a direct Leap 15.2 -> 15.4 upgrade for fun and because I am curious what happens.

Many repositories have changed the name format from "openSUSE_Leap_$releasever" to "$releasever" so we need to adapt for that.
Actually by now there are more machines controlled by our salt structure so let's upgrade them along the way
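
To illustrate the repository rename, here is a small sketch that applies the sed expression from the upgrade command to a throwaway repo file instead of /etc/zypp/repos.d. The repo name and URL are invented for the example and do not refer to a real repository:

```shell
# Work on a temporary copy so nothing under /etc/zypp/repos.d is touched.
tmpdir=$(mktemp -d)
cat > "$tmpdir/example.repo" <<'EOF'
[example]
enabled=1
baseurl=http://download.example.org/repositories/some:/project/openSUSE_Leap_$releasever/
EOF

# Same expression as in the upgrade command; "@" is only the sed delimiter.
# It drops the "openSUSE_Leap_" prefix from the last path component.
sed -i -e "s@/openSUSE_Leap_@/@g" "$tmpdir"/*

new_baseurl=$(grep '^baseurl=' "$tmpdir/example.repo")
echo "$new_baseurl"
```

Running the expression on a copy first makes it easy to see which repo files would change before doing the same on the real directory.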

powerqaworker-qam-1.qa.suse.de has some conflicts about python2 packages, manually removed first.

sudo salt --no-color --state-output=changes -C 'powerqaworker-qam-1.qa.suse.de' cmd.run 'zypper -n rm -u python2-libxml2-python'

then all the machines:

sudo salt --no-color --state-output=changes -C 'not G@roles:webui' cmd.run '(rpm -q qemu-ovmf-x86_64 && zypper al qemu-ovmf-x86_64) ; zypper rr telegraf-monitoring && sed -i -e "s@/openSUSE_Leap_@/@g" /etc/zypp/repos.d/* && zypper -n --releasever=15.4 ref && zypper -n --releasever=15.4 dup --auto-agree-with-licenses --replacefiles --download-in-advance'

seems to have gone fine

$ salt \* grains.get oscodename
storage.qa.suse.de:
    openSUSE Leap 15.4
openqaworker2.suse.de:
    openSUSE Leap 15.4
openqaworker3.suse.de:
    openSUSE Leap 15.4
openqaworker9.suse.de:
    openSUSE Leap 15.4
openqaworker6.suse.de:
    openSUSE Leap 15.4
QA-Power8-5-kvm.qa.suse.de:
    openSUSE Leap 15.4
openqaworker5.suse.de:
    openSUSE Leap 15.4
openqaworker14.qa.suse.cz:
    openSUSE Leap 15.4
powerqaworker-qam-1.qa.suse.de:
    openSUSE Leap 15.4
openqa-monitor.qa.suse.de:
    openSUSE Leap 15.4
QA-Power8-4-kvm.qa.suse.de:
    openSUSE Leap 15.4
openqaworker13.suse.de:
    openSUSE Leap 15.4
grenache-1.qa.suse.de:
    openSUSE Leap 15.4
openqa.suse.de:
    openSUSE Leap 15.4
openqaworker-arm-2.suse.de:
    openSUSE Leap 15.4
openqaworker8.suse.de:
    openSUSE Leap 15.4
backup.qa.suse.de:
    openSUSE Leap 15.4
openqaworker10.suse.de:
    openSUSE Leap 15.4
openqaworker-arm-1.suse.de:
    openSUSE Leap 15.4

except for openqaworker-arm-3 that repeatedly crashed. Need to try harder. Triggered reboot for most workers now.
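
With that many minions it is easy to overlook one that stayed behind. A rough sketch of a filter over the `grains.get oscodename` output, assuming the two-line "host:" plus indented value layout shown above (the function name is made up):

```shell
# Reads "salt \* grains.get oscodename" style output on stdin and prints
# every minion whose reported value differs from the expected string.
minions_not_on() {
    awk -v want="$1" '
        # unindented line ending in ":" names the minion
        /^[^ \t].*:$/ { host = substr($0, 1, length($0) - 1); next }
        # any other non-empty line is the value reported for that minion
        NF { sub(/^[ \t]+/, ""); if ($0 != want) print host }
    '
}
```

Usage could look like `salt \* grains.get oscodename | minions_not_on 'openSUSE Leap 15.4'`. A minion that does not answer at all would need a separate check, as it might not show up in the output in the first place.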

openqaworker8, openqaworker9 and openqaworker14 have not come up yet. At least openqaworker8 and openqaworker9 are in "maintenance" mode with a failed "openqa_nvme_format". The actual failing command is mdadm --create /dev/md/openqa --level=0 --force --assume-clean --raid-devices=1 --run /dev/nvme0n1, with "mdadm: cannot open /dev/nvme0n1: Device or resource busy". That's understandable: on openqaworker8+9 nvme0n1 has three partitions and also holds the root filesystem, hence it's already "busy". Only nvme0n1p3 should be used here.
The reason seems to be this:

├─nvme0n1p2 259:2    0   100G  0 part /var/tmp
│                                     /var/spool
│                                     /var/opt
│                                     /var/log
│                                     /var/lib/pgsql
│                                     /var/lib/named
│                                     /var/lib/mysql
│                                     /var/lib/mariadb
│                                     /var/lib/mailman
│                                     /var/lib/libvirt/images
│                                     /var/lib/machines
│                                     /var/crash
│                                     /var/cache
│                                     /usr/local
│                                     /tmp
│                                     /srv
│                                     /opt
│                                     /boot/grub2/x86_64-efi
│                                     /boot/grub2/i386-pc
│                                     /.snapshots
│                                     /
└─nvme0n1p3 259:3 bash -ex /usr/local/bin/openqa-establish-nvme-setup
# lsblk --noheadings | grep -v nvme | grep "/$"
│                                     /

Maybe lsblk has changed its format. Fixed in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/714, patched openqaworker8+9 manually.
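
To spell out the suspected parse problem: when lsblk prints every mount point of a partition on its own continuation line, the line holding "/" no longer contains "nvme", so the grep pipeline reports a root filesystem outside NVMe although it lives on nvme0n1p2. The sketch below reproduces that on an abbreviated sample and shows one possible alternative that looks only at the source device of "/". This is an illustration, not necessarily what the merge request does, and both function names are made up:

```shell
# Check in the style of the pipeline above: reads lsblk output on stdin and
# succeeds if a line ending in "/" remains after dropping all "nvme" lines.
root_looks_non_nvme() {
    grep -v nvme | grep -q '/$'
}

# Alternative sketch: decide from the device backing "/" itself.
# Expects the device string as argument, e.g. from: findmnt -n -o SOURCE /
is_nvme_source() {
    case "$1" in
        /dev/nvme*) return 0 ;;
        *) return 1 ;;
    esac
}

# Abbreviated sample in the multi-line mount point layout from this ticket.
sample='├─nvme0n1p2 259:2    0   100G  0 part /var/tmp
│                                     /.snapshots
│                                     /'

# Succeeds on the sample, i.e. wrongly claims that "/" is not on NVMe.
printf '%s\n' "$sample" | root_looks_non_nvme && echo "false positive"
```

On a real host the second check could be called as `is_nvme_source "$(findmnt -n -o SOURCE /)"`.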

Actions #10

Updated by okurz over 2 years ago

  • Related to action #114493: [qe-core][aarch64][installation]test fails in bootloader_start, needle mismatch on installer boot memu added
Actions #11

Updated by okurz over 2 years ago

Actions #12

Updated by okurz over 2 years ago

#114493 reported, this is also related to #111992 but on aarch64. Trying to install old package from http://download.opensuse.org/ports/aarch64/distribution/leap/15.3/repo/oss/noarch/?P=*qemu-uefi*

with

sudo zypper -n in --oldpackage http://download.opensuse.org/ports/aarch64/distribution/leap/15.3/repo/oss/noarch/qemu-uefi-aarch64-202008-10.8.1.noarch.rpm && sudo zypper al qemu-uefi-aarch64

that helped. openqaworker14 was suffering from the same "lsblk" parse problem as other machines. qa-power8-4+qa-power8-5 might still be problematic

Actions #13

Updated by okurz over 2 years ago

  • Due date deleted (2022-08-04)
  • Status changed from In Progress to Resolved

Upgrade done. Some machines still fail; specific tickets were created to handle them.

Actions #14

Updated by okurz over 1 year ago

  • Copied to action #130588: Upgrade osd workers to openSUSE Leap 15.5 added