action #111866
openQA Project - coordination #111860 (closed): [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4
Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4
0%
Description
Motivation
- Need to upgrade workers before EOL of Leap 15.3 and have a consistent environment
Acceptance criteria
- AC1: all osd worker machines run a clean upgraded openSUSE Leap 15.4 (no failed systemd services, no leftover .rpmnew files, etc.)
- AC2: openqa-monitor runs openSUSE Leap 15.4
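A minimal sketch of how AC1 could be checked on a single host. The function names are made up for illustration and are not part of any existing tooling; the suffixes .rpmnew, .rpmsave and .rpmorig are the ones rpm normally uses for leftover config files.

```shell
#!/bin/sh
# Hypothetical helpers to check a host against AC1.

# List leftover rpm config files below a directory (default: /etc).
find_rpm_leftovers() {
    find "${1:-/etc}" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Print the names of failed systemd units, one per line.
# Empty output means no failed units.
list_failed_units() {
    systemctl --failed --no-legend --plain | awk '{print $1}'
}

# Example use on a worker:
#   list_failed_units
#   find_rpm_leftovers /etc
```

Both functions only read state, so they should be safe to run on a production worker before and after the upgrade.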
Suggestions
- read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the workers are only executing a few or no openQA test jobs
- Keep IPMI interface ready and test that Serial-over-LAN works for potential recovery
- After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with
snapper rollback
Further details
- Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured, there are btrfs snapshots to recover from and the IPMI Serial-over-LAN. A reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker), and there are also other machines that can run jobs while one host might be down for a little bit longer. And okurz can hold your hand :)
Updated by okurz over 2 years ago
- Copied from action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
Updated by okurz over 2 years ago
- Subject changed from Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 size:M
- Description updated (diff)
- Assignee deleted (livdywan)
- Priority changed from High to Normal
- Target version changed from Ready to future
Updated by okurz over 2 years ago
- Project changed from openQA Project to openQA Infrastructure
- Subject changed from Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 size:M to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4
Updated by okurz over 2 years ago
- Related to action #111992: Deal with QEMU and OVMF default resolution being 1280x800, affecting (at least) qxl size:M added
Updated by okurz over 2 years ago
- Status changed from New to Blocked
- Assignee set to okurz
- Target version changed from future to Ready
Updated by punkioudi over 2 years ago
- Related to action #108548: [sle][security][backlog]automation: Integrate 'secure-boot' on Power into openQA added
Updated by okurz over 2 years ago
@punkioudi I wonder, where do you see a relation to #108548? What's your expectation?
Updated by okurz over 2 years ago
- Due date set to 2022-08-04
Upgrading openqaworker11 and openqaworker12 manually as they are currently not in salt. openqaworker12 is still on Leap 15.2, so I will do a direct Leap 15.2 -> 15.4 upgrade for fun and because I am curious what happens.
Many repositories have changed the name format from "openSUSE_Leap_$releasever" to "$releasever" so we need to adapt for that.
Actually by now there are more machines controlled by our salt structure so let's upgrade them along the way
powerqaworker-qam-1.qa.suse.de has some conflicts about python2 packages, manually removed first.
sudo salt --no-color --state-output=changes -C 'powerqaworker-qam-1.qa.suse.de' cmd.run 'zypper -n rm -u python2-libxml2-python'
then all the machines:
sudo salt --no-color --state-output=changes -C 'not G@roles:webui' cmd.run '(rpm -q qemu-ovmf-x86_64 && zypper al qemu-ovmf-x86_64) ; zypper rr telegraf-monitoring && sed -i -e "s@/openSUSE_Leap_@/@g" /etc/zypp/repos.d/* && zypper -n --releasever=15.4 ref && zypper -n --releasever=15.4 dup --auto-agree-with-licenses --replacefiles --download-in-advance'
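To illustrate what the sed expression in the command above does to a repository definition, here is a small sketch that applies the same expression to text on stdin. The function name and the example URL are made up for illustration.

```shell
#!/bin/sh
# Apply the same substitution as in the salt command above,
# but on stdin instead of editing /etc/zypp/repos.d/* in place.
rewrite_repo_urls() {
    sed -e "s@/openSUSE_Leap_@/@g"
}

# Example with a hypothetical repo line:
#   printf '%s\n' 'baseurl=https://example.org/repo/openSUSE_Leap_$releasever/' | rewrite_repo_urls
# To preview which repo files would be touched before running sed -i:
#   grep -l 'openSUSE_Leap_' /etc/zypp/repos.d/*
```

Running the substitution on a copy or on stdin first makes it easier to spot repositories that do not follow the new "$releasever" naming.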
seems to have gone fine
$ salt \* grains.get oscodename
storage.qa.suse.de:
openSUSE Leap 15.4
openqaworker2.suse.de:
openSUSE Leap 15.4
openqaworker3.suse.de:
openSUSE Leap 15.4
openqaworker9.suse.de:
openSUSE Leap 15.4
openqaworker6.suse.de:
openSUSE Leap 15.4
QA-Power8-5-kvm.qa.suse.de:
openSUSE Leap 15.4
openqaworker5.suse.de:
openSUSE Leap 15.4
openqaworker14.qa.suse.cz:
openSUSE Leap 15.4
powerqaworker-qam-1.qa.suse.de:
openSUSE Leap 15.4
openqa-monitor.qa.suse.de:
openSUSE Leap 15.4
QA-Power8-4-kvm.qa.suse.de:
openSUSE Leap 15.4
openqaworker13.suse.de:
openSUSE Leap 15.4
grenache-1.qa.suse.de:
openSUSE Leap 15.4
openqa.suse.de:
openSUSE Leap 15.4
openqaworker-arm-2.suse.de:
openSUSE Leap 15.4
openqaworker8.suse.de:
openSUSE Leap 15.4
backup.qa.suse.de:
openSUSE Leap 15.4
openqaworker10.suse.de:
openSUSE Leap 15.4
openqaworker-arm-1.suse.de:
openSUSE Leap 15.4
except for openqaworker-arm-3 that repeatedly crashed. Need to try harder. Triggered reboot for most workers now.
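With this many minions it can help to filter the grains output for machines that are not yet on the target release. A sketch, assuming the default salt text output of a "minion:" line followed by an indented value; the function name is made up for illustration.

```shell
#!/bin/sh
# Read "salt ... grains.get oscodename" text output on stdin and print
# every minion whose value differs from the expected codename ($1).
hosts_not_on() {
    awk -v want="$1" '
        /:$/ { host = substr($0, 1, length($0) - 1); next }  # "minion:" line
        {
            val = $0
            sub(/^[ \t]+/, "", val)                            # strip indentation
            if (host != "" && val != want) print host
            host = ""
        }'
}

# Example use on the salt master:
#   salt \* grains.get oscodename | hosts_not_on 'openSUSE Leap 15.4'
```

Minions that do not respond at all, like a crashed host, would not show up this way and still need to be checked separately.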
openqaworker8, openqaworker9 and openqaworker14 have not come up yet. At least openqaworker8 and 9 are in "maintenance" mode with a failed "openqa_nvme_format". The actual failing command is
mdadm --create /dev/md/openqa --level=0 --force --assume-clean --raid-devices=1 --run /dev/nvme0n1
which fails with "mdadm: cannot open /dev/nvme0n1: Device or resource busy". That's understandable: on openqaworker8+9 nvme0n1 has three partitions and is also used for the root filesystem, hence it's already "busy". Only nvme0n1p3 should be used here.
The reason seems to be this:
├─nvme0n1p2 259:2 0 100G 0 part /var/tmp
│ /var/spool
│ /var/opt
│ /var/log
│ /var/lib/pgsql
│ /var/lib/named
│ /var/lib/mysql
│ /var/lib/mariadb
│ /var/lib/mailman
│ /var/lib/libvirt/images
│ /var/lib/machines
│ /var/crash
│ /var/cache
│ /usr/local
│ /tmp
│ /srv
│ /opt
│ /boot/grub2/x86_64-efi
│ /boot/grub2/i386-pc
│ /.snapshots
│ /
└─nvme0n1p3 259:3
bash -ex /usr/local/bin/openqa-establish-nvme-setup
# lsblk --noheadings | grep -v nvme | grep "/$"
│ /
Maybe lsblk has changed its output format. Fixed in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/714, patched openqaworker8+9 manually.
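To make the parse problem concrete, the following sketch reproduces the check from the ticket on sample text and shows one possible alternative. This is only an illustration and not necessarily what the merge request does; the assumption is that newer lsblk versions print additional mount points on continuation lines that no longer contain the device name. Both function names are made up.

```shell
#!/bin/sh
# The check from the ticket, reading lsblk-style text on stdin.
# Succeeds when a line ending in "/" does not contain "nvme",
# which presumably was meant as "root is not on an NVMe device".
legacy_root_check() {
    grep -v nvme | grep -q "/$"
}

# Possible alternative: decide based on the source device of "/",
# e.g. as reported by `findmnt -n -o SOURCE /`.
# Succeeds when that source is NOT an NVMe device.
root_source_off_nvme() {
    case "$1" in
        /dev/nvme*) return 1 ;;
        *) return 0 ;;
    esac
}

# Example use:
#   lsblk --noheadings | legacy_root_check && echo "root not on nvme (maybe wrong)"
#   root_source_off_nvme "$(findmnt -n -o SOURCE /)" || echo "root is on nvme"
```

With the multi-line output shown above, the continuation line with "/" slips through "grep -v nvme", so the old check reports a root filesystem outside NVMe even though it lives on nvme0n1p2.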
Updated by okurz over 2 years ago
- Related to action #114493: [qe-core][aarch64][installation]test fails in bootloader_start, needle mismatch on installer boot memu added
Updated by okurz over 2 years ago
- Copied to action #114526: recover openqaworker14 added
Updated by okurz over 2 years ago
#114493 reported, this is also related to #111992 but on aarch64. Trying to install old package from http://download.opensuse.org/ports/aarch64/distribution/leap/15.3/repo/oss/noarch/?P=*qemu-uefi*
with
sudo zypper -n in --oldpackage http://download.opensuse.org/ports/aarch64/distribution/leap/15.3/repo/oss/noarch/qemu-uefi-aarch64-202008-10.8.1.noarch.rpm && sudo zypper al qemu-uefi-aarch64
that helped. openqaworker14 was suffering from the same "lsblk" parse problem as other machines. qa-power8-4+qa-power8-5 might still be problematic
Updated by okurz over 2 years ago
- Due date deleted (2022-08-04)
- Status changed from In Progress to Resolved
Upgrade done. Some machines still fail; specific tickets were created to handle them.
Updated by okurz over 1 year ago
- Copied to action #130588: Upgrade osd workers to openSUSE Leap 15.5 added