action #75238
coordination #69478 (closed): [epic] Upgrade o3+osd workers+webui to openSUSE Leap 15.2
Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2
Description
Motivation
- Need to upgrade workers before EOL of Leap 15.1 and have a consistent environment
Acceptance criteria
- AC1: all osd worker machines run a clean upgraded openSUSE Leap 15.2 (no failed systemd services, no left over .rpm-new files, etc.)
- AC2: openqa-monitor runs openSUSE Leap 15.2
Suggestions
- read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the workers are only executing a few or no openQA test jobs
- Keep IPMI interface ready and test that Serial-over-LAN works for potential recovery
- After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with snapper rollback
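The "clean upgrade" check from AC1 could be sketched roughly as follows. This is only an illustration, not the procedure used in this ticket: the helper name list_rpm_leftovers is made up here, and it assumes the leftovers of interest are the usual rpm suffixes .rpmnew, .rpmsave and .rpmorig.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): list leftover rpm config
# files below a directory, defaulting to /etc.
# Assumption: "left over .rpm-new files" refers to the usual rpm
# suffixes .rpmnew, .rpmsave and .rpmorig.
list_rpm_leftovers() {
    dir="${1:-/etc}"
    find "$dir" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Possible use on a worker after the reboot:
#   list_rpm_leftovers /etc
#   systemctl --failed
```

An empty result from both commands would suggest the host is clean; anything else is a hint to investigate before considering a rollback.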
Further details
Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured, there are btrfs snapshots to recover from and the IPMI Serial-over-LAN; a reinstall is possible and not hard; there is no important data on the host (it's only an openQA worker); and there are also other machines that can run jobs while one host is down for a little longer. And okurz can hold your hand :)
For reference, the upgrade to openSUSE Leap 15.1 was described in #55607
Updated by okurz almost 4 years ago
- Subject changed from Upgrade osd workers to openSUSE Leap 15.2 to Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2
Updated by okurz almost 4 years ago
- Status changed from Workable to Blocked
- Assignee set to okurz
let's wait for the corresponding o3 ticket first
Updated by okurz almost 4 years ago
- Status changed from Blocked to Workable
- Assignee deleted (okurz)
o3 is good, this can be followed
Updated by livdywan almost 4 years ago
- Subject changed from Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2 to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2
- Description updated (diff)
For reference:
- I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
- Also monitor.qa.suse.de
That leaves only openqa.suse.de which is covered by #75244
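To turn the salt output into a to-do list, something like the following sketch might help. It assumes salt's default output layout (a "minion-id:" line followed by indented command output); the function name hosts_needing_upgrade is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter (not from the ticket): read salt cmd.run output
# on stdin and print the minions whose VERSION differs from the target.
# Assumption: default salt output, i.e. "minion-id:" on one line and
# the indented grep result on the following line(s).
hosts_needing_upgrade() {
    target="${1:-15.2}"
    awk -v target="$target" '
        # An unindented line ending in ":" names the minion.
        /^[^[:space:]].*:$/ { host = substr($0, 1, length($0) - 1); next }
        # Extract the value of VERSION=, with or without quotes.
        /VERSION=/ {
            v = $0
            sub(/^.*VERSION="?/, "", v)
            sub(/".*$/, "", v)
            if (v != target) print host
        }'
}

# Possible use on osd:
#   sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' \
#       | hosts_needing_upgrade 15.2
```

Machines that are down do not answer salt and would not show up in this list either, so an empty result does not prove every worker is upgraded.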
Updated by livdywan almost 4 years ago
- Status changed from Workable to In Progress
- DONE openqaworker2
  - installed screen
- DONE openqaworker5
  - Got stuck installing dpdk-kmp-default, good after second zypper run
  - var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
  - Ran sudo systemctl restart var-lib-openqa-share.mount
- DONE openqa-monitor
  - zypper succeeded on the second attempt (refreshes are racy I guess)
- DONE openqaworker6
- DONE openqaworker8
  - zypper upgrade went fine.
  - var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
  - Ran sudo systemctl restart var-lib-openqa-share.mount
DONE implies I checked that workers show up on https://openqa.suse.de/admin/workers and picked up jobs
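Since the same failed mount unit showed up on several hosts after reboot, a small filter over the systemctl unit listing could make the check repeatable. This is a sketch only; failed_units is a made-up name, and it assumes the plain, legend-free listing format where the columns are UNIT, LOAD, ACTIVE, SUB and DESCRIPTION.

```shell
#!/bin/sh
# Hypothetical filter (not from the ticket): print the names of failed
# units from a systemctl unit listing read on stdin.
# Assumption: input columns are UNIT LOAD ACTIVE SUB DESCRIPTION, as in
# "var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share".
failed_units() {
    awk '$2 == "loaded" && $3 == "failed" { print $1 }'
}

# Possible use on a worker after the reboot:
#   systemctl list-units --state=failed --plain --no-legend | failed_units
# and, for the share mount seen in this ticket:
#   sudo systemctl restart var-lib-openqa-share.mount
```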
Updated by livdywan almost 4 years ago
- DONE openqaworker9
- DONE openqaworker10
  - Had to run sudo systemctl restart var-lib-openqa-share.mount
  - installed htop
  - not online, no jobs picked up yet
  - systemctl restart openqa-worker@{1..10} to remedy #78390
- DONE openqaworker13
  - Had to run sudo systemctl restart var-lib-openqa-share.mount here as well
- DONE QA-Power8-5-kvm.qa.suse.de
  - connection refused after reboot, stuck in petitboot
  - kexec -l /var/petitboot/mnt/dev/sda2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sda2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e made it as far as dracut
  - Tried again, this time with the right device ID (but not me, so no log of the command)
- DONE QA-Power8-4-kvm.qa.suse.de
  - connection refused after reboot, stuck in petitboot, kexec load failed
  - kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e resulted in a successful boot
  - Installed htop
  - kdump.service loaded failed failed Load kdump kernel and initrd after reboot
  - #56588#note-9 talks about disabling kdump - still I had to re-disable it via sudo systemctl disable --now kdump && sudo systemctl reset-failed
  - Online, not picking up jobs yet
- DONE grenache-1
  - not online, no jobs picked up yet
  - see #78390
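Because the first kexec attempt above used the wrong device, it may help to generate the command from the device name and root UUID instead of editing it by hand. The sketch below only prints the command and does not run it. The function name petitboot_kexec_cmd is hypothetical; the path layout and kernel arguments are copied from the commands quoted in this comment.

```shell
#!/bin/sh
# Hypothetical helper (not from the ticket): print the kexec command
# line for a given petitboot device (e.g. sdb2) and root filesystem
# UUID. It only prints the command; nothing is loaded or executed.
# Assumption: kernel and initrd live under
# /var/petitboot/mnt/dev/<device>/boot as in the commands above.
petitboot_kexec_cmd() {
    dev="$1"
    uuid="$2"
    boot="/var/petitboot/mnt/dev/$dev/boot"
    printf 'kexec -l %s/vmlinux --initrd=%s/initrd --append="root=UUID=%s console=tty0 console=ttyS1,115200 nospec" && kexec -e\n' \
        "$boot" "$boot" "$uuid"
}

# Possible use: print the command, review it, then paste it into the
# petitboot shell.
#   petitboot_kexec_cmd sdb2 eebe647f-e867-416e-a0fa-7a6732bfcf9d
```

Printing rather than executing keeps a record of the exact command in the terminal log, which was missing for the second attempt on QA-Power8-5-kvm.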
Updated by livdywan almost 4 years ago
- Related to action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) added
Updated by livdywan almost 4 years ago
- Status changed from In Progress to Feedback
Updated by livdywan almost 4 years ago
- Status changed from Feedback to In Progress
cdywan wrote:
For reference:
- I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
- Also monitor.qa.suse.de
That leaves only openqa.suse.de which is covered by #75244
@Xiaojing_liu made me aware that I missed malbec.arch.suse.de, openqaworker-arm-1.suse.de and openqaworker-arm-2.suse.de, probably due to machines being down 🙄
Updated by livdywan almost 4 years ago
- Status changed from In Progress to Feedback
WIP malbec.arch.suse.de
- Stuck in petitboot after reboot: PXE autoconfiguration failed
- netboot fails with load_kernel: /tmp/pb-2eSo7I is not a 64bit PowerPC executable
- None of the entries mentioned in #80656#note-9 are visible.
- Booted via a new entry with /boot/vmlinux and /boot/initrd on sdb1 with nomodeset console=hvc console=tty.
- [FAILED] Failed to mount /var/lib/openqa/share.
- [FAILED] Failed to start Load kdump kernel and initrd.
- systemctl disable --now kdump && sudo systemctl reset-failed
- Mounting /var/lib/openqa/share looks to have succeeded after all.
- Worker is registered
WIP openqaworker-arm-2.suse.de
- was ready to reboot
- got unresponsive and was rebooted (by someone else?)
WIP openqaworker-arm-1.suse.de ready to reboot
Updated by livdywan almost 4 years ago
- Related to action #81046: openqaworker-arm-2.suse.de unreachable added
Updated by livdywan almost 4 years ago
- DONE malbec.arch.suse.de after all
- DONE openqaworker-arm-2.suse.de
- See #81046
- DONE openqaworker-arm-1.suse.de
Updated by livdywan almost 4 years ago
- Related to action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) added
Updated by livdywan almost 4 years ago
- Related to action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others added
Updated by livdywan almost 4 years ago
- DONE openqaworker-arm-3.suse.de
  - Rebooted while there were no jobs running
Updated by livdywan almost 4 years ago
- Status changed from Blocked to In Progress
- DONE powerqaworker-qam-1.qa.suse.de
Updated by livdywan almost 4 years ago
- Status changed from In Progress to Feedback
Updated by okurz almost 4 years ago
- Status changed from Feedback to Resolved
ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?
Updated by livdywan almost 4 years ago
okurz wrote:
ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?
Ack. I wouldn't consider staging as part of osd and this ticket for that matter. Although I might just sort those out when nobody's looking, I practically remember the steps by heart now 😂
Updated by okurz about 3 years ago
- Copied to action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
Updated by okurz 7 months ago
- Related to action #158041: grenache needs upgrade to 15.5 added