coordination #99183: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui, to openSUSE Leap 15.3
Upgrade osd webUI host to openSUSE Leap 15.3 size:M
- Need to upgrade machines before EOL of Leap 15.2 and have a consistent environment
- AC1: osd webui host runs a clean upgraded openSUSE Leap 15.3 (no failed systemd services, no leftover .rpmnew files, etc.)
- Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the instance is only executing a few or no openQA test jobs
- After the upgrade, reboot and check that everything works as expected
- If we lose access to the machine we need the help of EngineeringInfrastructure as only they have access to the VM
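A minimal sketch of how the AC1 checks could be scripted. The function names are hypothetical and not part of the ticket or the wiki. It assumes rpm's usual leftover suffixes (.rpmnew, .rpmsave, .rpmorig) and that /etc is the directory of interest.

```shell
#!/bin/sh
# Hypothetical helper: list leftover rpm config files below a directory
# (defaults to /etc). The suffixes are the ones rpm normally creates.
find_rpm_leftovers() {
    find "${1:-/etc}" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Hypothetical helper: list failed systemd units.
# Prints nothing when systemctl is not available on this host.
list_failed_units() {
    command -v systemctl >/dev/null 2>&1 || return 0
    systemctl --failed --no-legend --plain 2>/dev/null
}
```

Running find_rpm_leftovers and list_failed_units after the reboot should both print nothing on a clean host.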
#8 Updated by okurz about 2 months ago
- Status changed from Workable to In Progress
- Assignee set to okurz
I guess I will upgrade OSD now. I haven't done any of the other 15.3 upgrade tasks myself, only supported them so far, so I think I can take this one. https://openqa.suse.de/tests/ shows that OSD is not that busy right now. The latest state of os-autoinst+openQA was deployed yesterday in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/271128 . The current uptime of OSD is 11 days. I am following https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
One minor change I needed to apply manually was in /etc/zypp/repos.d/NON_Public_infrastructure.repo, to replace $releasever, as the repo paths changed here.
Then continued with
zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details
Due to recent events I am a bit careful regarding changes in package versions, but for example I see
"perl-Mojo-IOLoop-ReadWriteProcess 0.28-lp152.2.1 -> 0.28-bp153.1.12", so that all looks good.
Then I hit https://bugzilla.suse.com/show_bug.cgi?id=1192740 , so I applied the workaround and ran
zypper --releasever=$new -n in suse-module-tools && zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details
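The refresh and dup sequence above could be wrapped so it can be reviewed before it touches the system. This is only a sketch: run, upgrade_to and DRY_RUN are hypothetical names, and 15.3 as the value of $new is an assumption taken from the ticket title. The zypper options are the ones quoted in this comment.

```shell
#!/bin/sh
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Refresh the repositories, then run the distribution upgrade for release $1.
# The dup step only starts if the refresh succeeded.
upgrade_to() {
    new=$1
    run zypper --releasever="$new" -n ref &&
        run zypper --releasever="$new" -n --no-refresh dup \
            --auto-agree-with-licenses --replacefiles \
            --download-in-advance --details
}
```

Calling DRY_RUN=1 upgrade_to 15.3 only prints the two zypper invocations. Without DRY_RUN the same call executes them.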
One systemd service failed, openqa-enqueue-asset-cleanup.service. The journal says
Dec 09 14:00:00 openqa systemd: openqa-enqueue-asset-cleanup.service: Failed to determine user credentials: No such process
Dec 09 14:00:00 openqa systemd: openqa-enqueue-asset-cleanup.service: Failed at step USER spawning /usr/share/openqa/script>
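A failure at step USER usually means systemd could not resolve the account named in the unit's User= setting. The ticket does not say how this was resolved, so the following is only a sketch of how one might check it. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given account name resolves on this host.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Hypothetical helper: print the User= setting of a unit.
# Prints nothing when User= is unset or systemctl is not available.
unit_user() {
    command -v systemctl >/dev/null 2>&1 || return 0
    systemctl show -p User --value "$1" 2>/dev/null
}

# Report whether the account a unit runs as can be resolved.
check_unit_user() {
    user=$(unit_user "$1")
    if [ -z "$user" ]; then
        echo "$1: no User= set or systemctl unavailable"
    elif user_exists "$user"; then
        echo "$1: user '$user' exists"
    else
        echo "$1: user '$user' is missing"
    fi
}
```

For example, check_unit_user openqa-enqueue-asset-cleanup.service reports whether the configured account is present. A missing account right after the upgrade may also be transient, so rerunning the check after the reboot is worthwhile.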
#9 Updated by okurz about 2 months ago
- Status changed from In Progress to Resolved
Sent a notification in internal chat #eng-testing about the upgrade and the resolution.
All good after reboot. No failed services. Did
sudo salt -C 'G@roles:webui' state.apply test=True, no failures. Did
sudo salt -C 'G@roles:webui' state.apply. All good as well.
#10 Updated by cdywan about 2 months ago
I assume this is related? It was alerting very briefly, meaning it went OK as I was about to save the comment:
- And again 18.56 and on-going with
2.600. Except https://openqa.suse.de/minion/workers shows 2 workers, not none, albeit Idle. No errors visible in
sudo journalctl -fu openqa-gru.service.
I ran sudo systemctl restart openqa-gru anyway, and now one of the idle minion workers is gone and not coming back.