action #99198
coordination #99183 (closed): [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui, to openSUSE Leap 15.3
Upgrade osd webUI host to openSUSE Leap 15.3 size:M
Description
Motivation
- Need to upgrade machines before EOL of Leap 15.2 and have a consistent environment
Acceptance criteria
- AC1: osd webui host runs a clean upgraded openSUSE Leap 15.3 (no failed systemd services, no leftover .rpmnew files, etc.)
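AC1 can be checked with a couple of shell functions. This is a minimal sketch, not part of the documented procedure: the function names (leftover_rpm_files, failed_units, ac1_clean) are hypothetical, and it assumes bash, that leftover config files live under /etc, and that .rpmsave and .rpmorig files count as leftovers alongside .rpmnew.

```shell
# Hypothetical AC1 check helpers (assumption: bash on a systemd, rpm-based host).

# Print rpm leftover config files below a directory (default /etc), sorted.
# rpm writes *.rpmnew (new default config not applied), *.rpmsave (old config
# moved aside) and *.rpmorig (pre-existing file replaced by a package).
leftover_rpm_files() {
    local root="${1:-/etc}"
    find "$root" -xdev -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | LC_ALL=C sort
}

# Print the names of failed systemd units, one per line.
failed_units() {
    systemctl list-units --state=failed --no-legend --plain | awk '{print $1}'
}

# Report findings; return 0 only if both checks came back empty.
ac1_clean() {
    local files units
    files=$(leftover_rpm_files "${1:-/etc}")
    units=$(failed_units)
    [ -n "$files" ] && printf 'Leftover rpm files:\n%s\n' "$files"
    [ -n "$units" ] && printf 'Failed units:\n%s\n' "$units"
    [ -z "$files" ] && [ -z "$units" ]
}
```

After the post-upgrade reboot, ac1_clean && echo "AC1 fulfilled" would print the success message only when nothing was found. Each leftover file still needs a manual merge decision; the sketch only lists them.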
Suggestions
- Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the instance is executing only a few or no openQA test jobs
- After the upgrade, reboot and check that everything works as expected
Further details
- If we lose access to the machine we need the help of EngineeringInfrastructure, as only they have access to the VM
Updated by okurz about 3 years ago
- Copied from action #75244: Upgrade osd webUI host to openSUSE Leap 15.2 added
Updated by okurz about 3 years ago
- Subject changed from Upgrade osd webUI host to openSUSE Leap 15.2 to Upgrade osd webUI host to openSUSE Leap 15.3
- Description updated (diff)
- Assignee deleted (livdywan)
- Priority changed from High to Normal
- Start date deleted (2020-10-24)
Updated by okurz about 3 years ago
- Priority changed from Normal to High
- Target version changed from future to Ready
Updated by livdywan about 3 years ago
- Subject changed from Upgrade osd webUI host to openSUSE Leap 15.3 to Upgrade osd webUI host to openSUSE Leap 15.3 size:M
- Status changed from New to Workable
Updated by okurz about 3 years ago
- Status changed from Workable to In Progress
- Assignee set to okurz
I guess I will upgrade OSD now. I haven't done any of the other 15.3 upgrade tasks myself, only supported them so far, so I think I can take this one. https://openqa.suse.de/tests/ shows that OSD is not that busy right now. The latest state of os-autoinst+openQA was deployed yesterday in https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/271128 . The current uptime of OSD is 11 days. I am following https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
One minor change I had to apply manually: in /etc/zypp/repos.d/NON_Public_infrastructure.repo, replace openSUSE_Leap_$releasever with $releasever, as the repo paths changed here.
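The manual repo edit could also be scripted. A minimal sketch using GNU sed; fix_repo_releasever is a hypothetical helper name, not something from the wiki procedure. The single quotes keep $releasever literal in the file, so zypper still expands it at runtime.

```shell
# Hypothetical helper: drop the "openSUSE_Leap_" prefix in front of the
# $releasever variable in one .repo file, editing it in place (GNU sed -i).
fix_repo_releasever() {
    local repo_file="$1"
    # In the pattern, \$ matches a literal dollar sign; in the replacement,
    # $releasever is plain text that zypper substitutes later.
    sed -i 's/openSUSE_Leap_\$releasever/$releasever/g' "$repo_file"
}
```

Usage: fix_repo_releasever /etc/zypp/repos.d/NON_Public_infrastructure.repo. No backup suffix is passed to -i on purpose, so no extra file is left behind in /etc/zypp/repos.d.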
Then I continued with
zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details
Due to recent events I am being careful about changes in package versions, but for example I see "perl-Mojo-IOLoop-ReadWriteProcess 0.28-lp152.2.1 -> 0.28-bp153.1.12": the upstream version stays at 0.28 and only the package release changes, so that all looks good.
Then I hit https://bugzilla.suse.com/show_bug.cgi?id=1192740, so I applied the workaround and ran
zypper --releasever=$new -n in suse-module-tools && zypper --releasever=$new -n ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance --details
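The full upgrade sequence, including the workaround, can be wrapped in one function. This is a sketch under assumptions: dist_upgrade is a hypothetical name, it is run as root on the Leap 15.2 host, and the target release is passed as an argument instead of the $new variable. The zypper options are exactly the ones used in the command above.

```shell
# Hypothetical wrapper around the upgrade commands used in this ticket.
# $1: target release, e.g. 15.3
dist_upgrade() {
    local release="$1"
    # Workaround for bsc#1192740: install suse-module-tools first.
    zypper --releasever="$release" -n in suse-module-tools || return 1
    # Refresh all repositories against the new release.
    zypper --releasever="$release" -n ref || return 1
    # Upgrade without a second refresh; --download-in-advance fetches all
    # packages before any of them is installed.
    zypper --releasever="$release" -n --no-refresh dup \
        --auto-agree-with-licenses --replacefiles \
        --download-in-advance --details
}
```

Called as dist_upgrade 15.3. Returning on the first failing step mirrors the && chain of the original one-liner.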
One systemd service failed, openqa-enqueue-asset-cleanup.service. The journal says:
Dec 09 14:00:00 openqa systemd[10389]: openqa-enqueue-asset-cleanup.service: Failed to determine user credentials: No such process
Dec 09 14:00:00 openqa systemd[10389]: openqa-enqueue-asset-cleanup.service: Failed at step USER spawning /usr/share/openqa/script>
Then I rebooted.
Updated by okurz about 3 years ago
- Status changed from In Progress to Resolved
Sent a notification about the upgrade and the resolution to the internal chat channel #eng-testing.
All good after the reboot, no failed services. Ran sudo salt -C 'G@roles:webui' state.apply test=True: no failures. Then ran sudo salt -C 'G@roles:webui' state.apply: all good as well.
Updated by livdywan about 3 years ago
I assume this is related? It was alerting only very briefly; it went back to OK just as I was about to save this comment:
openqa_minion_workers.max_max 2.400
- And again at 18.56, ongoing, with 2.600. Yet https://openqa.suse.de/minion/workers shows 2 workers, not none, albeit idle. No errors are visible in sudo journalctl -fu openqa-gru.service. I ran sudo systemctl restart openqa-gru anyway, and now one of the idle minion workers is gone and not coming back.
Updated by okurz about 3 years ago
- Related to action #103771: Retry on rsync errors like "exit code 5" instead of failing the job (which then retriggers) added
Updated by okurz over 2 years ago
- Copied to action #111872: Upgrade osd webUI host to openSUSE Leap 15.4 added