action #130585
coordination #130582 (closed): [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.5
Upgrade o3 workers to openSUSE Leap 15.5
0%
Description
Motivation
- Need to upgrade workers before EOL of Leap 15.4 and have a consistent environment
Acceptance criteria
- AC1: all o3 worker machines run a cleanly upgraded openSUSE Leap 15.5 (no failed systemd services, no leftover .rpmnew files, etc.)
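A minimal sketch of how AC1 could be checked on a single worker. It assumes leftover config files follow the usual RPM naming (`*.rpmnew`, `*.rpmsave`, `*.rpmorig`) and live below /etc. The function names `leftover_rpm_files` and `check_clean_upgrade` are hypothetical helpers, not part of openQA.

```shell
#!/bin/sh
# Hypothetical helper: list leftover RPM config files below a directory (default: /etc).
leftover_rpm_files() {
    find "${1:-/etc}" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) 2>/dev/null | sort
}

# Hypothetical helper: succeed only if no systemd unit is in failed state
# and no leftover RPM config files exist.
check_clean_upgrade() {
    failed=$(systemctl --failed --no-legend 2>/dev/null)
    leftovers=$(leftover_rpm_files "${1:-/etc}")
    [ -z "$failed" ] || { echo "failed units:"; echo "$failed"; }
    [ -z "$leftovers" ] || { echo "leftover files:"; echo "$leftovers"; }
    [ -z "$failed" ] && [ -z "$leftovers" ]
}
```

On a worker this could be run as `check_clean_upgrade && echo clean`; any output lists what still needs attention.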
Suggestions
- read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the workers are only executing a few or no openQA test jobs
- Keep IPMI interface ready and test that Serial-over-LAN works for potential recovery
- Use the instructions from above but use "transactional-update shell" for transactional-update workers
- After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with "transactional-update rollback"
Further details
- Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured, there are btrfs snapshots to recover from and the IPMI Serial-over-LAN, a reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker), and there are also other machines that can run jobs while one host might be down for a little bit longer. And okurz can hold your hand :)
Updated by okurz over 1 year ago
- Copied from action #111863: Upgrade o3 workers to openSUSE Leap 15.4 size:M added
Updated by okurz over 1 year ago
- Subject changed from Upgrade o3 workers to openSUSE Leap 15.4 size:M to Upgrade o3 workers to openSUSE Leap 15.5
- Category set to Organisational
- Assignee deleted (tinita)
- Target version changed from Ready to future
Updated by okurz over 1 year ago
- Target version changed from future to Tools - Next
Updated by okurz over 1 year ago
hosts="aarch64 openqaworker4 openqaworker6 openqaworker7 openqaworker19 openqaworker20 openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 qa-power8-3 rebel"; for i in $hosts; do echo $i && ssh -t root@$i 'grep -q VERSION=.*15.4 /etc/os-release && zypper -n --no-refresh in screen; old=15.4; new=15.5; screen -L -S upgrade sh -c "new=$new; zypper -n up --auto-agree-with-licenses --replacefiles && zypper --releasever=$new -n --gpg-auto-import-keys ref && zypper --releasever=$new -n --no-refresh dup --auto-agree-with-licenses --replacefiles --download-in-advance $interactive"' ; done
While running this I found inconsistent repositories on openqaworker4, like "devel:openQA" plus "devel_openQA". I tried to rectify this situation but that broke more overnight:
https://suse.slack.com/archives/C02CANHLANP/p1694842897320189
(Dominique Leuenberger) Good morning! ow4 seems to have trouble testing anything. All jobs end up incomplete with this error: Reason: backend died: Can't locate object method "spew" via package "Mojo::File" at /usr/lib/perl5/vendor_perl/5.26.1/Mojo/IOLoop/ReadWriteProcess.pm line 412.
In case anybody catches a minute, a fix would be appreciated
(Oliver Kurz) on it […] sorry, was my fault. I was preparing workers for the Leap 15.5 upgrade and found duplicate repos on w4. I removed those, and that apparently caused different repo priority combinations, as the existing non-duplicate ones were not correctly configured.
worker=openqaworker4 openqa-advanced-retrigger-jobs
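The duplicate repositories mentioned above differed only in ":" versus "_" in the alias. A minimal sketch for spotting such pairs before touching any repository, assuming the aliases are available one per line; the function name `find_duplicate_repo_aliases` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read repo aliases (one per line) on stdin and print each
# group of aliases that differ only in ":" versus "_".
find_duplicate_repo_aliases() {
    awk '{
        key = $0
        gsub(/:/, "_", key)   # "devel:openQA" and "devel_openQA" map to the same key
        seen[key] = (key in seen) ? seen[key] " " $0 : $0
        count[key]++
    }
    END {
        for (key in count)
            if (count[key] > 1) print seen[key]
    }' | sort
}
```

One possible way to feed it, assuming the usual `[alias]` section headers in the repo files: `sed -n 's/^\[\(.*\)\]$/\1/p' /etc/zypp/repos.d/*.repo | find_duplicate_repo_aliases`. The sketch only reports candidates; it does not compare URLs or priorities, so each reported group still needs a manual look before anything is removed.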
I re-executed the above upgrade command multiple times and now all machines are on Leap 15.5, with the exception of aarch64 and qa-power8-3, which are both currently in transition, waiting for #134864 and #132140 respectively.
To ensure a consistent updated state for all machines I checked
hosts="aarch64 openqaworker4 openqaworker6 openqaworker7 openqaworker19 openqaworker20 openqaworker21 openqaworker22 openqaworker23 openqaworker24 openqaworker25 openqaworker26 openqaworker27 openqaworker28 qa-power8-3 rebel"; for i in $hosts; do echo $i && ssh -t root@$i 'systemctl is-enabled openqa-{auto,continuous}-update.timer' ; done
aarch64
ssh: connect to host aarch64 port 22: No route to host
openqaworker4
enabled
enabled
Connection to openqaworker4 closed.
openqaworker6
enabled
enabled
Connection to openqaworker6 closed.
openqaworker7
enabled
enabled
Connection to openqaworker7 closed.
openqaworker19
enabled
enabled
Connection to openqaworker19 closed.
openqaworker20
enabled
enabled
Connection to openqaworker20 closed.
openqaworker21
enabled
enabled
Connection to openqaworker21 closed.
openqaworker22
enabled
enabled
Connection to openqaworker22 closed.
openqaworker23
enabled
enabled
Connection to openqaworker23 closed.
openqaworker24
enabled
enabled
Connection to openqaworker24 closed.
openqaworker25
disabled
disabled
Connection to openqaworker25 closed.
openqaworker26
Failed to get unit file state for openqa-auto-update.timer: No such file or directory
Connection to openqaworker26 closed.
openqaworker27
Failed to get unit file state for openqa-auto-update.timer: No such file or directory
Connection to openqaworker27 closed.
openqaworker28
enabled
enabled
Connection to openqaworker28 closed.
qa-power8-3
ssh: connect to host qa-power8-3 port 22: No route to host
rebel
enabled
enabled
Connection to rebel closed.
So w25 has the timers disabled and w26+w27 do not have them installed at all, which is expected because we designated those as bare-metal test machines. So now also blocking on #132134
Updated by okurz over 1 year ago
- Related to action #132134: Setup new PRG2 multi-machine openQA worker for o3 size:M added
Updated by okurz over 1 year ago
- Tags set to infra
- Status changed from New to Blocked
- Assignee set to okurz
Updated by okurz about 1 year ago
- Status changed from Blocked to New
- Assignee deleted (okurz)
#132134 resolved, unblocked
Updated by okurz about 1 year ago
- Target version changed from Tools - Next to Ready
Updated by okurz about 1 year ago
- Target version changed from Ready to Tools - Next
Updated by okurz about 1 year ago
- Status changed from New to Resolved
- Assignee set to okurz
- Target version changed from Tools - Next to Ready
okurz@new-ariel:~> for i in $hosts; do echo $i && ssh root@$i "grep VERSION /etc/os-release"; done
aarch64-o3
ssh: Could not resolve hostname aarch64-o3: Name or service not known
kerosene
ssh: Could not resolve hostname kerosene: Name or service not known
openqaworker20
ssh: connect to host openqaworker20 port 22: No route to host
openqaworker21
VERSION="15.5"
VERSION_ID="15.5"
openqaworker22
VERSION="15.5"
VERSION_ID="15.5"
openqaworker23
VERSION="15.5"
VERSION_ID="15.5"
openqaworker24
VERSION="15.5"
VERSION_ID="15.5"
openqaworker25
VERSION="15.5"
VERSION_ID="15.5"
openqaworker26
VERSION="15.5"
VERSION_ID="15.5"
openqaworker27
VERSION="15.5"
VERSION_ID="15.5"
openqaworker28
VERSION="15.5"
VERSION_ID="15.5"
openqaworker-arm21
ssh: connect to host openqaworker-arm21 port 22: No route to host
openqaworker-arm22
VERSION="15.5"
VERSION_ID="15.5"
qa-power8-3
ssh: connect to host qa-power8-3 port 22: No route to host
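The check above prints raw ssh errors for hosts that are down or renamed. A minimal sketch that produces one line per host instead, assuming key-based root login. `report_versions` and the `SSH` override variable are hypothetical, while `BatchMode` and `ConnectTimeout` are standard OpenSSH options.

```shell
#!/bin/sh
# Hypothetical helper: print "<host> <VERSION_ID>" for each host given as argument,
# or "<host> unreachable" if the connection or the remote command fails.
report_versions() {
    for host in "$@"; do
        # BatchMode avoids password prompts, ConnectTimeout bounds the wait for dead hosts,
        # stdin from /dev/null keeps ssh from consuming the caller's input.
        version=$(${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=10 "root@$host" \
            '. /etc/os-release && echo "$VERSION_ID"' 2>/dev/null </dev/null) \
            || version=unreachable
        echo "$host ${version:-unknown}"
    done
}
```

With the host list from above this could be called as `report_versions $hosts`, which makes it easy to grep for hosts that are not yet on the target version.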
Updated by okurz 9 months ago
- Copied to action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added