action #111869
openQA Project (public) - coordination #111860 (closed): [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4
Upgrade o3 webUI host to openSUSE Leap 15.4 size:S
0%
Description
Motivation
- Need to upgrade machines before EOL of Leap 15.3 and have a consistent environment
Acceptance criteria
- AC1: o3 webui host runs a cleanly upgraded openSUSE Leap 15.4 (no failed systemd services, no leftover .rpmnew files, etc.)
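AC1 can be checked mechanically after the reboot. The following is only a minimal sketch, not the procedure actually used on o3. It assumes that "leftover files" means rpm's usual .rpmnew, .rpmsave and .rpmorig files and that /etc is the place to look; both function names are made up for illustration.

```shell
#!/bin/sh
# Sketch of a post-upgrade check for AC1; function names are hypothetical.

# Print config leftovers that rpm writes when it does not overwrite or
# keeps a copy of a locally modified file.
# $1: directory to scan (assumption: /etc is the relevant default)
list_rpm_leftovers() {
    find "${1:-/etc}" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Print the names of failed systemd units, one per line (nothing if none).
# --plain drops the status bullet so the unit name is the first column.
list_failed_units() {
    systemctl list-units --state=failed --no-legend --plain | awk '{print $1}'
}
```

Run as root on the host, `list_failed_units` and `list_rpm_leftovers /etc` should both print nothing for AC1 to hold. Any leftover file still needs a manual diff against the live config before it is deleted.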
Suggestions
- Read the previous story https://progress.opensuse.org/issues/99195 and include all relevant details in this ticket's description, not just in comments, so that we have a good reference for next time. cdywan already checked the comments and found nothing important; tinita says there were problems with apparmor, so that should at least be cross-checked
- Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
- Reserve some time when the instance is only executing a few or no openQA test jobs
- After the upgrade, reboot and check that everything works as expected
- Also check the apparmor audit log
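For the apparmor check, a per-profile count of denials is usually enough to see whether the upgrade broke a profile. This is a rough sketch under the assumption that auditd is running and writes to /var/log/audit/audit.log; without auditd the same messages end up in the kernel log instead. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: summarise AppArmor denials; the function name is hypothetical.

# Print "count profile" lines for AppArmor DENIED records, most frequent first.
# $1: log file to read (assumption: auditd's default location)
summarize_apparmor_denials() {
    grep 'apparmor="DENIED"' "${1:-/var/log/audit/audit.log}" \
        | sed -n 's/.* profile="\([^"]*\)".*/\1/p' \
        | sort | uniq -c | sort -rn
}
```

Reading the audit log requires root, e.g. run the function from a root shell. An empty result after the reboot means no profile has denied anything so far; it does not prove that every service has exercised all of its code paths yet.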
Out of scope
- Spawn a container instead of upgrading the host
Further details
- If we lose access to the machine we need the help of EngineeringInfrastructure, as only they have access to the VM
Updated by okurz over 2 years ago
- Copied from action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M added
Updated by okurz over 2 years ago
- Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.3 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4 size:M
- Description updated (diff)
- Assignee deleted (livdywan)
Updated by okurz over 2 years ago
- Project changed from openQA Project (public) to openQA Infrastructure (public)
- Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4
- Target version changed from Ready to future
Updated by livdywan over 2 years ago
- Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 to Upgrade o3 webUI host to openSUSE Leap 15.4 size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz over 2 years ago
- Related to action #114397: glibc regression causes cron to crash added
Updated by mkittler over 2 years ago
- Status changed from Workable to In Progress
Updated by mkittler over 2 years ago
The migration is done and the system booted again and seems to work fine. I kept the locking of glibc in place because Leap 15.3 and 15.4 use literally the same glibc version (as the package is inherited).
Only two services failed: sshguard.service and zramswap.service
martchus@ariel:~> sudo systemctl status sshguard.service
× sshguard.service - SSHGUARD provides automatic attack blocking
Loaded: loaded (/usr/lib/systemd/system/sshguard.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2022-07-20 12:57:07 UTC; 4s ago
Process: 12381 ExecStartPre=/usr/sbin/iptables -N sshguard (code=exited, status=0/SUCCESS)
Process: 12382 ExecStartPre=/usr/sbin/ip6tables -N sshguard (code=exited, status=0/SUCCESS)
Process: 12383 ExecStart=/usr/sbin/sshguard -a $THRESHOLD -p $BLOCK_TIME -s $DETECTION_TIME -w $WHITELIST_FILE -b $BLACKLIST_FILE (code=exited, status=78)
Process: 12384 ExecStopPost=/usr/sbin/iptables -F sshguard (code=exited, status=0/SUCCESS)
Process: 12385 ExecStopPost=/usr/sbin/ip6tables -F sshguard (code=exited, status=0/SUCCESS)
Process: 12386 ExecStopPost=/usr/sbin/iptables -X sshguard (code=exited, status=0/SUCCESS)
Process: 12387 ExecStopPost=/usr/sbin/ip6tables -X sshguard (code=exited, status=0/SUCCESS)
Main PID: 12383 (code=exited, status=78)
martchus@ariel:~> sudo systemctl status zramswap.service
× zramswap.service - Service enabling compressing RAM with zRam
Loaded: loaded (/usr/lib/systemd/system/zramswap.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Wed 2022-07-20 12:56:50 UTC; 4min 11s ago
Process: 12011 ExecStart=/usr/sbin/zramswapon (code=exited, status=255/EXCEPTION)
Main PID: 12011 (code=exited, status=255/EXCEPTION)
Jul 20 12:56:50 ariel zramswapon[12038]: swapon: cannot open /dev/zram3: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12039]: swapon: cannot open /dev/zram4: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12040]: swapon: cannot open /dev/zram5: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12041]: swapon: cannot open /dev/zram6: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12042]: swapon: cannot open /dev/zram7: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12043]: swapon: cannot open /dev/zram8: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12044]: swapon: cannot open /dev/zram9: No such file or directory
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Failed with result 'exit-code'.
Jul 20 12:56:50 ariel systemd[1]: Failed to start Service enabling compressing RAM with zRam.
I don't really know those, but both are provided by official SLE15SP4 packages so they should have been covered by the migration to Leap 15.4.
Updated by openqa_review over 2 years ago
- Due date set to 2022-08-04
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler over 2 years ago
- Status changed from In Progress to Feedback
- zram service
  - Pending SR: https://build.opensuse.org/request/show/989569
  - Manual override in place for the time being (/etc/systemd/system/zramswap.service)
- sshguard service
  - Fixed by correcting the BACKEND path in /etc/sshguard.conf
Updated by okurz over 2 years ago
- Due date deleted (2022-08-04)
- Status changed from Feedback to Resolved
@mkittler I think we can make our tracking of issues a bit easier here. I subscribed to https://bugzilla.opensuse.org/show_bug.cgi?id=1193402 and you can do so as well if you like. If we see the bug resolved we can remove the workaround on o3; otherwise it will stay, but IMHO we can resolve this ticket already.
Updated by okurz over 2 years ago
- Related to tickets #114490: Tumbleweed snapshot URL fails added
Updated by okurz over 1 year ago
- Copied to action #130591: Upgrade o3 webUI host to openSUSE Leap 15.5 added