action #111869

closed

openQA Project (public) - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

Upgrade o3 webUI host to openSUSE Leap 15.4 size:S

Added by okurz over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

  • We need to upgrade machines before the EOL of Leap 15.3 and to keep a consistent environment

Acceptance criteria

  • AC1: The o3 webUI host runs a cleanly upgraded openSUSE Leap 15.4 (no failed systemd services, no leftover .rpmnew files, etc.)
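A minimal sketch of how AC1 could be checked on the upgraded host, assuming a POSIX shell and root access. RPM leaves .rpmnew files when a package ships a new default for a locally modified config file, and .rpmsave files when it replaces or removes a modified one. find_rpm_leftovers and list_failed_units are hypothetical helpers for illustration, not existing openQA tooling.

```shell
#!/bin/sh
# Hypothetical helpers for checking AC1 after the upgrade (a sketch, run as root).

# Print leftover RPM config files below a directory (default: /etc).
# *.rpmnew:  new packaged default that was not applied over a modified local file
# *.rpmsave: modified local file saved when RPM replaced or removed it
find_rpm_leftovers() {
    dir="${1:-/etc}"
    find "$dir" \( -name '*.rpmnew' -o -name '*.rpmsave' \) -print 2>/dev/null || true
}

# Print failed systemd units; prints nothing when systemctl is unavailable.
list_failed_units() {
    if command -v systemctl >/dev/null 2>&1; then
        systemctl --failed --no-legend --plain 2>/dev/null || true
    fi
}

echo "== leftover .rpmnew/.rpmsave files =="
find_rpm_leftovers /etc
echo "== failed systemd units =="
list_failed_units
```

On a cleanly upgraded host both sections would be expected to stay empty; any remaining .rpmnew file typically has to be merged into the local config or removed.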

Suggestions

  • Read the previous story https://progress.opensuse.org/issues/99195 and include all relevant details in this ticket's description, not just in comments, so that we have a good reference for next time. cdywan already checked the comments and found nothing important. tinita says that there were problems with AppArmor, so that should at least be crosschecked
  • Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
  • Reserve some time when the instance is executing only a few or no openQA test jobs
  • After the upgrade, reboot and check that everything works as expected
  • Also check the AppArmor audit log
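The upgrade and AppArmor suggestions could be sketched as follows. This assumes the repository files use the $releasever variable (otherwise the version in the .repo files has to be changed first); the upgrade commands are only comments because the wiki page linked in the suggestions is the authoritative procedure. apparmor_denials is a hypothetical helper that greps for apparmor="DENIED" records in the audit log and falls back to the kernel journal when the auditd log is not readable.

```shell
#!/bin/sh
# Sketch of the upgrade and post-reboot AppArmor check (assumes repos use $releasever).
#
# Upgrade steps, intentionally commented out (see the distribution upgrade wiki page):
#   zypper --releasever=15.4 refresh
#   zypper --releasever=15.4 dup
#   reboot

# Print AppArmor denials from the audit log, or from the kernel journal as a fallback.
apparmor_denials() {
    log="${1:-/var/log/audit/audit.log}"
    if [ -r "$log" ]; then
        grep 'apparmor="DENIED"' "$log" || true
    elif command -v journalctl >/dev/null 2>&1; then
        # -k shows kernel messages of the current boot only
        journalctl -k --no-pager 2>/dev/null | grep 'apparmor="DENIED"' || true
    fi
}

# After the reboot: list AppArmor denials (ideally none).
apparmor_denials
```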

Out of scope

  • Spawn a container instead of upgrading the host

Further details

  • If we lose access to the machine, we need help from EngineeringInfrastructure, as only they have access to the VM

Related issues: 4 (0 open, 4 closed)

Related to openQA Infrastructure (public) - action #114397: glibc regression causes cron to crash (Resolved, mkittler, 2022-07-20)
Related to openSUSE admin - tickets #114490: Tumbleweed snapshot URL fails (Resolved, okurz, 2022-07-21)
Copied from openQA Infrastructure (public) - action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M (Resolved, livdywan)
Copied to openQA Project (public) - action #130591: Upgrade o3 webUI host to openSUSE Leap 15.5 (Resolved, okurz)
#1

Updated by okurz over 2 years ago

  • Copied from action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M added
#2

Updated by okurz over 2 years ago

  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.3 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4 size:M
  • Description updated (diff)
  • Assignee deleted (livdywan)
#3

Updated by okurz over 2 years ago

  • Project changed from openQA Project (public) to openQA Infrastructure (public)
  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4
  • Target version changed from Ready to future
#4

Updated by okurz over 2 years ago

  • Target version changed from future to Ready
#5

Updated by livdywan over 2 years ago

  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 to Upgrade o3 webUI host to openSUSE Leap 15.4 size:S
  • Description updated (diff)
  • Status changed from New to Workable
#6

Updated by mkittler over 2 years ago

  • Assignee set to mkittler

I'll do it tomorrow.

#7

Updated by okurz over 2 years ago

  • Related to action #114397: glibc regression causes cron to crash added
#8

Updated by mkittler over 2 years ago

  • Status changed from Workable to In Progress
#9

Updated by mkittler over 2 years ago

The migration is done; the system booted again and seems to work fine. I kept the glibc package lock in place because Leap 15.3 and 15.4 use exactly the same glibc version (the package is inherited).
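A sketch for confirming that the glibc lock survived the migration. It assumes the default table layout of zypper locks output, with the locked name in the second |-separated column; is_locked is a hypothetical helper that reads that output on stdin.

```shell
#!/bin/sh
# Succeed if the package name given as $1 appears in the Name column of
# "zypper locks" output read from stdin (assumes the default table layout).
is_locked() {
    awk -F'|' -v n="$1" '
        { name = $2; gsub(/^[ \t]+|[ \t]+$/, "", name) }
        name == n { found = 1 }
        END { exit !found }'
}

# On the host itself: report the lock state and the installed glibc version.
if command -v zypper >/dev/null 2>&1; then
    if zypper locks | is_locked glibc; then
        echo "glibc: locked"
    else
        echo "glibc: not locked"
    fi
    rpm -q glibc || true
fi
```

Since 15.3 and 15.4 ship the same glibc package, rpm -q glibc would be expected to report the same version before and after the migration.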

Only two services failed: sshguard.service and zramswap.service.

martchus@ariel:~> sudo systemctl status sshguard.service
× sshguard.service - SSHGUARD provides automatic attack blocking
     Loaded: loaded (/usr/lib/systemd/system/sshguard.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2022-07-20 12:57:07 UTC; 4s ago
    Process: 12381 ExecStartPre=/usr/sbin/iptables -N sshguard (code=exited, status=0/SUCCESS)
    Process: 12382 ExecStartPre=/usr/sbin/ip6tables -N sshguard (code=exited, status=0/SUCCESS)
    Process: 12383 ExecStart=/usr/sbin/sshguard -a $THRESHOLD -p $BLOCK_TIME -s $DETECTION_TIME -w $WHITELIST_FILE -b $BLACKLIST_FILE (code=exited, status=78)
    Process: 12384 ExecStopPost=/usr/sbin/iptables -F sshguard (code=exited, status=0/SUCCESS)
    Process: 12385 ExecStopPost=/usr/sbin/ip6tables -F sshguard (code=exited, status=0/SUCCESS)
    Process: 12386 ExecStopPost=/usr/sbin/iptables -X sshguard (code=exited, status=0/SUCCESS)
    Process: 12387 ExecStopPost=/usr/sbin/ip6tables -X sshguard (code=exited, status=0/SUCCESS)
   Main PID: 12383 (code=exited, status=78)
martchus@ariel:~> sudo systemctl status zramswap.service
× zramswap.service - Service enabling compressing RAM with zRam
     Loaded: loaded (/usr/lib/systemd/system/zramswap.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2022-07-20 12:56:50 UTC; 4min 11s ago
    Process: 12011 ExecStart=/usr/sbin/zramswapon (code=exited, status=255/EXCEPTION)
   Main PID: 12011 (code=exited, status=255/EXCEPTION)

Jul 20 12:56:50 ariel zramswapon[12038]: swapon: cannot open /dev/zram3: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12039]: swapon: cannot open /dev/zram4: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12040]: swapon: cannot open /dev/zram5: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12041]: swapon: cannot open /dev/zram6: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12042]: swapon: cannot open /dev/zram7: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12043]: swapon: cannot open /dev/zram8: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12044]: swapon: cannot open /dev/zram9: No such file or directory
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Failed with result 'exit-code'.
Jul 20 12:56:50 ariel systemd[1]: Failed to start Service enabling compressing RAM with zRam.

I don't really know those services, but both are provided by official SLE15SP4 packages, so they should have been covered by the migration to Leap 15.4.
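A sketch for looking into the two failed units. Exit status 78 is EX_CONFIG in the BSD sysexits.h conventions, which would point at a configuration problem if sshguard follows those conventions; that is an interpretation, not a confirmed diagnosis. The zramswapon log shows it trying /dev/zram3 to /dev/zram9, so listing the zram devices that actually exist may show whether fewer were created than the script expects (again an assumption to verify). sysexit_name and diagnose_failed_units are hypothetical helpers; the commands they run are read-only.

```shell
#!/bin/sh
# Translate an exit status into its <sysexits.h> name (64..78), if any.
sysexit_name() {
    case "$1" in
        64) echo EX_USAGE ;;
        65) echo EX_DATAERR ;;
        66) echo EX_NOINPUT ;;
        67) echo EX_NOUSER ;;
        68) echo EX_NOHOST ;;
        69) echo EX_UNAVAILABLE ;;
        70) echo EX_SOFTWARE ;;
        71) echo EX_OSERR ;;
        72) echo EX_OSFILE ;;
        73) echo EX_CANTCREAT ;;
        74) echo EX_IOERR ;;
        75) echo EX_TEMPFAIL ;;
        76) echo EX_PROTOCOL ;;
        77) echo EX_NOPERM ;;
        78) echo EX_CONFIG ;;
        *) echo unknown ;;
    esac
}

# Read-only diagnostics for the two units that failed after the upgrade.
diagnose_failed_units() {
    for unit in sshguard.service zramswap.service; do
        echo "== $unit =="
        journalctl -b -u "$unit" --no-pager -n 20 2>/dev/null || true
    done
    echo "== zram devices =="
    ls -l /dev/zram* 2>/dev/null || echo "no /dev/zram* devices"
    if command -v zramctl >/dev/null 2>&1; then
        zramctl || true
    fi
}

echo "sshguard exit status 78 = $(sysexit_name 78)"
```

On the host, diagnose_failed_units would be run as root to collect the unit logs and the zram device list in one go.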

#10

Updated by openqa_review over 2 years ago

  • Due date set to 2022-08-04

Setting due date based on mean cycle time of SUSE QE Tools

#11

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback
#12

Updated by okurz over 2 years ago

  • Due date deleted (2022-08-04)
  • Status changed from Feedback to Resolved

@mkittler I think we can make tracking this issue a bit easier. I subscribed to https://bugzilla.opensuse.org/show_bug.cgi?id=1193402, and you can too if you like. Once the bug is resolved, we can remove the workaround on o3; until then it stays, but IMHO we can already resolve this ticket.
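Assuming the workaround on o3 referred to here is the glibc package lock mentioned in the migration notes (not confirmed by this comment), lifting it once the bug is resolved could look like the following sketch. run and lift_glibc_lock are hypothetical helpers; by default they only print the commands, and DRY_RUN=0 would actually execute them.

```shell
#!/bin/sh
# Execute a command, or only print it while DRY_RUN is unset or 1 (safe default).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Remove the glibc lock, pull in the fixed glibc and list processes that still
# use the old (deleted) libraries and therefore need a restart.
lift_glibc_lock() {
    run zypper removelock glibc
    run zypper --non-interactive update glibc
    run zypper ps -s
}

lift_glibc_lock
```

Keeping the dry run as the default means a copy-pasted invocation cannot change the production host by accident.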

#13

Updated by okurz over 2 years ago

#14

Updated by okurz over 1 year ago

  • Copied to action #130591: Upgrade o3 webUI host to openSUSE Leap 15.5 added