action #111869

openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

Upgrade o3 webUI host to openSUSE Leap 15.4 size:S

Added by okurz 3 months ago. Updated 27 days ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

  • Need to upgrade machines before EOL of Leap 15.3 and have a consistent environment

Acceptance criteria

  • AC1: o3 webui host runs a cleanly upgraded openSUSE Leap 15.4 (no failed systemd services, no leftover .rpmnew files, etc.)
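The "no leftover files" part of AC1 can be checked mechanically. The following is a minimal sketch, not part of the original ticket: it assumes the leftovers of interest are the files rpm writes with the suffixes .rpmnew, .rpmsave and .rpmorig, and that searching below /etc is sufficient. The function name find_rpm_leftovers is hypothetical.

```python
import os
from typing import List

# Suffixes rpm uses when it keeps or sets aside a config file during an update.
RPM_LEFTOVER_SUFFIXES = (".rpmnew", ".rpmsave", ".rpmorig")


def find_rpm_leftovers(root: str = "/etc") -> List[str]:
    """Return the sorted paths below `root` whose names end in an rpm leftover suffix."""
    hits = []
    # os.walk silently skips directories it is not allowed to read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(RPM_LEFTOVER_SUFFIXES):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)
```

Calling find_rpm_leftovers() as root on the upgraded host should return an empty list once every leftover has been merged or removed; each reported path needs a manual diff against the active config file.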

Suggestions

  • Read the previous story https://progress.opensuse.org/issues/99195 and include all relevant details in this ticket's description, not just in comments, so that we have a good reference for the next time. cdywan already checked the comments and hasn't found anything important. tinita says that there were problems with AppArmor, so that should at least be cross-checked
  • Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
  • Reserve some time when the instance is only executing a few or no openQA test jobs
  • After the upgrade, reboot and check that everything works as expected
  • Also check the AppArmor audit log
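For the AppArmor check, one possible approach is to summarize the denials found in the audit log. This is a sketch under assumptions, not something the ticket prescribes: it assumes the records contain apparmor="DENIED" together with quoted profile, operation and name fields, which is the usual audit record layout. The function name count_apparmor_denials is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches key="value" pairs as they appear in AppArmor audit records.
_FIELD = re.compile(r'(\w+)="([^"]*)"')


def count_apparmor_denials(lines: Iterable[str]) -> Counter:
    """Count AppArmor DENIED records per (profile, operation, name)."""
    denials = Counter()
    for line in lines:
        fields = dict(_FIELD.findall(line))
        if fields.get("apparmor") != "DENIED":
            continue  # skip ALLOWED records and unrelated log lines
        key = (
            fields.get("profile", "?"),
            fields.get("operation", "?"),
            fields.get("name", "?"),
        )
        denials[key] += 1
    return denials
```

The function takes any iterable of lines, so it can be fed an open log file. On a host running auditd that is typically /var/log/audit/audit.log (an assumption; reading it usually requires root). Sorting the result with most_common() puts the most frequent denials first.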

Out of scope

  • Spawn a container instead of upgrading the host

Further details

  • If we lose access to the machine, we need the help of EngineeringInfrastructure, as only they have access to the VM

Related issues

Related to openQA Infrastructure - action #114397: glibc regression causes cron to crash (Resolved, 2022-07-20)

Related to openSUSE admin - tickets #114490: Tumbleweed snapshot URL fails (Resolved, 2022-07-21)

Copied from openQA Infrastructure - action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M (Resolved)

History

#1 Updated by okurz 3 months ago

  • Copied from action #99195: Upgrade o3 webUI host to openSUSE Leap 15.3 size:M added

#2 Updated by okurz 3 months ago

  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.3 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4 size:M
  • Description updated (diff)
  • Assignee deleted (cdywan)

#3 Updated by okurz 3 months ago

  • Project changed from openQA Project to openQA Infrastructure
  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 size:M to Upgrade o3 webUI host to openSUSE Leap 15.4
  • Target version changed from Ready to future

#4 Updated by okurz about 1 month ago

  • Target version changed from future to Ready

#5 Updated by cdywan about 1 month ago

  • Subject changed from Upgrade o3 webUI host to openSUSE Leap 15.4 to Upgrade o3 webUI host to openSUSE Leap 15.4 size:S
  • Description updated (diff)
  • Status changed from New to Workable

#6 Updated by mkittler 29 days ago

  • Assignee set to mkittler

I'll do it tomorrow.

#7 Updated by okurz 28 days ago

  • Related to action #114397: glibc regression causes cron to crash added

#8 Updated by mkittler 28 days ago

  • Status changed from Workable to In Progress

#9 Updated by mkittler 28 days ago

The migration is done; the system booted again and seems to work fine. I kept the lock on glibc in place because Leap 15.3 and 15.4 use literally the same glibc version (as the package is inherited).

Only two services failed: sshguard.service and zramswap.service

martchus@ariel:~> sudo systemctl status sshguard.service
× sshguard.service - SSHGUARD provides automatic attack blocking
     Loaded: loaded (/usr/lib/systemd/system/sshguard.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2022-07-20 12:57:07 UTC; 4s ago
    Process: 12381 ExecStartPre=/usr/sbin/iptables -N sshguard (code=exited, status=0/SUCCESS)
    Process: 12382 ExecStartPre=/usr/sbin/ip6tables -N sshguard (code=exited, status=0/SUCCESS)
    Process: 12383 ExecStart=/usr/sbin/sshguard -a $THRESHOLD -p $BLOCK_TIME -s $DETECTION_TIME -w $WHITELIST_FILE -b $BLACKLIST_FILE (code=exited, status=78)
    Process: 12384 ExecStopPost=/usr/sbin/iptables -F sshguard (code=exited, status=0/SUCCESS)
    Process: 12385 ExecStopPost=/usr/sbin/ip6tables -F sshguard (code=exited, status=0/SUCCESS)
    Process: 12386 ExecStopPost=/usr/sbin/iptables -X sshguard (code=exited, status=0/SUCCESS)
    Process: 12387 ExecStopPost=/usr/sbin/ip6tables -X sshguard (code=exited, status=0/SUCCESS)
   Main PID: 12383 (code=exited, status=78)
martchus@ariel:~> sudo systemctl status zramswap.service
× zramswap.service - Service enabling compressing RAM with zRam
     Loaded: loaded (/usr/lib/systemd/system/zramswap.service; enabled; vendor preset: disabled)
     Active: failed (Result: exit-code) since Wed 2022-07-20 12:56:50 UTC; 4min 11s ago
    Process: 12011 ExecStart=/usr/sbin/zramswapon (code=exited, status=255/EXCEPTION)
   Main PID: 12011 (code=exited, status=255/EXCEPTION)

Jul 20 12:56:50 ariel zramswapon[12038]: swapon: cannot open /dev/zram3: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12039]: swapon: cannot open /dev/zram4: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12040]: swapon: cannot open /dev/zram5: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12041]: swapon: cannot open /dev/zram6: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12042]: swapon: cannot open /dev/zram7: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12043]: swapon: cannot open /dev/zram8: No such file or directory
Jul 20 12:56:50 ariel zramswapon[12044]: swapon: cannot open /dev/zram9: No such file or directory
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Main process exited, code=exited, status=255/EXCEPTION
Jul 20 12:56:50 ariel systemd[1]: zramswap.service: Failed with result 'exit-code'.
Jul 20 12:56:50 ariel systemd[1]: Failed to start Service enabling compressing RAM with zRam.

I don't really know those services, but both are provided by official SLE15SP4 packages, so they should have been covered by the migration to Leap 15.4.
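A minimal sketch for re-checking failed units after such a migration, assuming the output of `systemctl list-units --state=failed --plain --no-legend` has the unit name in the first column. The function names parse_failed_units and failed_units are hypothetical and not from the ticket.

```python
import subprocess
from typing import List

# Status markers systemctl may print before the unit name when --plain is not used.
_MARKERS = {"●", "×", "*"}


def parse_failed_units(listing: str) -> List[str]:
    """Extract unit names from a `systemctl list-units --state=failed` listing."""
    units = []
    for line in listing.splitlines():
        columns = line.split()
        if columns and columns[0] in _MARKERS:
            columns = columns[1:]  # drop the leading status marker
        if columns:
            units.append(columns[0])
    return units


def failed_units() -> List[str]:
    """Ask systemd for the currently failed units; needs a host running systemd."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)
```

For the state described above, failed_units() would be expected to report sshguard.service and zramswap.service. An empty list is what the acceptance criterion asks for.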

#10 Updated by openqa_review 28 days ago

  • Due date set to 2022-08-04

Setting due date based on mean cycle time of SUSE QE Tools

#11 Updated by mkittler 27 days ago

  • Status changed from In Progress to Feedback

#12 Updated by okurz 27 days ago

  • Due date deleted (2022-08-04)
  • Status changed from Feedback to Resolved

mkittler, I think we can make our tracking of issues a bit easier here. I subscribed to https://bugzilla.opensuse.org/show_bug.cgi?id=1193402 and you can do so as well if you like. If we see the bug resolved, we can remove the workaround on o3; otherwise it will stay. IMHO we can resolve this ticket already.

#13 Updated by okurz 24 days ago
