action #158041

closed

grenache needs upgrade to 15.5

Added by okurz 9 months ago. Updated 9 months ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2024-03-26
Due date: 2024-04-09
% Done: 0%
Estimated time:

Description

Motivation

grenache-1 was offline for many months, so it was not online when we upgraded our infrastructure to Leap 15.5. As a result grenache-1 is still on 15.4 and should be upgraded as well.

Acceptance criteria

  • AC1: grenache-1 runs a stable Leap 15.5
  • AC2: osd-deployment, salt states deployment and alerts are all good for grenache-1

Suggestions

Rollback actions


Related issues 7 (0 open, 7 closed)

Related to openQA Infrastructure (public) - action #131309: [alert] NFS mount can fail due to hostname resolution error size:M | Resolved | nicksinger | 2023-06-19 | 2023-08-11
Related to openQA Infrastructure (public) - action #127097: [alert] Failed systemd services alert | Resolved | mkittler | 2023-04-03
Related to openQA Infrastructure (public) - action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 | Resolved | okurz | 2020-03-27 | 2021-06-11
Related to openQA Infrastructure (public) - action #75238: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2 | Resolved | livdywan
Related to openQA Infrastructure (public) - action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others | Resolved | mkittler | 2020-10-21
Related to openQA Infrastructure (public) - action #158023: salt-states-openqa pipeline invalid arguments to state.highstate on monitor.qe.nue2.suse.org | Resolved | okurz | 2024-03-26
Copied from openQA Infrastructure (public) - action #158020: salt-states-openqa pipeline times out | Resolved | okurz | 2024-03-26
Actions #1

Updated by okurz 9 months ago

  • Copied from action #158020: salt-states-openqa pipeline times out added
Actions #3

Updated by okurz 9 months ago

  • Status changed from New to In Progress
Actions #4

Updated by okurz 9 months ago

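# Target release for the upgrade
new_version=15.5
# Load the current release variables (e.g. VERSION_ID)
. /etc/os-release
# Refresh all repositories against the new release
zypper --releasever="$new_version" --gpg-auto-import-keys ref
# Distribution upgrade to the new release
zypper --releasever="$new_version" dup --auto-agree-with-licenses --replacefiles --download-in-advance
# Remove package and patch locks, then run the upgrade again
zypper rl python3-salt salt salt-bash-completion qemu-ovmf-x86_64 salt-minion
zypper rl -t patch openSUSE-SLE-15.4-2023-3863 openSUSE-SLE-15.4-2023-2571 openSUSE-SLE-15.4-2023-3145 openSUSE-SLE-15.4-2022-3811 openSUSE-SLE-15.4-2023-2234
zypper dup
reboot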
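
To double-check AC1 after the reboot, something like the following could be used. This is only a sketch: the function names release_version and check_release are made up for illustration and are not part of our tooling. It assumes the host reports its release as VERSION_ID in /etc/os-release.

```shell
# Hypothetical helper: print the VERSION_ID from an os-release style file.
release_version() {
    # $1: path to an os-release file (defaults to /etc/os-release)
    file="${1:-/etc/os-release}"
    # Source in a subshell so the variables do not leak into the caller.
    ( . "$file" && printf '%s\n' "$VERSION_ID" )
}

# Hypothetical check: succeed only if the file reports the expected version.
check_release() {
    expected="$1"
    actual="$(release_version "${2:-/etc/os-release}")"
    if [ "$actual" = "$expected" ]; then
        echo "OK: running $actual"
    else
        echo "MISMATCH: running $actual, expected $expected"
        return 1
    fi
}
```

On grenache-1 this would be called as check_release 15.5.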
Actions #5

Updated by okurz 9 months ago

  • Status changed from In Progress to Feedback
Actions #6

Updated by okurz 9 months ago

  • Description updated (diff)
  • Due date set to 2024-04-09

After boot

# systemctl --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION          
● var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share

Does this ring a bell?
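
For checking this after further reboots, the failed unit names can be pulled out of the output above with a small filter. This is a sketch under assumptions: failed_units is a made-up name, and it expects input in the column layout shown above (optionally with the leading bullet).

```shell
# Hypothetical helper: read `systemctl --failed` style output on stdin and
# print the names of units whose ACTIVE column is "failed".
failed_units() {
    awk '{
        # Skip the leading status bullet if there is one.
        i = ($1 == "●") ? 2 : 1
        # Columns are UNIT, LOAD, ACTIVE, ...; ACTIVE is two fields after UNIT.
        if ($(i + 2) == "failed") print $i
    }'
}
```

Typical use would be systemctl --failed --plain --no-legend | failed_units, which prints nothing when all units are fine.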

Actions #7

Updated by okurz 9 months ago

  • Related to action #131309: [alert] NFS mount can fail due to hostname resolution error size:M added
Actions #8

Updated by okurz 9 months ago

  • Related to action #127097: [alert] Failed systemd services alert added
Actions #9

Updated by okurz 9 months ago

  • Related to action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added
Actions #10

Updated by okurz 9 months ago

  • Related to action #75238: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2 added
Actions #11

Updated by okurz 9 months ago

  • Related to action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others added
Actions #12

Updated by okurz 9 months ago

  • Related to action #158023: salt-states-openqa pipeline invalid arguments to state.highstate on monitor.qe.nue2.suse.org added
Actions #13

Updated by okurz 9 months ago

  • Status changed from Feedback to Resolved

After waiting long enough, the problem seems to have resolved itself. Maybe our NFS recovery script does that? I triggered another reboot.

After reboot

I checked /etc/fstab

openqa.suse.de:/var/lib/openqa/share        /var/lib/openqa/share   nfs noauto,nofail,retry=30,ro,x-systemd.automount,x-systemd.device-timeout=10m,x-systemd.mount-timeout=30m  0 0
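
The option list in that entry is easier to read one option per line. A minimal sketch for that, assuming the usual whitespace-separated fstab layout with the options in the fourth field; fstab_options is a hypothetical helper name:

```shell
# Hypothetical helper: print the mount options (4th field) of the fstab entry
# for a given mount point, one option per line.
fstab_options() {
    mount_point="$1"
    fstab="${2:-/etc/fstab}"
    # Ignore comment lines, match on the mount point in the 2nd field.
    awk -v mp="$mount_point" '$1 !~ /^#/ && $2 == mp { print $4 }' "$fstab" \
        | tr ',' '\n'
}
```

Calling fstab_options /var/lib/openqa/share on grenache-1 should list retry=30 and the x-systemd.* timeouts separately.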

Apparently the retry handles that, so the mount eventually succeeds:

-- Boot 383e893d60a84ac482f53562da9e7f5e --
Mar 26 13:06:30 grenache-1 systemd[1]: Mounting /var/lib/openqa/share...
Mar 26 13:06:31 grenache-1 mount[2711]: mount.nfs: Network is unreachable for openqa.suse.de:/var/lib/openqa/share on /var/lib/openqa/share
Mar 26 13:06:31 grenache-1 systemd[1]: var-lib-openqa-share.mount: Mount process exited, code=exited, status=32/n/a
Mar 26 13:06:31 grenache-1 systemd[1]: var-lib-openqa-share.mount: Failed with result 'exit-code'.
Mar 26 13:06:31 grenache-1 systemd[1]: Failed to mount /var/lib/openqa/share.
Mar 26 13:13:39 grenache-1 systemd[1]: Mounting /var/lib/openqa/share...
Mar 26 13:13:39 grenache-1 mount[6751]: Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /usr/lib/systemd/system/rpc-statd.ser>
Mar 26 13:13:40 grenache-1 systemd[1]: Mounted /var/lib/openqa/share.
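
To put a number on how long the mount stayed broken, the gap between the first "Failed to mount" and the following "Mounted" line can be computed from such a journal excerpt. This is a sketch, not existing tooling: mount_recovery_seconds is a made-up name, it assumes the short journal format above with HH:MM:SS in the third field, and it does not handle excerpts that cross midnight.

```shell
# Hypothetical helper: read journal lines on stdin and print the seconds
# between the first failed mount and the next successful mount.
mount_recovery_seconds() {
    awk '
        # Convert an HH:MM:SS timestamp to seconds since midnight.
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /Failed to mount/ && !have_fail { fail = secs($3); have_fail = 1 }
        / Mounted / && have_fail && !done { print secs($3) - fail; done = 1 }
    '
}
```

It can be fed from journalctl -b -u var-lib-openqa-share.mount or from a saved excerpt.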

I guess we are good again. sudo salt 'grenache-1*' state.apply is also good. Alert silences removed.
