action #158041
closed
grenache needs upgrade to 15.5
Start date:
2024-03-26
Due date:
2024-04-09
% Done:
0%
Estimated time:
Tags:
Description
Motivation
grenache-1 was offline for many months, so it was not online when we upgraded our infrastructure to Leap 15.5. grenache is therefore still on 15.4 and should be upgraded as well.
Acceptance criteria
- AC1: grenache-1 runs a stable Leap 15.5
- AC2: osd-deployment, salt states deployment, and alerts are all good for grenache-1
Suggestions
- Conduct the distribution upgrade according to https://progress.opensuse.org/projects/openqav3/wiki/#Distribution-upgrades
- Apply the necessary package locks accordingly
- Remove obsolete package locks
- Ensure system is fully upgraded
- Try multiple reboots
- Ensure that there are no related alerts
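The last few suggestions could be checked after each reboot with something like the following sketch. The helper names `release_matches` and `post_upgrade_check` are hypothetical and not part of any openQA tooling; the default target version 15.5 is taken from this ticket, and `zypper ll` and `systemctl --failed` only list state for manual review.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the os-release file reports the expected VERSION_ID.
# $1: expected version, $2: os-release path (assumed default: /etc/os-release)
release_matches() {
    # Source in a subshell so the os-release variables do not leak into the caller.
    ( . "${2:-/etc/os-release}" && [ "$VERSION_ID" = "$1" ] )
}

# Hypothetical wrapper for a manual check after each reboot.
# $1: expected version (assumed default: 15.5)
post_upgrade_check() {
    release_matches "${1:-15.5}" || { echo "not on Leap ${1:-15.5}" >&2; return 1; }
    zypper ll            # list remaining package locks for manual review
    systemctl --failed   # list failed units, e.g. var-lib-openqa-share.mount
}
```

Running `post_upgrade_check` on grenache-1 after every trial reboot would cover the version, the locks, and the failed units in one go; it does not replace looking at the alerts themselves.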
Rollback actions
- Remove silence "alertname=Failed systemd services alert (except openqa.suse.de)" from https://monitor.qa.suse.de/alerting/silences
- Remove silence "alertname=grenache-1: host up alert" from https://monitor.qa.suse.de/alerting/silences
Updated by okurz 9 months ago
- Copied from action #158020: salt-states-openqa pipeline times out added
Updated by okurz 9 months ago
new_version=15.5
. /etc/os-release
zypper --releasever=$new_version --gpg-auto-import-keys ref
zypper --releasever=$new_version dup --auto-agree-with-licenses --replacefiles --download-in-advance
zypper rr python3-salt salt salt-bash-completion qemu-ovmf-x86_64 salt-minion
zypper rl -t patch openSUSE-SLE-15.4-2023-3863 openSUSE-SLE-15.4-2023-2571 openSUSE-SLE-15.4-2023-3145 openSUSE-SLE-15.4-2022-3811 openSUSE-SLE-15.4-2023-2234
zypper dup
reboot
Updated by okurz 9 months ago
- Related to action #131309: [alert] NFS mount can fail due to hostname resolution error size:M added
Updated by okurz 9 months ago
- Related to action #127097: [alert] Failed systemd services alert added
Updated by okurz 9 months ago
- Related to action #64941: after every reboot openqaworker7 is missing var-lib-openqa-share.mount , check dependencies of service with openqaworker1 added
Updated by okurz 9 months ago
- Related to action #75238: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2 added
Updated by okurz 9 months ago
- Related to action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others added
Updated by okurz 9 months ago
- Related to action #158023: salt-states-openqa pipeline invalid arguments to state.highstate on monitor.qe.nue2.suse.org added
Updated by okurz 9 months ago
- Status changed from Feedback to Resolved
After sufficient waiting time the problem seems to have resolved itself. Maybe our NFS recovery script does that? I triggered another reboot.
After the reboot I checked /etc/fstab:
openqa.suse.de:/var/lib/openqa/share /var/lib/openqa/share nfs noauto,nofail,retry=30,ro,x-systemd.automount,x-systemd.device-timeout=10m,x-systemd.mount-timeout=30m 0 0
Apparently the retry handles that, so the mount eventually ends up fine:
-- Boot 383e893d60a84ac482f53562da9e7f5e --
Mar 26 13:06:30 grenache-1 systemd[1]: Mounting /var/lib/openqa/share...
Mar 26 13:06:31 grenache-1 mount[2711]: mount.nfs: Network is unreachable for openqa.suse.de:/var/lib/openqa/share on /var/lib/openqa/share
Mar 26 13:06:31 grenache-1 systemd[1]: var-lib-openqa-share.mount: Mount process exited, code=exited, status=32/n/a
Mar 26 13:06:31 grenache-1 systemd[1]: var-lib-openqa-share.mount: Failed with result 'exit-code'.
Mar 26 13:06:31 grenache-1 systemd[1]: Failed to mount /var/lib/openqa/share.
Mar 26 13:13:39 grenache-1 systemd[1]: Mounting /var/lib/openqa/share...
Mar 26 13:13:39 grenache-1 mount[6751]: Created symlink /run/systemd/system/remote-fs.target.wants/rpc-statd.service → /usr/lib/systemd/system/rpc-statd.ser>
Mar 26 13:13:40 grenache-1 systemd[1]: Mounted /var/lib/openqa/share.
I guess we are good again. sudo salt 'grenache-1*' state.apply is also good. Silences removed.
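For reference, the options of the /etc/fstab entry quoted above can be listed one per line with a small sketch like the one below. The function name `fstab_options` is hypothetical, and it only reads the file it is given. Whether the eventual success comes from retry=30, the automount, or a separate recovery script is not established by the log.

```shell
#!/bin/sh
# Hypothetical helper: print the mount options for a mount point, one per line.
# $1: mount point, $2: fstab path (assumed default: /etc/fstab)
fstab_options() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp {   # skip comment lines, match the mount point field
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) print opts[i]
        }' "${2:-/etc/fstab}"
}
```

For example, `fstab_options /var/lib/openqa/share | grep -E 'retry|timeout'` would show only the retry and timeout settings relevant to the delayed mount.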