action #75238

closed

coordination #69478: [epic] Upgrade o3+osd workers+webui to openSUSE Leap 15.2

Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2

Added by okurz over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

  • Need to upgrade workers before EOL of Leap 15.1 and have a consistent environment

Acceptance criteria

  • AC1: all osd worker machines run a cleanly upgraded openSUSE Leap 15.2 (no failed systemd services, no leftover .rpmnew files, etc.)
  • AC2: openqa-monitor runs openSUSE Leap 15.2
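
A minimal sketch of how the AC1 checks could be scripted on a single worker. The helper names `find_rpm_leftovers` and `count_failed_units` are hypothetical and not part of the ticket; the sketch assumes that leftover rpm config files carry the suffixes rpm uses (*.rpmnew, *.rpmsave, *.rpmorig) and that failed units are listed with `systemctl --failed --no-legend --plain`.

```shell
# Hypothetical helper: list leftover rpm config files below a directory
# (default: /etc). rpm writes these as *.rpmnew, *.rpmsave or *.rpmorig.
find_rpm_leftovers() {
    _dir="${1:-/etc}"
    find "$_dir" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null
}

# Hypothetical helper: count failed units. Expects the output of
# `systemctl --failed --no-legend --plain` on stdin (one unit per line).
count_failed_units() {
    grep -c .
}

# On a worker (not executed by this sketch):
#   systemctl --failed --no-legend --plain | count_failed_units
#   find_rpm_leftovers /etc
```

A worker would meet AC1 in this sense when the count is 0 and `find_rpm_leftovers` prints nothing.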

Suggestions

Further details

  • Don't worry, everything can be repaired :) If by any chance a worker gets misconfigured, there are btrfs snapshots to recover from and IPMI Serial-over-LAN for access. A reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker), and there are other machines that can run jobs while one host is down for a little longer. And okurz can hold your hand :)

  • for reference, the upgrade to openSUSE Leap 15.1 was described in #55607


Related issues 6 (0 open, 6 closed)

Related to openQA Project - action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) (Resolved, mkittler, 2021-01-18)
Related to openQA Infrastructure - action #81046: openqaworker-arm-2.suse.de unreachable (Resolved, livdywan, 2020-12-15)
Related to openQA Infrastructure - action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) (Resolved, okurz, 2020-06-14)
Related to openQA Infrastructure - action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others (Resolved, mkittler, 2020-10-21)
Related to openQA Infrastructure - action #158041: grenache needs upgrade to 15.5 (Resolved, okurz, 2024-03-26, 2024-04-09)
Copied to openQA Infrastructure - action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M (Resolved, livdywan)

Actions #1

Updated by okurz over 3 years ago

  • Subject changed from Upgrade osd workers to openSUSE Leap 15.2 to Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2
Actions #2

Updated by okurz over 3 years ago

  • Status changed from New to Workable
Actions #3

Updated by okurz over 3 years ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

let's wait for the corresponding o3 ticket first

Actions #4

Updated by okurz over 3 years ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

o3 is good, this can be followed

Actions #5

Updated by okurz over 3 years ago

  • Description updated (diff)
Actions #6

Updated by livdywan over 3 years ago

  • Assignee set to livdywan
Actions #7

Updated by livdywan over 3 years ago

  • Subject changed from Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2 to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2
  • Description updated (diff)

For reference:

  • I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
  • Also monitor.qa.suse.de

That leaves only openqa.suse.de which is covered by #75244
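
A rough sketch of how the salt output from the command above could be filtered down to the workers that still need the upgrade. `minions_needing_upgrade` is a hypothetical helper, and the sketch assumes salt's default text output (a `minion:` line followed by the indented command output).

```shell
# Hypothetical helper: print minion IDs whose VERSION differs from the
# target release (default 15.2). Reads salt output from stdin, assuming
# the default layout: "minion:" on one line, indented output below it.
minions_needing_upgrade() {
    _target="${1:-15.2}"
    awk -v target="$_target" '
        # An unindented line ending in ":" names the minion.
        /^[^ \t].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        /VERSION=/ {
            version = $0
            sub(/^.*VERSION="?/, "", version)  # drop the key and opening quote
            sub(/".*$/, "", version)           # drop the closing quote
            sub(/[ \t]+$/, "", version)        # drop trailing whitespace
            if (version != target) print minion
        }
    '
}

# On the salt master (not executed by this sketch):
#   sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' \
#       | minions_needing_upgrade 15.2
```

Minions that do not answer produce no VERSION line, so they are silently skipped by this filter.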

Actions #8

Updated by livdywan over 3 years ago

  • Status changed from Workable to In Progress
  • DONE openqaworker2
    • installed screen
  • DONE openqaworker5
    • Got stuck installing dpdk-kmp-default, good after second zypper run
    • var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
    • Ran sudo systemctl restart var-lib-openqa-share.mount
  • DONE openqa-monitor
    • zypper succeeded on the second attempt (refreshes are racy I guess)
  • DONE openqaworker6
  • DONE openqaworker8
    • zypper upgrade went fine.
    • var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
    • Ran sudo systemctl restart var-lib-openqa-share.mount

DONE implies I checked that workers show up on https://openqa.suse.de/admin/workers and picked up jobs
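
Since var-lib-openqa-share.mount failed after reboot on several workers, the manual fix could be wrapped roughly like this. `restart_if_failed` and the `SYSTEMCTL` variable are hypothetical names introduced for this sketch; only `systemctl is-failed` and `systemctl restart` are real subcommands.

```shell
# SYSTEMCTL is overridable (an assumption of this sketch) so the logic can
# be tried without touching a real system, e.g. SYSTEMCTL="sudo systemctl".
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: restart a unit only if systemd reports it as failed.
restart_if_failed() {
    _unit="$1"
    # `is-failed` exits with status 0 when the unit is in the failed state.
    if $SYSTEMCTL is-failed --quiet "$_unit"; then
        $SYSTEMCTL restart "$_unit"
    fi
}

# On a worker after reboot (not executed by this sketch):
#   SYSTEMCTL="sudo systemctl" restart_if_failed var-lib-openqa-share.mount
```

This only automates the workaround; it does not address why the mount fails on boot in the first place.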

Actions #9

Updated by livdywan over 3 years ago

  • DONE openqaworker9
  • DONE openqaworker10
    • Had to run sudo systemctl restart var-lib-openqa-share.mount
    • installed htop
    • not online, no jobs picked up yet
    • systemctl restart openqa-worker@{1..10} to remedy #78390
  • DONE openqaworker13
    • Had to run sudo systemctl restart var-lib-openqa-share.mount here as well
  • DONE QA-Power8-5-kvm.qa.suse.de
    • connection refused after reboot, stuck in petitboot
    • kexec -l /var/petitboot/mnt/dev/sda2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sda2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e made it as far as dracut
    • Tried again, this time with the right device ID (but not me, so no log of the command)
  • DONE QA-Power8-4-kvm.qa.suse.de
    • connection refused after reboot, stuck in petitboot, kexec load failed
    • kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e resulted in a successful boot
    • Installed htop
    • kdump.service loaded failed failed Load kdump kernel and initrd after reboot
    • #56588#note-9 talks about disabling kdump - still I had to re-disable it via sudo systemctl disable --now kdump && sudo systemctl reset-failed
    • Online, not picking up jobs yet
  • DONE grenache-1
    • not online, no jobs picked up yet
    • see #78390
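
For the two Power8 hosts that got stuck in petitboot, the kexec line differs only in the boot device and the root UUID. A small sketch that assembles the command from those two values might help avoid loading the wrong device; `kexec_load_cmd` is a hypothetical helper, and the kernel arguments are simply the ones used above.

```shell
# Hypothetical helper: print the kexec load command for a boot partition
# mounted by petitboot under /var/petitboot/mnt/dev/<device>.
# $1: device name as seen by petitboot (e.g. sdb2)
# $2: UUID of the root filesystem (verify it against the actual device
#     before running the printed command)
kexec_load_cmd() {
    _boot="/var/petitboot/mnt/dev/$1/boot"
    printf 'kexec -l %s/vmlinux --initrd=%s/initrd --append="root=UUID=%s console=tty0 console=ttyS1,115200 nospec"\n' \
        "$_boot" "$_boot" "$2"
}

# In the petitboot shell (not executed by this sketch), review the printed
# command, run it, and then boot into the loaded kernel with `kexec -e`.
```

Printing the command instead of running it directly leaves a chance to check the device and UUID first, which is where the first attempt on QA-Power8-5-kvm went wrong.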
Actions #10

Updated by livdywan over 3 years ago

  • Related to action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) added
Actions #11

Updated by livdywan over 3 years ago

  • Status changed from In Progress to Feedback
Actions #12

Updated by livdywan over 3 years ago

  • Status changed from Feedback to In Progress

cdywan wrote:

For reference:

  • I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
  • Also monitor.qa.suse.de

That leaves only openqa.suse.de which is covered by #75244

@Xiaojing_liu made me aware that I missed malbec.arch.suse.de, openqaworker-arm-1.suse.de and openqaworker-arm-2.suse.de, probably due to machines being down 🙄
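
To catch machines that are down while the salt check runs, the salt output could be compared against an independent host list. This is only a sketch: `unreported_hosts` and the inventory file (one hostname per line) are assumptions, not something that exists in the ticket or in salt.

```shell
# Hypothetical helper: print every host from an inventory file that did not
# report a VERSION line in the salt output read from stdin.
# $1: inventory file with one expected minion ID per line (assumed to exist)
unreported_hosts() {
    _expected_file="$1"
    # Collect the minions that actually returned a VERSION= line.
    _reported=$(awk '
        /^[^ \t].*:$/ { minion = substr($0, 1, length($0) - 1); next }
        /VERSION=/ { print minion }
    ')
    while IFS= read -r _host; do
        [ -n "$_host" ] || continue
        # -F: fixed string, -x: whole line, so dots in names are literal.
        printf '%s\n' "$_reported" | grep -Fxq "$_host" || printf '%s\n' "$_host"
    done < "$_expected_file"
}

# On the salt master (not executed by this sketch):
#   sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' \
#       | unreported_hosts expected-workers.txt
```

Hosts printed by this helper either did not answer or were not targeted at all, so they need a manual look.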

Actions #13

Updated by livdywan over 3 years ago

  • Status changed from In Progress to Feedback
  • WIP malbec.arch.suse.de

    • Stuck in petitboot after reboot
    • PXE autoconfiguration failed
    • netboot fails with load_kernel: /tmp/pb-2eSo7I is not a 64bit PowerPC executable
    • None of the entries mentioned in #80656#note-9 are visible.
    • Booted via a new entry with /boot/vmlinux and /boot/initrd on sdb1 with nomodeset console=hvc console=tty.

    [FAILED] Failed to mount /var/lib/openqa/share.
    [FAILED] Failed to start Load kdump kernel and initrd.
    systemctl disable --now kdump && sudo systemctl reset-failed

    • Mounting /var/lib/openqa/share looks to have succeeded after all.
    • Worker is registered
  • WIP openqaworker-arm-2.suse.de

    • was ready to reboot
    • got unresponsive and was rebooted (by someone else?)
  • WIP openqaworker-arm-1.suse.de ready to reboot

Actions #14

Updated by livdywan over 3 years ago

  • Related to action #81046: openqaworker-arm-2.suse.de unreachable added
Actions #15

Updated by livdywan over 3 years ago

  • DONE malbec.arch.suse.de after all
  • DONE openqaworker-arm-2.suse.de
  • DONE openqaworker-arm-1.suse.de
Actions #16

Updated by livdywan over 3 years ago

Still open:

  • openqaworker-arm-3.suse.de, see #75016
  • powerqaworker-qam-1, see #68053
Actions #17

Updated by livdywan over 3 years ago

  • Related to action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) added
Actions #18

Updated by livdywan over 3 years ago

  • Related to action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others added
Actions #19

Updated by livdywan over 3 years ago

  • Status changed from Feedback to Blocked
Actions #20

Updated by livdywan over 3 years ago

  • DONE openqaworker-arm-3.suse.de
    • Rebooted while there were no jobs running
Actions #21

Updated by livdywan over 3 years ago

  • Status changed from Blocked to In Progress
  • DONE powerqaworker-qam-1.qa.suse.de
Actions #22

Updated by livdywan over 3 years ago

  • Status changed from In Progress to Feedback
Actions #23

Updated by okurz over 3 years ago

  • Status changed from Feedback to Resolved

ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?
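
With many machines, a per-release count is quicker to read than the raw salt output. A minimal sketch follows; `version_summary` is a hypothetical helper that assumes salt's default text output. Note that `grep VERSION` also matches the VERSION_ID line, which the helper ignores.

```shell
# Hypothetical helper: summarize salt output (stdin) as "<release> <count>"
# lines, counting only VERSION= entries and skipping VERSION_ID=.
version_summary() {
    awk '
        /VERSION=/ {
            v = $0
            sub(/^.*VERSION="?/, "", v)  # drop the key and opening quote
            sub(/".*$/, "", v)           # drop the closing quote
            sub(/[ \t]+$/, "", v)        # drop trailing whitespace
            count[v]++
        }
        END { for (v in count) print v, count[v] }
    ' | sort
}

# From a workstation (not executed by this sketch):
#   ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" | version_summary
```

A single output line for 15.2 would correspond to the result described above.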

Actions #24

Updated by livdywan over 3 years ago

okurz wrote:

ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?

Ack. I wouldn't consider staging part of osd, or of this ticket for that matter. Although I might just sort those out when nobody's looking, I practically remember the steps by heart now 😂

Actions #25

Updated by okurz over 2 years ago

  • Copied to action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
Actions #27

Updated by okurz 30 days ago
