action #75238

coordination #69478: [epic] Upgrade o3+osd workers+webui to openSUSE Leap 15.2

Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2

Added by okurz about 1 year ago. Updated 12 months ago.

Status: Resolved
Priority: High
Assignee:
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:

Description

Motivation

  • Need to upgrade workers before EOL of Leap 15.1 and have a consistent environment

Acceptance criteria

  • AC1: all osd worker machines run a cleanly upgraded openSUSE Leap 15.2 (no failed systemd services, no leftover .rpmnew files, etc.)
  • AC2: openqa-monitor runs openSUSE Leap 15.2
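
A minimal sketch of how the "clean upgrade" part of AC1 could be checked on one host. The function name leftover_rpm_files and the default directory /etc are assumptions for illustration, not something this ticket prescribes. The sketch relies on rpm naming leftover config files with the suffixes .rpmnew, .rpmsave and .rpmorig.

```shell
#!/bin/sh
# Hypothetical helper: list leftover rpm config files below a directory.
# Defaults to /etc, where package config files usually live (assumption).
leftover_rpm_files() {
    dir="${1:-/etc}"
    # Errors such as unreadable subdirectories are silenced; output is sorted
    # so that repeated runs are easy to compare.
    find "$dir" -type f \
        \( -name '*.rpmnew' -o -name '*.rpmsave' -o -name '*.rpmorig' \) \
        2>/dev/null | sort
}

# Possible usage on a worker (not executed here):
#   leftover_rpm_files /etc
#   systemctl --failed --no-legend
```

Empty output from both commands would suggest the host meets the two conditions named in AC1. The "etc." in AC1 is not covered by this sketch.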

Suggestions

Further details

  • Don't worry, everything can be repaired :) If by any chance the worker gets misconfigured, there are btrfs snapshots to recover from, there is IPMI Serial-over-LAN, a reinstall is possible and not hard, there is no important data on the host (it's only an openQA worker), and there are also other machines that can run jobs while one host might be down for a little bit longer. And okurz can hold your hand :)

  • for reference, the upgrade to openSUSE Leap 15.1 was described in #55607


Related issues

Related to openQA Project - action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) (Resolved, 2021-01-18)

Related to openQA Infrastructure - action #81046: openqaworker-arm-2.suse.de unreachable (Resolved, 2020-12-15)

Related to openQA Infrastructure - action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) (Resolved, 2020-06-14)

Related to openQA Infrastructure - action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others (Resolved, 2020-10-21)

Copied to openQA Infrastructure - action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M (Workable)

History

#1 Updated by okurz about 1 year ago

  • Subject changed from Upgrade osd workers to openSUSE Leap 15.2 to Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2

#2 Updated by okurz about 1 year ago

  • Status changed from New to Workable

#3 Updated by okurz about 1 year ago

  • Status changed from Workable to Blocked
  • Assignee set to okurz

let's wait for the corresponding o3 ticket first

#4 Updated by okurz about 1 year ago

  • Status changed from Blocked to Workable
  • Assignee deleted (okurz)

o3 is good, this can be followed

#5 Updated by okurz about 1 year ago

  • Description updated (diff)

#6 Updated by cdywan 12 months ago

  • Assignee set to cdywan

#7 Updated by cdywan 12 months ago

  • Subject changed from Upgrade osd workers and other machines, e.g. monitoring, to openSUSE Leap 15.2 to Upgrade osd workers and openqa-monitor to openSUSE Leap 15.2
  • Description updated (diff)

For reference:

  • I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
  • Also monitor.qa.suse.de

That leaves only openqa.suse.de which is covered by #75244
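
The salt check above prints one block per minion, so it can be filtered down to the hosts that still need work. This is a rough sketch, assuming salt's default text output (an unindented "minion-id:" line followed by indented result lines). The function name outdated_minions is made up for illustration. Minions that report no VERSION= line at all, for example because they are down, are listed as "unknown" rather than silently dropped.

```shell
#!/bin/sh
# Hypothetical helper: read salt cmd.run output on stdin and print every
# minion whose VERSION differs from the target release given as $1.
outdated_minions() {
    target="$1"
    awk -v target="$target" '
        # Print the previous minion if it never reported a VERSION= line.
        function report_missing() {
            if (minion != "" && !seen) print minion " unknown"
        }
        # Unindented line ending in ":" starts a new minion block.
        /^[^ ].*:$/ {
            report_missing()
            minion = substr($0, 1, length($0) - 1)
            seen = 0
            next
        }
        # Result line such as:     VERSION="15.2"
        /VERSION=/ {
            seen = 1
            version = $0
            sub(/.*VERSION="?/, "", version)
            sub(/".*/, "", version)
            if ((version "") != (target "")) print minion " " version
        }
        END { report_missing() }
    '
}

# Possible usage on the salt master (not executed here):
#   sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' \
#       | outdated_minions 15.2
```

Empty output would mean every responding worker already reports the target version.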

#8 Updated by cdywan 12 months ago

  • Status changed from Workable to In Progress
  • DONE openqaworker2
    • installed screen
  • DONE openqaworker5
    • Got stuck installing dpdk-kmp-default, good after second zypper run
    • var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
    • Ran sudo systemctl restart var-lib-openqa-share.mount
  • DONE openqa-monitor
    • zypper succeeded on the second attempt (refreshes are racy I guess)
  • DONE openqaworker6
  • DONE openqaworker8
    • zypper upgrade went fine.
    • var-lib-openqa-share.mount loaded failed failed /var/lib/openqa/share after reboot
    • Ran sudo systemctl restart var-lib-openqa-share.mount

DONE implies I checked that workers show up on https://openqa.suse.de/admin/workers and picked up jobs

#9 Updated by cdywan 12 months ago

  • DONE openqaworker9
  • DONE openqaworker10
    • Had to run sudo systemctl restart var-lib-openqa-share.mount
    • installed htop
    • not online, no jobs picked up yet
    • systemctl restart openqa-worker@{1..10} to remedy #78390
  • DONE openqaworker13
    • Had to run sudo systemctl restart var-lib-openqa-share.mount here as well
  • DONE QA-Power8-5-kvm.qa.suse.de
    • connection refused after reboot, stuck in petitboot
    • kexec -l /var/petitboot/mnt/dev/sda2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sda2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e made it as far as dracut
    • Tried again, this time with the right device ID (but not me, so no log of the command)
  • DONE QA-Power8-4-kvm.qa.suse.de
    • connection refused after reboot, stuck in petitboot, kexec load failed
    • kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd --append="root=UUID=eebe647f-e867-416e-a0fa-7a6732bfcf9d console=tty0 console=ttyS1,115200 nospec" && kexec -e resulted in a successful boot
    • Installed htop
    • kdump.service loaded failed failed Load kdump kernel and initrd after reboot
    • #56588#note-9 talks about disabling kdump - still I had to re-disable it via sudo systemctl disable --now kdump && sudo systemctl reset-failed
    • Online, not picking up jobs yet
  • DONE grenache-1
    • not online, no jobs picked up yet
    • see #78390

#10 Updated by cdywan 12 months ago

  • Related to action #78390: Worker is stuck in "broken" state due to unavailable cache service (was: and even continuously fails to (re)connect to some configured web UIs) added

#11 Updated by cdywan 12 months ago

  • Status changed from In Progress to Feedback

#12 Updated by cdywan 12 months ago

  • Status changed from Feedback to In Progress

cdywan wrote:

For reference:

  • I'm using sudo salt -C 'G@roles:worker' cmd.run 'grep VERSION= /etc/os-release' to check what workers need to be upgraded
  • Also monitor.qa.suse.de

That leaves only openqa.suse.de which is covered by #75244

@Xiaojing_liu made me aware that I missed malbec.arch.suse.de, openqaworker-arm-1.suse.de and openqaworker-arm-2.suse.de, probably due to machines being down 🙄

#13 Updated by cdywan 12 months ago

  • Status changed from In Progress to Feedback
  • WIP malbec.arch.suse.de

    • Stuck in petitboot after reboot
    • PXE autoconfiguration failed
    • netboot fails with load_kernel: /tmp/pb-2eSo7I is not a 64bit PowerPC executable
    • None of the entries mentioned in #80656#note-9 are visible.
    • Booted via a new entry with /boot/vmlinux and /boot/initrd on sdb1 with nomodeset console=hvc console=tty.

    [FAILED] Failed to mount /var/lib/openqa/share.
    [FAILED] Failed to start Load kdump kernel and initrd.
    systemctl disable --now kdump && sudo systemctl reset-failed

    • Mounting /var/lib/openqa/share looks to have succeeded after all.
    • Worker is registered
  • WIP openqaworker-arm-2.suse.de

    • was ready to reboot
    • got unresponsive and was rebooted (by someone else?)
  • WIP openqaworker-arm-1.suse.de ready to reboot

#14 Updated by cdywan 12 months ago

  • Related to action #81046: openqaworker-arm-2.suse.de unreachable added

#15 Updated by cdywan 12 months ago

  • DONE malbec.arch.suse.de after all
  • DONE openqaworker-arm-2.suse.de
  • DONE openqaworker-arm-1.suse.de

#16 Updated by cdywan 12 months ago

Still open:

  • openqaworker-arm-3.suse.de, see #75016
  • powerqaworker-qam-1, see #68053

#17 Updated by cdywan 12 months ago

  • Related to action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) added

#18 Updated by cdywan 12 months ago

  • Related to action #75016: [osd-admins][alert] Failed systemd services alert (workers): os-autoinst-openvswitch.service (and var-lib-openqa-share.mount) on openqaworker-arm-2 and others added

#19 Updated by cdywan 12 months ago

  • Status changed from Feedback to Blocked

#20 Updated by cdywan 12 months ago

  • DONE openqaworker-arm-3.suse.de
    • Rebooted while there were no jobs running

#21 Updated by cdywan 12 months ago

  • Status changed from Blocked to In Progress
  • DONE powerqaworker-qam-1.qa.suse.de

#22 Updated by cdywan 12 months ago

  • Status changed from In Progress to Feedback

#23 Updated by okurz 12 months ago

  • Status changed from Feedback to Resolved

ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?

#24 Updated by cdywan 12 months ago

okurz wrote:

ssh osd "sudo salt '*' cmd.run 'grep VERSION /etc/os-release'" returns 15.2 for all machines that are currently in salt :) staging machines are left as an exercise to the next users :D Do you agree to set this to Resolved?

Ack. I wouldn't consider staging as part of osd and this ticket for that matter. Although I might just sort those out when nobody's looking, I practically remember the steps by heart now 😂

#25 Updated by okurz 3 months ago

  • Copied to action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
