action #169576

closed

coordination #157969: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.6

Recover qa-power8-3 power machine size:S

Added by gpathak about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Organisational
Target version:
Start date: 2024-11-08
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

Acceptance criteria

  • AC1: The qa-power8-3 worker machine is up and running with Leap 15.5 or Leap 15.6, whichever version works

Suggestions

  • Read https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades
  • Reserve some time when the workers are only executing a few or no openQA test jobs
  • Keep IPMI interface ready and test that Serial-over-LAN works for potential recovery
  • Apply the workaround for #162296, i.e. zypper al -m "boo#1227616" *firewall*
  • Use the instructions from above
  • After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with transactional-update rollback
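
The lock workaround and the rollback fallback from the list above could be wrapped in a small dry-run helper, roughly as follows. This is a minimal sketch, not the team's actual procedure: `LOCK_CMD`, `ROLLBACK_CMD` and `run_plan` are hypothetical names, the two command lines are copied from the suggestions, and the upgrade step itself is left out because it is described in the wiki. By default nothing is executed.

```python
import shlex
import subprocess

# Command lines copied from the ticket text; all names here are illustrative.
LOCK_CMD = ["zypper", "al", "-m", "boo#1227616", "*firewall*"]
ROLLBACK_CMD = ["transactional-update", "rollback"]


def run_plan(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for argv in commands:
        # shlex.join quotes arguments so the printed line is safe to paste
        line = shlex.join(argv)
        printed.append(line)
        print(line)
        if not dry_run:
            # check=True stops the plan at the first failing command
            subprocess.run(argv, check=True)
    return printed


if __name__ == "__main__":
    # Dry run: only shows what would be executed before the upgrade.
    run_plan([LOCK_CMD])
```

Passing `*firewall*` as a single argument keeps a shell from expanding it against files in the current directory, so zypper receives the pattern itself.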

Files

pb-discover.log (11.4 KB), added by gpathak, 2024-10-25 09:21
clipboard-202411051428-dg60d.png (52.6 KB), added by gpathak, 2024-11-05 08:58

Related issues 1 (0 open, 1 closed)

Related to openQA Project (public) - action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S (Resolved, gpathak)

Actions #1

Updated by gpathak about 1 month ago

  • Copied from action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
Actions #2

Updated by nicksinger about 1 month ago

  • Status changed from New to Workable

We broke this out into another ticket to have at least one working machine back for o3. I recovered it on Friday by explicitly selecting Leap 15.5 again in petitboot. Since then it has gone through at least one reboot and came up again automatically. I still have to conduct a proper reboot check, but currently everything looks fine.

Actions #3

Updated by nicksinger about 1 month ago

  • Status changed from Workable to In Progress

Started ./reboot-stability-check qa-power8-3.openqanet.opensuse.org on ariel (I cannot ping the machine from my own machine because it is on a separate network).

Actions #4

Updated by nicksinger about 1 month ago

  • Status changed from In Progress to Resolved

The script managed 27 reboots and the machine came up automatically every time, so I would consider this stable.
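
A reboot stability check of this kind boils down to a loop: trigger a reboot, wait for the host to come back, count the successes. The sketch below is not the reboot-stability-check script used above. It is a minimal illustration in which the function name, parameters and default timeouts are all assumptions. The reboot trigger and the reachability probe are passed in as callables, so the loop does not depend on how the host is actually reached (ssh, IPMI, ping).

```python
import time


def reboot_stability_check(host, runs, reboot, is_up,
                           timeout=600.0, interval=5.0,
                           sleep=time.sleep, clock=time.monotonic):
    """Reboot `host` up to `runs` times and return the number of successes.

    Stops at the first reboot after which `is_up(host)` does not become
    true within `timeout` seconds.
    Simplification: a real check would also confirm that the host went
    down before waiting for it to come back.
    """
    successes = 0
    for _ in range(runs):
        reboot(host)
        deadline = clock() + timeout
        while not is_up(host):
            if clock() >= deadline:
                # Host did not return in time: report what succeeded so far.
                return successes
            sleep(interval)
        successes += 1
    return successes
```

With a probe that always reports the host as reachable, `reboot_stability_check(host, 27, reboot, is_up)` returns 27, which corresponds to the result reported above.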

Actions #5

Updated by okurz about 1 month ago

  • Status changed from Resolved to Feedback

Great news! But if I understood correctly, that machine is still on Leap 15.5. What do we need to keep in mind for a Leap 15.6 upgrade?

Actions #6

Updated by nicksinger about 1 month ago

  • Status changed from Feedback to Blocked

okurz wrote in #note-5:

Great news! But if I understood correctly, that machine is still on Leap 15.5. What do we need to keep in mind for a Leap 15.6 upgrade?

Oh, you are right. I was just too focused on getting a single machine back into a stable state. Currently I would wait until a second power machine proves stable: https://progress.opensuse.org/issues/157972#note-48

Actions #7

Updated by nicksinger about 1 month ago

  • Copied from deleted (action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S)
Actions #8

Updated by nicksinger about 1 month ago

  • Blocked by action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
Actions #9

Updated by nicksinger about 1 month ago

  • Blocked by deleted (action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S)
Actions #10

Updated by nicksinger about 1 month ago

  • Related to action #157972: Upgrade o3 workers to openSUSE Leap 15.6 size:S added
Actions #11

Updated by nicksinger about 1 month ago

  • Status changed from Blocked to Resolved

Closing as the task for upgrading is better described in https://progress.opensuse.org/issues/169939
