action #114565

closed

openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

recover qa-power8-4+qa-power8-5 size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date: 2022-12-19
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Tags:

Description

Observation

After the upgrade to Leap 15.4, qa-power8-4 did not seem to reboot properly. okurz could connect over SSH but was asked for a password, where normally SSH key authentication should be in place. mkittler had varying success with "power reset" and "power cycle". Over SoL (serial over LAN), mkittler saw petitboot.
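
The "power reset", "power cycle" and SoL steps mentioned in the observation could be expressed as ipmitool invocations along these lines. This is only a sketch and not the commands that were actually run for this ticket: the BMC hostname and user name are hypothetical placeholders, and the password is assumed to be supplied through the `IPMI_PASSWORD` environment variable, which ipmitool reads when `-E` is given.

```python
import shlex


def ipmi_command(bmc_host, user, action):
    """Build an ipmitool argv list for one of the actions named in the ticket."""
    actions = {
        "power status": ["chassis", "power", "status"],
        "power reset": ["chassis", "power", "reset"],
        "power cycle": ["chassis", "power", "cycle"],
        "sol": ["sol", "activate"],  # attach to the serial-over-LAN console
    }
    if action not in actions:
        raise ValueError(f"unsupported action: {action!r}")
    # -E: take the password from the IPMI_PASSWORD environment variable
    return ["ipmitool", "-I", "lanplus", "-H", bmc_host, "-U", user, "-E",
            *actions[action]]


if __name__ == "__main__":
    # Dry run: print the command lines instead of executing them.
    # Host and user below are hypothetical placeholders.
    for action in ("power status", "power cycle", "sol"):
        print(shlex.join(ipmi_command("qa-power8-4-bmc.example.org", "ADMIN", action)))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; passing the list to `subprocess.run` would be the obvious next step on a host that can reach the BMC.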

Acceptance criteria

  • AC1: Both qa-power8-4 and qa-power8-5 are used for production openQA jobs again
  • AC2: Stable over reboot
  • AC3: Alerts unpaused

Further information

Suggestions

Rollback steps

  • After https://bugzilla.opensuse.org/show_bug.cgi?id=1202138 is resolved, remove the kernel-default and util-linux zypper package locks on qa-power8-4, qa-power8-5 and power8.openqanet.opensuse.org
  • Upgrade the kernel and OS on qa-power8-4, qa-power8-5 and power8.openqanet.opensuse.org
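
The two rollback steps above can be sketched as a small dry-run helper that prints the commands per host. Several details are assumptions rather than something the ticket states: root SSH access to the hosts, the use of `zypper dup` for the upgrade step (a plain `zypper update` may be what is intended), and the helper name `rollback_commands` itself. `zypper removelock` is the standard subcommand for dropping package locks.

```python
import shlex

# Hosts and locked packages as listed in the rollback steps.
HOSTS = ["qa-power8-4", "qa-power8-5", "power8.openqanet.opensuse.org"]
LOCKED_PACKAGES = ["kernel-default", "util-linux"]


def rollback_commands(host):
    """Return the ssh command lines for the two rollback steps on one host."""
    remote_steps = [
        # Step 1: remove the package locks.
        ["zypper", "removelock", *LOCKED_PACKAGES],
        # Step 2: upgrade kernel and OS (assumption: distribution upgrade via dup).
        ["zypper", "--non-interactive", "dup"],
    ]
    # Assumption: login as root over SSH.
    return [["ssh", f"root@{host}", shlex.join(step)] for step in remote_steps]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for host in HOSTS:
        for cmd in rollback_commands(host):
            print(shlex.join(cmd))
```

Running `zypper locks` on each host beforehand would show whether the locks are still present.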

Files

power8-5.log.txt (2.07 MB), mkittler, 2022-07-22 14:09
ipmi-qa-power8-5-boot-loop.txt (1.08 MB), mkittler, 2022-10-24 14:07

Subtasks 1 (0 open, 1 closed)

action #122158: [alert] qa-power8-4-kvm host up alert - machine not up, nothing obvious on SoL but IPMI works size:M (Resolved, dheidler, 2022-12-19)

Related issues 10 (1 open, 9 closed)

Related to openQA Infrastructure - action #115208: failed-systemd-services: logrotate-openqa alerting on and off size:M (Resolved, livdywan)
Related to openQA Infrastructure - action #116437: Recover qa-power8-5 size:M (Resolved, mkittler)
Related to openQA Infrastructure - action #116473: Add OSD PowerPC workers to automatic recovery we already have for ARM workers (New, 2022-09-12)
Related to openQA Infrastructure - action #116743: [alert] QA-Power8-5-kvm: host up alert (Resolved, nicksinger, 2022-09-19, 2022-10-04)
Related to openQA Infrastructure - action #117229: [tools] openqa failing on worker QA-Power8-5-kvm (Resolved, mkittler, 2022-09-26, 2022-10-13)
Related to openQA Infrastructure - coordination #117268: [epic] Handle reduced PowerPC ressources (Resolved, okurz, 2022-07-21)
Related to openQA Infrastructure - action #118024: Ensure all PPC workers are upgraded after kernel regression resolved size:M (Resolved, mkittler, 2022-10-11)
Related to openQA Infrastructure - action #119290: [alert] Packet loss between worker hosts and other hosts alert (Rejected, okurz, 2022-10-24)
Related to openQA Infrastructure - action #116078: Recover o3 worker kerosene formerly known as power8, restore IPMI access size:M (Resolved, okurz, 2022-08-31)
Copied from openQA Infrastructure - action #114526: recover openqaworker14 (Resolved, mkittler)