action #132353

closed

coordination #121720: [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuring availability

coordination #131525: [epic] Up-to-date and usable LSG QE NUE1 machines

Bring enterprise-nx02.qam.suse.de up-to-date size:M

Added by livdywan over 1 year ago. Updated about 1 year ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2023-06-28
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Motivation

Acceptance criteria

  • AC1: The machine entry in netbox is up-to-date and has either a "Move to …" or "To be decommissioned" tag
  • AC2: We know if/how the machine is usable and how it is used

Suggestions

  • There's no entry in racktables (anymore). Let's not add a new one, right? (We would have only updated an existing entry.)
  • The Netbox page is https://netbox.suse.de/dcim/devices/6319. (It also contains comments at the bottom.)
  • Ask on #eng-testing who owns/uses the machine.
  • Ask @Antonios Pappas who is currently the primary contact person for that machine.
  • If there is really nobody that owns/uses the machine, state that explicitly in Netbox.

Related issues 2 (0 open, 2 closed)

Related to QA - action #107731: Salt all SUSE QA machines, at least passwords and ssh keys and automatic upgrading size:M (Resolved, okurz, 2022-03-01)

Copied from QA - action #132323: Bring arm4.qe.suse.de up-to-date (Resolved, okurz, 2023-06-28)

Actions #1

Updated by livdywan over 1 year ago

Actions #2

Updated by livdywan over 1 year ago

  • Related to action #107731: Salt all SUSE QA machines, at least passwords and ssh keys and automatic upgrading size:M added
Actions #3

Updated by livdywan over 1 year ago

  • Tags set to infra
Actions #4

Updated by okurz over 1 year ago

  • Subject changed from Bring enterprise-nx02.qam.suse.de up-to-date size:M to Bring enterprise-nx02.qam.suse.de up-to-date
Actions #5

Updated by okurz over 1 year ago

  • Target version changed from Ready to future
Actions #7

Updated by okurz about 1 year ago

  • Target version changed from future to Ready

The system has a degraded hardware RAID with an alarm tone. The onboard RAID controller shows multiple red LEDs, likely for that reason. Still, the system boots up and should be reachable over ssh. I set "ipmitool lan set ipsrc dhcp" and set the IPMI user/password to ADMIN/ADMIN; the system root password is the hacky one. I put in a replacement HDD and the hardware RAID controller is currently rebuilding the RAID. The status should be checked later over IPMI SoL. If nothing useful shows up, just reboot and check the output on SoL.
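For whoever picks up the SoL check, here is a minimal sketch of the ipmitool invocations involved. It only builds and prints the command lines, it does not run them. The BMC hostname is hypothetical (the ticket does not name it), and channel 1 in the local `lan set` call is an assumption: ipmitool's `lan set` normally expects a channel number, which the command quoted above omits. `-E` makes ipmitool read the password from the `IPMI_PASSWORD` environment variable.

```python
import shlex


def ipmitool_remote(host, user, *args):
    """Build an ipmitool command line for a remote BMC over the lanplus interface.

    -E tells ipmitool to take the password from the IPMI_PASSWORD
    environment variable instead of the command line.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E", *args]


# Hypothetical BMC hostname, not taken from the ticket.
BMC = "enterprise-nx02-bmc.example.org"

# Run locally on the machine itself; channel 1 is an assumption.
lan_dhcp_cmd = ["ipmitool", "lan", "set", "1", "ipsrc", "dhcp"]

# Attach to the serial console to watch the RAID rebuild or the boot output.
sol_cmd = ipmitool_remote(BMC, "ADMIN", "sol", "activate")

# Fallback if SoL shows nothing useful: reset the machine and watch SoL again.
reset_cmd = ipmitool_remote(BMC, "ADMIN", "chassis", "power", "reset")

for cmd in (lan_dhcp_cmd, sol_cmd, reset_cmd):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell once `IPMI_PASSWORD` is exported and the hostname is replaced with the real BMC address.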

Actions #8

Updated by livdywan about 1 year ago

  • Subject changed from Bring enterprise-nx02.qam.suse.de up-to-date to Bring enterprise-nx02.qam.suse.de up-to-date size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #9

Updated by okurz about 1 year ago

  • Assignee set to okurz
Actions #10

Updated by okurz about 1 year ago

  • Assignee deleted (okurz)

I connected a physical screen and keyboard, although IPMI SoL would likely have sufficed. The physical connection for screen and keyboard is just a bit more reliable. Drives 0+1, 2+3 and 6+7 are fine and now in sync. Drive 5 was still showing as missing. I rescanned all devices, which found drive 5 again, but as degraded. Then I ran "Initialize Drives" and re-initialized the drive. Now it is rebuilding again. This will again take some hours. To be checked later.

EDIT: The drive eventually ended up as "degraded" again. nicksinger and I then removed the complete hardware RAID and installed a plain Leap 15.5 on a RAID1 formed by sda+sdb. The system failed to boot up properly. I booted an interactive installer again, logged in over ssh, chrooted into the root partition and ran grub2-install /dev/md0p4, but that showed

grub2-install: warning: btrfs zstd compression is disabled, please change install device to disk.
grub2-install: error: ../grub-core/disk/diskfilter.c:1038:diskfilter writes are not supported.

I don't think that worked as intended. I guess we should try the installation again. But everything relevant is remotely accessible.
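One thing that could be tried on a retry, sketched here under assumptions: the warning above asks to "change install device to disk", so a common approach with a software RAID1 is to run grub2-install once per whole member disk instead of against a partition of the md device. The sketch only generates the command lines. It assumes a legacy BIOS boot setup and the member disks sda and sdb named above; whether this actually fixes the boot problem on this machine is not confirmed.

```python
def grub_install_commands(member_disks):
    """Return one grub2-install invocation per whole-disk RAID member.

    md devices are rejected because installing to /dev/md0p4 is what
    produced the "diskfilter writes are not supported" error.
    """
    commands = []
    for disk in member_disks:
        if disk.startswith("/dev/md"):
            raise ValueError(f"{disk} is an md device, pass the member disks instead")
        commands.append(["grub2-install", disk])
    return commands


# Member disks of the RAID1 as named in the comment above.
for cmd in grub_install_commands(["/dev/sda", "/dev/sdb"]):
    print(" ".join(cmd))
```

Installing to both members means either disk can boot the system if the other one fails. The commands would have to be run from within the chroot described above.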

Actions #11

Updated by okurz about 1 year ago

  • Target version changed from Ready to future

I switched the machine off over IPMI and confirmed that the power restore policy is "always-off" so that no power is wasted. We need to focus on tasks more closely related to the migration now.
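For reference, the policy check can be scripted. `ipmitool chassis status` reports a "Power Restore Policy" line, and `ipmitool chassis policy always-off` sets it. The sketch below parses that line out of captured output. The sample text is illustrative and abbreviated, not actual output from this machine.

```python
def power_restore_policy(chassis_status_output):
    """Extract the power restore policy from `ipmitool chassis status` output.

    Returns the policy string (for example "always-off"), or None if the
    line is not present.
    """
    for line in chassis_status_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Power Restore Policy":
            return value.strip()
    return None


# Illustrative, abbreviated sample; not captured from enterprise-nx02.
sample = """System Power         : off
Power Overload       : false
Power Restore Policy : always-off
"""

policy = power_restore_policy(sample)
print(policy)
```

With "always-off" the machine stays powered down after a power loss in the rack until someone switches it on explicitly.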

Actions #12

Updated by okurz about 1 year ago

  • Status changed from Workable to Resolved
  • Assignee set to okurz
  • Target version changed from future to Ready