action #170077
open
Put more storage into qamaster "to make our lives easier in general"
Added by okurz 4 days ago.
Updated about 6 hours ago.
Category:
Feature requests
Due date:
2024-12-07 (Due in 14 days)
Description
Motivation
Based on a suggestion by Nick Singer. We can check the physical slots of the machine and see if we have spare devices that would help us. okurz thinks we have some.
Details in racktable hint at a chassis with 8x3.5" slots, of which the OS currently uses 3.
I put one 2TB+1TB into qamaster. I might have broken some RAID. If you see problems, feel free to trigger a reboot or power cycle. I won't have time for this anymore today myself.
qamaster has 12 (!) physical storage devices. In the OS we have a 600GB "sda" and a 4TB "sdb", but there does not seem to be a physical 4TB device, so I assume we have a hardware RAID0 or RAID5 or similar.
Physically attached display+keyboard. Booting without devices to understand which are connected to an internal storage controller or how this works. "Entering setup…" stays there for long, from 12:39Z until 12:41Z, so don't be surprised that it takes 3m to reach the BIOS.
BIOS "SATA Configuration" says it has 6 ports, port 0 through 5, with AHCI mode and hot plug enabled for all. Then there is a page "SCU Configuration" where "Storage Controller Unit" was disabled. It is now enabled. It lists port 0 through 7, all "not present". I also enabled "EMS Console Redirection", "Out-of-Band Mgmt Port COM2/SOL". Maybe we can see more over IPMI. On IPMI SOL I could see the BIOS screen, but maybe we already had that before.
I plugged in storage slot 0 and see a blue LED, but both SATA and SCU port 0 show "not present". I also plugged in port 11 and see a red LED, also "not present". Restarted the system and entered setup again. Still no slots show up. Exited setup, and even after 5m of waiting the screen outside setup is just black and the machine does not respond.
I entered setup again, disabled the SCU controller again and rebooted (13:14Z). At 13:20Z the machine was not up. Disabled "EMS Console Redirection" as well at 13:23Z. The machine also beeps, like in #114893.
The two new devices are currently not connected. The priority now is to bring back the machine as-is. For now I put the two trays with the new disks in the storage cabinet. No luck bringing the machine back up so far.
- Related to action #170026: [QA][tools][monitor] monitor.qa.suse.de is down added
- Priority changed from Normal to Urgent
- Status changed from New to In Progress
- Priority changed from Urgent to High
On boot I could press ctrl-h and reach the RAID controller firmware menu. From there I found "foreign config" for DG0 and DG1 but "unconfigured" for slot 10+11, which are also the ones showing a red light. At least booting to root should work with this. Exited, rebooted, entered again and verified that the config is valid up to this point. But after another reboot I still cannot boot from local disk. nicksinger has enabled network boot and PXE+EFI. The system ends up in the EFI shell, which needs to be left with the command "exit". The network boot with DHCP showed up many times. Eventually I booted a Tumbleweed system with ttyS1, which showed output on IPMI SoL.
The good thing is: I'm in a live Tumbleweed system and can confirm that both the root partition and the VM data partition are fully usable. Interestingly, we have two 2TB disks which were apparently configured for RAID0 but not used in the past years(?). The system does not boot up yet, but I am relieved so far.
Reconfigured boot settings with nicksinger. System came up again. VMs are running. So at least recovered up to that point. Things to do:
- iPXE should also display on local console, e.g. add "console=tty1 console=ttyS1" (or the other way around) to also display something on the local screen, not just remotely
- Create backup of backup and VMs, config, jenkins, etc.
- Migrate VMs to modern hypervisor solution, e.g. openplatform
- Physically label slot 10+11
- Bring slot 10+11 into use, ideally as software RAID0 or RAID1, not hardware RAID
- Check whether the setting "EMS Console Redirection" helps us, e.g. to mirror more output to physical monitor and SoL
- Document that KVMViewer can output VGA whereas IPMI SoL only serial (is that right?)
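Regarding the "console=tty1 console=ttyS1 (or the other way around)" item: a minimal Python sketch of why the order matters. As far as I know from the Linux serial console documentation, kernel messages go to every console listed (one video and one serial in our case), while the last "console=" entry is the one that backs /dev/console and therefore gets the interactive output. The helper names below are made up for illustration and are not part of iPXE or any existing tool.

```python
def build_console_args(consoles):
    """Join console names into kernel command line arguments, keeping their order."""
    return " ".join(f"console={name}" for name in consoles)


def interactive_console(cmdline):
    """Return the last console= value, which the kernel uses for /dev/console."""
    consoles = [
        arg.split("=", 1)[1]
        for arg in cmdline.split()
        if arg.startswith("console=")
    ]
    return consoles[-1] if consoles else None


# Local screen first, serial last: boot messages on both, login/installer on IPMI SoL.
cmdline = build_console_args(["tty1", "ttyS1"])
print(cmdline)
print(interactive_console(cmdline))
```

Swapping the two entries would keep boot messages on both outputs but move the interactive console to the physically attached screen, which is probably what we want when somebody stands in front of the machine.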
- Due date set to 2024-12-07
Setting due date based on mean cycle time of SUSE QE Tools