action #160514
closed
qamaster is down, i.e. also no monitoring from monitor.qe.nue2.suse.org
Description
Observation
Trying to access https://monitor.qa.suse.de/d/ showed that monitor.qe.nue2.suse.org is down and that qamaster is not accessible either. IPMI is still up though.
Updated by okurz 7 months ago
Added IPMI credentials with https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/813
ipmitool -Ilanplus -H qamaster-sp.qe.nue2.suse.org … power reset
followed by sol activate shows the initial firmware initialization with "B2" in the lower right corner and then a black screen, with no change for 5 minutes. I then tried a full power cycle:
ipmitool -Ilanplus -H qamaster-sp.qe.nue2.suse.org … power off && sleep 180 && ipmitool -Ilanplus -H qamaster-sp.qe.nue2.suse.org … power on
I then used "IPMIView" with the internal "Java iKVM Viewer". That showed me that grub tries and fails to load with
error: ../../grub-core/kern/dl.c:380:symbol 'grub_verify_string' not found
Using IPMIView I selected to boot from PXE and rebooted. Then I selected Tumbleweed ttyS1, trying rescue. I didn't see anything on the serial terminal but could log in with
ssh_nt root@qamaster.qe.nue2.suse.org
after ping responded.
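For repeating the power cycle later, a minimal sketch of a wrapper could look like the following. It is not what was run in this ticket: DRY_RUN, IPMI_AUTH and the function names are hypothetical, and IPMI_AUTH merely stands in for the credential options that are elided with "…" above. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only. BMC host and the 180 s pause are taken from the ticket;
# everything else (variable and function names) is an assumption.
BMC_HOST="${BMC_HOST:-qamaster-sp.qe.nue2.suse.org}"
IPMI_AUTH="${IPMI_AUTH:-}"   # placeholder for the elided credential options
DRY_RUN="${DRY_RUN:-1}"      # 1: only print commands, anything else: execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Call ipmitool against the service processor.
ipmi() {
    # IPMI_AUTH is left unquoted on purpose so it splits into options.
    run ipmitool -I lanplus -H "$BMC_HOST" $IPMI_AUTH "$@"
}

# Power off, wait (default 180 s as in the ticket), power on again.
power_cycle() {
    ipmi power off && run sleep "${1:-180}" && ipmi power on
}
```

Calling power_cycle with DRY_RUN=1 prints the three commands; setting DRY_RUN=0 and a suitable IPMI_AUTH would execute them.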
Wrong approach leading to read-only mount…
I looked up storage volumes and found /dev/sdc2, mounted and chroot'd:
mkdir -p /mnt/sdc2
mount /dev/sdc2 /mnt/sdc2
for i in proc sys dev dev/pts run ; do mount -o bind /$i /mnt/sdc2/$i; done
chroot /mnt/sdc2
mount -a
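The mount and chroot steps above could be wrapped roughly like this for the next time a rescue system is needed. This is a sketch under assumptions: the function name, BIND_DIRS and the DRY_RUN switch are hypothetical, and device and mount point have to be passed in. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prepares a chroot into an installed system from a rescue system.
DRY_RUN="${DRY_RUN:-1}"      # hypothetical switch: 1 only prints the commands
BIND_DIRS="proc sys dev dev/pts run"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mount the root partition, bind-mount the pseudo filesystems and
# mount the remaining fstab entries from inside the chroot.
prepare_rescue_chroot() {
    dev=$1
    target=$2
    run mkdir -p "$target" || return 1
    run mount "$dev" "$target" || return 1
    for d in $BIND_DIRS; do
        run mount -o bind "/$d" "$target/$d" || return 1
    done
    run chroot "$target" mount -a
}
```

For example, prepare_rescue_chroot /dev/sdc2 /mnt/sdc2 with DRY_RUN=0 would perform the steps, after which chroot /mnt/sdc2 gives an interactive shell.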
In there I first reproduced the problem without needing to reboot the physical machine:
qemu-system-x86_64 -snapshot -nographic /dev/sdc
This shows the original problem quickly. I then ran
grub2-install /dev/sdc
and the same qemu command verified that the problem was fixed. I triggered a reboot with
echo b >/proc/sysrq-trigger
After the reboot the system booted up fine again and the VMs are up again as well.
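The qemu check before and after grub2-install could be scripted along these lines. This is a rough sketch, not what was run here: the function names and the 60 s default are hypothetical, and it assumes that grub prints its error on the serial console as it did in this case and that timeout from coreutils is available.

```shell
#!/bin/sh
# Sketch only: look for the grub failure in a throwaway VM.

# Succeed if the text on stdin contains a grub "symbol '...' not found" error.
grub_symbol_error() {
    grep -q "symbol '[^']*' not found"
}

# Boot the disk in qemu for a limited time and report what was seen.
# -snapshot keeps qemu from writing to the real disk.
boot_check() {
    disk=$1
    seconds=${2:-60}
    if timeout "$seconds" qemu-system-x86_64 -snapshot -nographic "$disk" 2>&1 |
        grub_symbol_error; then
        echo "grub symbol error seen on $disk"
        return 1
    fi
    echo "no grub symbol error seen on $disk within ${seconds}s"
}
```

Running boot_check /dev/sdc before and after grub2-install /dev/sdc should then show the difference without rebooting the physical machine. A missing error message within the timeout is only a hint and not proof that the system boots completely.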
Updated by okurz 7 months ago
- Status changed from In Progress to Resolved
Also monitoring data is back, e.g. on https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&refresh=2h