action #81058: [tracker-ticket] Power machines can't find installed OS. Automatic reboots disabled for now - openQA Infrastructure (public) - openSUSE Project Management Tool

kexec -l /var/petitboot/mnt/dev/sdb2/boot/vmlinux-5.3.18-lp152.57-default --initrd=/var/petitboot/mnt/dev/sdb2/boot/initrd-5.3.18-lp152.57-default --command-line="root=UUID=e29496d5-0080-4a01-9bde-b786944f4ba4 nospec kvm.nested=1 kvm_intel.nested=1 kvm_amd.nested=1 kvm-arm.nested=1 crashkernel=210M" && kexec -e
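
The manual boot command above follows a fixed pattern: load the kernel and initrd from the mounted root partition, pass the kernel command line, then execute. Below is a minimal sketch that assembles a command of the same shape from its parts. `kexec_cmd` and its parameters are hypothetical names introduced for illustration, and the function only prints the command instead of running it.

```shell
# kexec_cmd MOUNT KVER CMDLINE
# Hypothetical helper: print a kexec invocation for a kernel found under
# MOUNT/boot, using only the kexec options seen in the command above.
kexec_cmd() {
  mnt=$1 kver=$2 cmdline=$3
  printf 'kexec -l %s/boot/vmlinux-%s --initrd=%s/boot/initrd-%s --command-line="%s" && kexec -e\n' \
    "$mnt" "$kver" "$mnt" "$kver" "$cmdline"
}

# Example call with the mount point and kernel version from this ticket
# (kernel parameters shortened for readability).
kexec_cmd /var/petitboot/mnt/dev/sdb2 5.3.18-lp152.57-default \
  "root=UUID=e29496d5-0080-4a01-9bde-b786944f4ba4 nospec crashkernel=210M"
```

Printing rather than executing makes it possible to review the paths before pasting the result into the petitboot shell.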

#9

Updated by livdywan almost 4 years ago

Description updated (diff)

#10

Updated by livdywan almost 4 years ago

Description updated (diff)

#11

Updated by livdywan almost 4 years ago

Related to action #88474: All workers on powerqaworker-qam-1 are offline added

#12

Updated by okurz almost 4 years ago

Related to action #68053: powerqaworker-qam-1 fails to come up on reboot (repeatedly) added

#13

Updated by okurz almost 4 years ago

Description updated (diff)

#14

Updated by okurz almost 4 years ago

Status changed from Feedback to In Progress
Assignee changed from nicksinger to okurz

After progress in #68053, I am running a reboot check on all PowerPC OSD machines.

On OSD:

for run in {01..10}; do
  for host in QA-Power8-4-kvm.qa QA-Power8-5-kvm.qa powerqaworker-qam-1 malbec.arch grenache-1.qa; do
    echo -n "run: $run, $host: ping .. " &&
      timeout -k 5 600 sh -c "until ping -c30 $host >/dev/null; do :; done" &&
      echo -n "ok, ssh .. " &&
      timeout -k 5 600 sh -c "until nc -z -w 1 $host 22; do :; done" &&
      echo -n "ok, salt .. " &&
      timeout -k 5 600 sh -c "until salt --timeout=300 --no-color $host\* test.ping >/dev/null; do :; done" &&
      echo -n "ok, uptime/reboot: " &&
      salt $host\* cmd.run "uptime && systemctl disable --now openqa-worker-cacheservice.service >/dev/null" &&
      salt $host\* system.reboot 1 ||
      break
  done || break
done
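
The repeated "retry until it works, but give up after ten minutes" step in the loop above can be factored into a small helper. The following is a minimal sketch, not the command that was actually run: `wait_for` is a hypothetical name, and it checks a deadline between attempts instead of wrapping the loop in `timeout`.

```shell
# wait_for SECONDS CMD [ARGS...]
# Hypothetical helper: re-run CMD until it succeeds or SECONDS have elapsed.
# Returns 0 on success, 1 if the deadline passes first.
wait_for() {
  limit=$1
  shift
  start=$(date +%s)
  until "$@" >/dev/null 2>&1; do
    now=$(date +%s)
    if [ $((now - start)) -ge "$limit" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example (assumes nc is installed and $host is set, as in the loop above):
# wait_for 600 nc -z -w 1 "$host" 22 && echo "ssh port reachable"
```

Unlike `timeout -k 5 600`, this sketch cannot interrupt a single attempt that hangs; the deadline is only checked after each attempt returns.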

#15

Updated by openqa_review almost 4 years ago

Due date set to 2021-04-16

Setting due date based on mean cycle time of SUSE QE Tools

#16

Updated by okurz almost 4 years ago

Status changed from In Progress to Resolved

The above experiment was successful. The machines came up just fine after each reboot.
I unpaused the alerts "Broken workers alert", "Failed systemd services alert (except openqa.suse.de)", "Failed systemd services alert (except openqa.suse.de)" and confirmed that all expected services on these machines are active and the machines are working on openQA jobs.

Unmasked rebootmgr.service again and created
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/472
to enable the rebootmgr service again everywhere.

I also enabled rebootmgr on storage.qa where auto-update was already active but not rebootmgr. Tested that reboot works fine as well.
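
For reference, unmasking and re-enabling the service by hand on a single machine could look like the sketch below. This is an assumption about the manual steps, not a copy of what the merge request does through salt. `rebootmgr_commands` is a hypothetical helper that only prints the commands unless `RUN=1` is set.

```shell
# Hypothetical helper: list (or, with RUN=1, execute) the systemctl calls
# that undo a mask and start the service again.
rebootmgr_commands() {
  for action in "unmask" "enable --now"; do
    if [ "${RUN:-0}" = 1 ]; then
      # $action is intentionally unquoted so "enable --now" splits into two words
      systemctl $action rebootmgr.service || return 1
    else
      echo "systemctl $action rebootmgr.service"
    fi
  done
}

rebootmgr_commands            # dry run: prints the two commands
# RUN=1 rebootmgr_commands    # run for real (needs root)
```

Afterwards, `systemctl is-active rebootmgr.service` should report `active` if the service started.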

#17

Updated by okurz 10 months ago

Related to action #139115: Ensure o3 openQA PowerPC machine qa-power8-3 is operational from PRG2 size:M added
