https://progress.opensuse.org/
https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?1582917784
2022-09-12T08:21:38Z
openSUSE Project Management Tool
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=552361
2022-09-12T08:21:38Z
livdywan
liv.dywan@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-4 priority-default closed child parent" href="/issues/114565">action #114565</a>: recover qa-power8-4+qa-power8-5 size:M</i> added</li></ul>
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=552439
2022-09-12T10:48:33Z
mkittler
marius.kittler@suse.com
<ul><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul>
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=552442
2022-09-12T10:50:43Z
mkittler
marius.kittler@suse.com
<ul><li><strong>Subject</strong> changed from <i>recover qa-power8-4+qa-power8-5 size:M</i> to <i>Recover qa-power8-5 size:M</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul><p>I'm recovering qa-power8-5. Note that qa-power8-4 is not broken and also not mentioned in the ticket description. Hence I'm removing it from the ticket title.</p>
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=552448
2022-09-12T11:09:57Z
mkittler
marius.kittler@suse.com
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Resolved</i></li></ul><p>This time it isn't a crash (like the last time I had to recover the worker, see <a class="issue tracker-4 status-3 priority-4 priority-default closed child parent" title="action: recover qa-power8-4+qa-power8-5 size:M (Resolved)" href="https://progress.opensuse.org/issues/114565#note-40">#114565#note-40</a>). The worker was just shutting down normally but then didn't came up again. Considering that there were no log messages in the journal after the shutdown it must have been stuck somewhere early in the boot. I don't know where exactly because there was nothing over SOL. A power reset helped and the machine came back without problems.</p>
<p>That's the journal shortly before the shutdown:</p>
<pre><code>Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopped Open vSwitch.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopping Open vSwitch Forwarding Unit...
Sep 11 03:31:36 QA-Power8-5-kvm ovs-ctl[81634]: Exiting ovs-vswitchd (100958)..done
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: ovs-vswitchd.service: Deactivated successfully.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopped Open vSwitch Forwarding Unit.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: ovs-delete-transient-ports.service: Deactivated successfully.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopped Open vSwitch Delete Transient Ports.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopping Open vSwitch Database Unit...
Sep 11 03:31:36 QA-Power8-5-kvm ovs-ctl[81658]: Exiting ovsdb-server (100900)..done
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: ovsdb-server.service: Deactivated successfully.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopped Open vSwitch Database Unit.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopped target Preparation for Network.
Sep 11 03:31:36 QA-Power8-5-kvm systemd[1]: Stopping firewalld - dynamic firewall daemon...
-- Boot 4b14fb20a8df443d815bb85c60796d74 --
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Reserving 210MB of memory at 128MB for crashkernel (System RAM: 262144MB)
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: Page sizes from device-tree:
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=16: shift=24, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=8
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=20: shift=20, sllp=0x0130, avpnm=0x00000000, tlbiel=0, penc=2
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=24: shift=24, sllp=0x0100, avpnm=0x00000001, tlbiel=0, penc=0
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: base_shift=34: shift=34, sllp=0x0120, avpnm=0x000007ff, tlbiel=0, penc=3
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Enabling pkeys with max key count 32
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Disabling hardware transactional memory (HTM)
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Activating Kernel Userspace Access Prevention
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Activating Kernel Userspace Execution Prevention
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Page orders: linear mapping = 24, virtual = 16, io = 16, vmemmap = 24
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Using 1TB segments
Sep 12 12:50:33 QA-Power8-5-kvm kernel: hash-mmu: Initializing hash mmu with SLB
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Linux version 5.14.21-150400.24.18-default (geeko@buildhost) (gcc (SUSE Linux) 7.5.0, GNU ld (GNU Binutils; SUSE Linux Enterprise 15) 2.37.20211103-150100.7.37) #1 SMP Thu Aug 4 14:17:48 UTC 2022 (e9f7bfc)
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Secure boot mode disabled
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Found initrd at 0xc000000004470000:0xc000000005ea059d
Sep 12 12:50:33 QA-Power8-5-kvm kernel: OPAL: Found non-mapped LPC bus on chip 0
Sep 12 12:50:33 QA-Power8-5-kvm kernel: Using PowerNV machine description
Sep 12 12:50:33 QA-Power8-5-kvm kernel: printk: bootconsole [udbg0] enabled
</code></pre>
<p>I've retriggered the OSD deployment pipeline which was failing due to that. It has just succeeded so I suppose we can consider this issue resolved. (Likely at some point we should have automatic recovery for all our workers, not just for the ARM machines.)</p>
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=552589
2022-09-12T16:03:50Z
mkittler
marius.kittler@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-1 priority-4 priority-default" href="/issues/116473">action #116473</a>: Add OSD PowerPC workers to automatic recovery we already have for ARM workers</i> added</li></ul>
openQA Infrastructure - action #116437: Recover qa-power8-5 size:M
https://progress.opensuse.org/issues/116437?journal_id=557602
2022-09-27T09:24:35Z
okurz
okurz@suse.com
<ul><li><strong>Related to</strong> <i><a class="issue tracker-4 status-3 priority-3 priority-lowest closed child" href="/issues/80482">action #80482</a>: qa-power8-5-kvm has been down for days, use more robust filesystem setup</i> added</li></ul>