action #114526

closed

openQA Project - coordination #111860: [epic] Upgrade all our infrastructure, e.g. o3+osd workers+webui and production workloads, to openSUSE Leap 15.4

recover openqaworker14

Added by okurz almost 2 years ago. Updated almost 2 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Description

Observation

After the upgrade to Leap 15.4, openqaworker14 apparently did not reboot properly.

Rollback steps

  • Add back to salt

Related issues 2 (0 open, 2 closed)

Copied from openQA Infrastructure - action #111866: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 (Resolved, okurz)
Copied to openQA Infrastructure - action #114565: recover qa-power8-4+qa-power8-5 size:M (Resolved, okurz, 2022-12-19)

Actions #1

Updated by okurz almost 2 years ago

  • Copied from action #111866: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.4 added
Actions #2

Updated by okurz almost 2 years ago

The main problem was that we could not see any output due to a serial port misconfiguration. Work on this started in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/654 but was not completed. It is now handled in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/715 by mkittler and me. We fixed it manually on openqaworker14 and found out that openqaworker14 suffered from the same problem as openqaworker8+9, caused by changes in the behaviour of lsblk.

Actions #3

Updated by mkittler almost 2 years ago

  • Status changed from In Progress to Feedback

The SR has been merged. The worker also already runs jobs and looks good.

Actions #4

Updated by okurz almost 2 years ago

  • Copied to action #114565: recover qa-power8-4+qa-power8-5 size:M added
Actions #5

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to Resolved

The worker is still running and https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/715 has been merged and applied.

Actions #6

Updated by okurz almost 2 years ago

  • Description updated (diff)
  • Status changed from Resolved to Feedback

Did you try multiple reboots? The machine seems to be down again. Please take a look, fix it and re-add it to salt.

Actions #7

Updated by mkittler almost 2 years ago

  • Status changed from Feedback to In Progress

No, since I thought we had identified the issue and one test would be enough. At least enabling the serial console worked.

Actions #8

Updated by mkittler almost 2 years ago

The fix https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/f2e9ab9cbfac53021a9aecc3b14b8f5bb35611eb is still deployed. I also think that we've now run into another issue, as it fails even earlier than before:

Jul 31 03:36:26 openqaworker14 systemd[1]: Starting Setup NVMe before mounting it...
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1580]: Current mount points (printed for debugging purposes):
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: devtmpfs on /dev type devtmpfs (rw,nosuid,size=4096k,nr_inodes=1048576,mode=755,inode64)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: tmpfs on /run type tmpfs (rw,nosuid,nodev,size=211027992k,nr_inodes=819200,mode=755,inode64)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,misc)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=267,subvol=/@/.snapshots/1/snapshot)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=228)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /.snapshots type btrfs (rw,relatime,ssd,space_cache,subvolid=266,subvol=/@/.snapshots)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /boot/grub2/i386-pc type btrfs (rw,relatime,ssd,space_cache,subvolid=265,subvol=/@/boot/grub2/i386-pc)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /boot/grub2/x86_64-efi type btrfs (rw,relatime,ssd,space_cache,subvolid=264,subvol=/@/boot/grub2/x86_64-efi)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /opt type btrfs (rw,relatime,ssd,space_cache,subvolid=262,subvol=/@/opt)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /home type btrfs (rw,relatime,ssd,space_cache,subvolid=263,subvol=/@/home)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /root type btrfs (rw,relatime,ssd,space_cache,subvolid=261,subvol=/@/root)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=260,subvol=/@/srv)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /tmp type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/tmp)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /usr/local type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/@/usr/local)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1585]: /dev/nvme0n1p2 on /var type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/@/var)
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1580]: Present block devices (printed for debugging purposes):
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: nvme0n1     259:0    0  3.5T  0 disk
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: ├─nvme0n1p1 259:1    0  512M  0 part
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: ├─nvme0n1p2 259:2    0    1T  0 part /var
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /usr/local
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /tmp
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /srv
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /root
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /home
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /opt
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /boot/grub2/x86_64-efi
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /boot/grub2/i386-pc
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /.snapshots
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: │                                    /
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1586]: └─nvme0n1p3 259:3    0  2.5T  0 part
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1580]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1593]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1593]:        level=raid0 devices=1 ctime=Fri Jul 22 14:23:37 2022
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1593]: mdadm: Array name /dev/md/openqa is in use already.
Jul 31 03:36:26 openqaworker14 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
Jul 31 03:36:26 openqaworker14 openqa-establish-nvme-setup[1580]: Unable to create RAID, mdadm returned with non-zero code
Jul 31 03:36:26 openqaworker14 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
Jul 31 03:36:26 openqaworker14 systemd[1]: Failed to start Setup NVMe before mounting it.

(Before, it failed at Status for RAID0 "/dev/md/openqa"; now it fails at the mdadm --create … command.)


Maybe there's a race condition with stopping the device. No Stopping current RAID "/dev/md/openqa" message is logged although [[ -e /dev/md/openqa ]] is true. I suppose it wasn't true when the script conducted the check but turned true before the RAID creation command was executed.
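
The suspected sequence can be illustrated with a small simulation. This is only a sketch of a check-then-act race, not the actual openqa-establish-nvme-setup script: the class FakeArray and the function setup_once are hypothetical names, and the array "coming up late" is modelled by a flag rather than by real mdadm or udev behaviour.

```python
class FakeArray:
    """Hypothetical stand-in for /dev/md/openqa; no real devices are touched."""

    def __init__(self, appears_after_check):
        self.present = False
        self.appears_after_check = appears_after_check
        self.log = []

    def exists(self):
        # Models the [[ -e /dev/md/openqa ]] test.
        result = self.present
        if self.appears_after_check:
            # Assumption: something else assembles the array right after the check.
            self.present = True
        return result

    def stop(self):
        self.log.append("stop")
        self.present = False

    def create(self):
        # Creation fails when the array name is already taken.
        if self.present:
            self.log.append("create failed: name in use")
            return False
        self.present = True
        self.log.append("create ok")
        return True


def setup_once(array):
    """Check-then-act: stop the array only if it was visible at check time."""
    if array.exists():
        array.stop()
    return array.create()


racy = FakeArray(appears_after_check=True)
print(setup_once(racy), racy.log)
```

In this simulation no "stop" entry is recorded and the creation step fails, which would match a log that shows the "in use already" error without a preceding "Stopping current RAID" line.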

Actions #9

Updated by mkittler almost 2 years ago

The following SR should help: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/722 (see the SR description for details)

When rebooting (without the changes from the SR) I ran into the exact same issue again, and pressing CTRL-D to simply try again helped. So I'm confident that the SR will help. I'll now apply it manually from OSD (adding the machine back to salt).

Actions #10

Updated by mkittler almost 2 years ago

Works with the fix in place. I could reproduce a run affected by the problem and the retry code helped:

Aug 01 11:49:49 openqaworker14 systemd[1]: Starting Setup NVMe before mounting it...
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1563]: Current mount points (printed for debugging purposes):
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: devtmpfs on /dev type devtmpfs (rw,nosuid,size=4096k,nr_inodes=1048576,mode=755,inode64)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: tmpfs on /run type tmpfs (rw,nosuid,nodev,size=211027992k,nr_inodes=819200,mode=755,inode64)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755,inode64)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,misc)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on / type btrfs (rw,relatime,ssd,space_cache,subvolid=267,subvol=/@/.snapshots/1/snapshot)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=30,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=424)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /.snapshots type btrfs (rw,relatime,ssd,space_cache,subvolid=266,subvol=/@/.snapshots)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /boot/grub2/i386-pc type btrfs (rw,relatime,ssd,space_cache,subvolid=265,subvol=/@/boot/grub2/i386-pc)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /boot/grub2/x86_64-efi type btrfs (rw,relatime,ssd,space_cache,subvolid=264,subvol=/@/boot/grub2/x86_64-efi)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /home type btrfs (rw,relatime,ssd,space_cache,subvolid=263,subvol=/@/home)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /opt type btrfs (rw,relatime,ssd,space_cache,subvolid=262,subvol=/@/opt)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /srv type btrfs (rw,relatime,ssd,space_cache,subvolid=260,subvol=/@/srv)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /root type btrfs (rw,relatime,ssd,space_cache,subvolid=261,subvol=/@/root)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /usr/local type btrfs (rw,relatime,ssd,space_cache,subvolid=258,subvol=/@/usr/local)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /tmp type btrfs (rw,relatime,ssd,space_cache,subvolid=259,subvol=/@/tmp)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1566]: /dev/nvme0n1p2 on /var type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/@/var)
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1563]: Present block devices (printed for debugging purposes):
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: nvme0n1     259:0    0  3.5T  0 disk
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: ├─nvme0n1p1 259:1    0  512M  0 part
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: ├─nvme0n1p2 259:2    0    1T  0 part /var
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /tmp
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /usr/local
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /root
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /srv
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /opt
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /home
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /boot/grub2/x86_64-efi
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /boot/grub2/i386-pc
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /.snapshots
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: │                                    /
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1567]: └─nvme0n1p3 259:3    0  2.5T  0 part
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1563]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1574]:        level=raid0 devices=1 ctime=Mon Aug  1 11:34:36 2022
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: Array name /dev/md/openqa is in use already.
Aug 01 11:49:49 openqaworker14 openqa-establish-nvme-setup[1563]: Waiting 10 seconds before trying again after failing due to in-use device (maybe it came up just after checking to stop it before).
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1563]: Trying RAID0 creation again after timeout (attempt 2 of 10)
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1563]: Stopping current RAID "/dev/md/openqa"
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1805]: mdadm: stopped /dev/md/openqa
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1563]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1816]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
Aug 01 11:49:59 openqaworker14 openqa-establish-nvme-setup[1816]:        level=raid0 devices=1 ctime=Mon Aug  1 11:34:36 2022
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1816]: mdadm: Defaulting to version 1.2 metadata
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1816]: mdadm: array /dev/md/openqa started.
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1563]: Status for RAID0 "/dev/md/openqa"
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1835]: md127 : active raid0 nvme0n1p3[0]
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1837]: ARRAY /dev/md/openqa metadata=1.2 name=openqaworker14:openqa UUID=f1055e20:fb20276a:789d4502:70a84400
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1563]: Creating ext2 filesystem on RAID0 "/dev/md/openqa"
Aug 01 11:50:00 openqaworker14 openqa-establish-nvme-setup[1841]: mke2fs 1.46.4 (18-Aug-2021)
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]: [220B blob data]
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]: Creating filesystem with 669084672 4k blocks and 167272448 inodes
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]: Filesystem UUID: ea63d35e-fad0-4de8-be7d-df36d0dcfb1b
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]: Superblock backups stored on blocks:
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]:         32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]:         4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]:         102400000, 214990848, 512000000, 550731776, 644972544
Aug 01 11:50:03 openqaworker14 openqa-establish-nvme-setup[1841]: [73B blob data]
Aug 01 11:50:17 openqaworker14 openqa-establish-nvme-setup[1841]: [378B blob data]
Aug 01 11:50:17 openqaworker14 openqa-establish-nvme-setup[1841]: [107B blob data]
Aug 01 11:50:17 openqaworker14 systemd[1]: openqa_nvme_format.service: Deactivated successfully.
Aug 01 11:50:17 openqaworker14 systemd[1]: Finished Setup NVMe before mounting it.

Rebooting a few more times to be sure.
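
The retry behaviour visible in the log above (wait 10 seconds, then re-check, stop and create again, up to 10 attempts) can be sketched as follows. This is a minimal illustration and not the code from the SR: create_with_retry and the three callables are hypothetical, and only the delay and the attempt count are taken from the log messages.

```python
import time


def create_with_retry(exists, stop, create, attempts=10, delay=10.0, sleep=time.sleep):
    """Repeat the stop-then-create sequence until creation succeeds.

    exists, stop and create are caller-supplied callables (hypothetical
    wrappers around the device test and the mdadm calls); create returns
    True on success. Returns the number of the successful attempt and
    raises RuntimeError when all attempts have failed.
    """
    for attempt in range(1, attempts + 1):
        if exists():
            stop()
        if create():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise RuntimeError(f"RAID creation failed after {attempts} attempts")


# Simulated device state: the array shows up right after the first check.
state = {"present": False, "late": True}


def fake_exists():
    seen = state["present"]
    if state["late"]:
        state["present"], state["late"] = True, False
    return seen


def fake_stop():
    state["present"] = False


def fake_create():
    if state["present"]:
        return False  # array name already in use
    state["present"] = True
    return True


# sleep is replaced by a no-op so the demonstration does not actually wait.
print(create_with_retry(fake_exists, fake_stop, fake_create, sleep=lambda _: None))
```

With the simulated state the first attempt fails and the second one succeeds, because by then the existence check sees the array and stops it before creating it again.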

Actions #11

Updated by mkittler almost 2 years ago

  • Status changed from In Progress to Resolved

Looks like it works after three reboots. I could reproduce this case every time now, so the race condition seems to occur quite reliably. Considering the retry works and the SR has already been merged, I'm resolving the issue.

I also resumed the alert.

Actions #12

Updated by okurz almost 2 years ago

Just to make sure, because you didn't mention it: did you add the host back to salt, and was a clean, up-to-date state applied?
