action #104965
openqaworker10 restarted into maintenance mode - reason unknown
Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2022-01-17
Due date:
% Done: 0%
Estimated time:
Description
The most recent deployment pipeline (https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/300009) failed because openqaworker10 is down. Logging in over IPMI shows:
Give root password for maintenance
(or press Control-D to continue):
So we need to figure out what happened to worker10.
Updated by nicksinger almost 3 years ago
openqa_nvme_format.service failed to start:
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1040]: Current mount points (printed for debugging purposes):
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,size=4096k,nr_inodes=1048576,mode=755)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: tmpfs on /run type tmpfs (rw,nosuid,nodev,size=52779700k,nr_inodes=819200,mode=755)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on / type btrfs (rw,relatime,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=40095)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /.snapshots type btrfs (rw,relatime,space_cache,subvolid=258,subvol=/@/.snapshots)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /boot/grub2/i386-pc type btrfs (rw,relatime,space_cache,subvolid=260,subvol=/@/boot/grub2/i386-pc)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /boot/grub2/x86_64-efi type btrfs (rw,relatime,space_cache,subvolid=261,subvol=/@/boot/grub2/x86_64-efi)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /home type btrfs (rw,relatime,space_cache,subvolid=262,subvol=/@/home)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /opt type btrfs (rw,relatime,space_cache,subvolid=263,subvol=/@/opt)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /srv type btrfs (rw,relatime,space_cache,subvolid=264,subvol=/@/srv)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /tmp type btrfs (rw,relatime,space_cache,subvolid=265,subvol=/@/tmp)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /usr/local type btrfs (rw,relatime,space_cache,subvolid=266,subvol=/@/usr/local)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/cache type btrfs (rw,relatime,space_cache,subvolid=267,subvol=/@/var/cache)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/crash type btrfs (rw,relatime,space_cache,subvolid=268,subvol=/@/var/crash)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/libvirt/images type btrfs (rw,relatime,space_cache,subvolid=269,subvol=/@/var/lib/libvirt/images)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/mailman type btrfs (rw,relatime,space_cache,subvolid=271,subvol=/@/var/lib/mailman)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/mariadb type btrfs (rw,relatime,space_cache,subvolid=272,subvol=/@/var/lib/mariadb)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/machines type btrfs (rw,relatime,space_cache,subvolid=270,subvol=/@/var/lib/machines)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/mysql type btrfs (rw,relatime,space_cache,subvolid=273,subvol=/@/var/lib/mysql)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/named type btrfs (rw,relatime,space_cache,subvolid=274,subvol=/@/var/lib/named)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/lib/pgsql type btrfs (rw,relatime,space_cache,subvolid=275,subvol=/@/var/lib/pgsql)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/log type btrfs (rw,relatime,space_cache,subvolid=276,subvol=/@/var/log)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/opt type btrfs (rw,relatime,space_cache,subvolid=277,subvol=/@/var/opt)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/spool type btrfs (rw,relatime,space_cache,subvolid=278,subvol=/@/var/spool)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1042]: /dev/mapper/system-root on /var/tmp type btrfs (rw,relatime,space_cache,subvolid=279,subvol=/@/var/tmp)
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1040]: Present block devices (printed for debugging purposes):
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: sda 8:0 0 931.5G 0 disk
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: └─sda1 8:1 0 931.5G 0 part
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: ├─system-swap 254:0 0 2G 0 lvm [SWAP]
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: └─system-root 254:1 0 40G 0 lvm /
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: nvme0n1 259:0 0 953.9G 0 disk
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1043]: └─md127 9:127 0 953.7G 0 raid0
Jan 16 03:34:17 openqaworker10 openqa-establish-nvme-setup[1040]: Stopping current RAID "/dev/md/openqa"
Jan 16 03:34:17 openqaworker10 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1044]: mdadm: stopped /dev/md/openqa
Jan 16 03:34:17 openqaworker10 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1050]: └─system-root 254:1 0 40G 0 lvm /
Jan 16 03:34:17 openqaworker10 systemd[1]: Failed to start Setup NVMe before mounting it.
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1040]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1055]: mdadm: /dev/nvme0n1 appears to be part of a raid array:
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1055]: level=raid0 devices=1 ctime=Sun Dec 26 03:34:11 2021
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1055]: mdadm: partition table exists on /dev/nvme0n1 but will be lost or
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1055]: meaningless after creating array
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1055]: mdadm: unexpected failure opening /dev/md127
Jan 16 03:34:19 openqaworker10 openqa-establish-nvme-setup[1040]: Unable to create RAID, mdadm returned with non-zero code
Updated by nicksinger almost 3 years ago
I ran the script manually again; this time it worked:
openqaworker10:~ # /usr/local/bin/openqa-establish-nvme-setup
Current mount points (printed for debugging purposes):
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,noexec,size=4096k,nr_inodes=1048576,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,size=52779700k,nr_inodes=819200,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,size=4096k,nr_inodes=1024,mode=755)
cgroup2 on /sys/fs/cgroup/unified type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
none on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
/dev/mapper/system-root on / type btrfs (rw,relatime,space_cache,subvolid=259,subvol=/@/.snapshots/1/snapshot)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=32,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=40095)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
/dev/mapper/system-root on /.snapshots type btrfs (rw,relatime,space_cache,subvolid=258,subvol=/@/.snapshots)
/dev/mapper/system-root on /boot/grub2/i386-pc type btrfs (rw,relatime,space_cache,subvolid=260,subvol=/@/boot/grub2/i386-pc)
/dev/mapper/system-root on /boot/grub2/x86_64-efi type btrfs (rw,relatime,space_cache,subvolid=261,subvol=/@/boot/grub2/x86_64-efi)
/dev/mapper/system-root on /home type btrfs (rw,relatime,space_cache,subvolid=262,subvol=/@/home)
/dev/mapper/system-root on /opt type btrfs (rw,relatime,space_cache,subvolid=263,subvol=/@/opt)
/dev/mapper/system-root on /srv type btrfs (rw,relatime,space_cache,subvolid=264,subvol=/@/srv)
/dev/mapper/system-root on /tmp type btrfs (rw,relatime,space_cache,subvolid=265,subvol=/@/tmp)
/dev/mapper/system-root on /usr/local type btrfs (rw,relatime,space_cache,subvolid=266,subvol=/@/usr/local)
/dev/mapper/system-root on /var/cache type btrfs (rw,relatime,space_cache,subvolid=267,subvol=/@/var/cache)
/dev/mapper/system-root on /var/crash type btrfs (rw,relatime,space_cache,subvolid=268,subvol=/@/var/crash)
/dev/mapper/system-root on /var/lib/libvirt/images type btrfs (rw,relatime,space_cache,subvolid=269,subvol=/@/var/lib/libvirt/images)
/dev/mapper/system-root on /var/lib/mailman type btrfs (rw,relatime,space_cache,subvolid=271,subvol=/@/var/lib/mailman)
/dev/mapper/system-root on /var/lib/mariadb type btrfs (rw,relatime,space_cache,subvolid=272,subvol=/@/var/lib/mariadb)
/dev/mapper/system-root on /var/lib/machines type btrfs (rw,relatime,space_cache,subvolid=270,subvol=/@/var/lib/machines)
/dev/mapper/system-root on /var/lib/mysql type btrfs (rw,relatime,space_cache,subvolid=273,subvol=/@/var/lib/mysql)
/dev/mapper/system-root on /var/lib/named type btrfs (rw,relatime,space_cache,subvolid=274,subvol=/@/var/lib/named)
[115338.217481] md127: detected capacity change from 0 to 1024073924608
/dev/mapper/system-root on /var/lib/pgsql type btrfs (rw,relatime,space_cache,subvolid=275,subvol=/@/var/lib/pgsql)
/dev/mapper/system-root on /var/log type btrfs (rw,relatime,space_cache,subvolid=276,subvol=/@/var/log)
/dev/mapper/system-root on /var/opt type btrfs (rw,relatime,space_cache,subvolid=277,subvol=/@/var/opt)
/dev/mapper/system-root on /var/spool type btrfs (rw,relatime,space_cache,subvolid=278,subvol=/@/var/spool)
/dev/mapper/system-root on /var/tmp type btrfs (rw,relatime,space_cache,subvolid=279,subvol=/@/var/tmp)
Present block devices (printed for debugging purposes):
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931.5G 0 disk
└─sda1 8:1 0 931.5G 0 part
├─system-swap 254:0 0 2G 0 lvm [SWAP]
└─system-root 254:1 0 40G 0 lvm /
nvme0n1 259:0 0 953.9G 0 disk
└─system-root 254:1 0 40G 0 lvm /
Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1
mdadm: /dev/nvme0n1 appears to be part of a raid array:
level=raid0 devices=1 ctime=Sun Dec 26 03:34:11 2021
mdadm: partition table exists on /dev/nvme0n1 but will be lost or
meaningless after creating array
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/openqa started.
Status for RAID0 "/dev/md/openqa"
md127 : active raid0 nvme0n1[0]
ARRAY /dev/md/openqa metadata=1.2 name=openqaworker10:openqa UUID=b76765cb:acc4d288:a4edd6dd:45fdd9d9
Creating ext2 filesystem on RAID0 "/dev/md/openqa"
mke2fs 1.43.8 (1-Jan-2018)
/dev/md/openqa contains a ext2 file system
last mounted on /var/lib/openqa on Sun Dec 26 03:36:26 2021
Discarding device blocks: done
Creating filesystem with 250018048 4k blocks and 62504960 inodes
Filesystem UUID: 99f6c618-e17e-4d0e-b448-43d4051a916e
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done
Updated by nicksinger almost 3 years ago
- Status changed from In Progress to Feedback
- Target version deleted (Ready)
The system started fine after a manual run of the script. @okurz any ideas for immediate actions? Otherwise I'd consider this task done and would watch for further failures. So far the service seems pretty stable and already has retries and such built in, so I wouldn't know how to improve it further.
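For illustration only, here is a minimal sketch of the kind of retry wrapper meant by "retries" here. It is not taken from openqa-establish-nvme-setup: the `retry` function name, the attempt count and the delay are all hypothetical. The journal above shows "mdadm: unexpected failure opening /dev/md127" right after the array was stopped, so retrying the create step after a short pause may help if this is a timing issue, but that is an assumption and not a confirmed root cause.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of the real openqa-establish-nvme-setup script.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
# Runs the command until it succeeds or the attempt limit is reached.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "Giving up after $n attempts: $*" >&2
            return 1
        fi
        echo "Attempt $n failed, retrying in ${delay}s: $*" >&2
        sleep "$delay"
        n=$((n + 1))
    done
}

# Assumed usage (commented out, as it would re-create the array):
# retry 5 2 mdadm --create /dev/md/openqa --level=0 --force \
#     --raid-devices=1 --run /dev/nvme0n1
```

The wrapper only returns non-zero once every attempt has failed, so a unit calling it would still end up in the failed state seen above if the problem persists.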
Updated by livdywan almost 3 years ago
I'm able to log in via SSH, and it looks to be working (pun intended).