action #109301
openqaworker14 + openqaworker15 sporadically get stuck on boot
Start date:
2022-03-31
Due date:
% Done:
0%
Estimated time:
Description
OBSERVATION
On reboot, these workers occasionally fail to boot correctly and end up in emergency mode:
Mar 08 14:34:24 openqaworker14 kernel: Loading iSCSI transport class v2.0-870.
Mar 08 14:34:24 openqaworker14 systemd[1]: Finished Create Volatile Files and Directories.
Mar 08 14:34:24 openqaworker14 systemd[1]: Starting Security Auditing Service...
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: nvme0n1 259:0 0 3.5T 0 disk
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p1 259:1 0 512M 0 part
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p2 259:2 0 1T 0 part /
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─nvme0n1p3 259:3 0 2.5T 0 part
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─md127 9:127 0 2.5T 0 raid0
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Stopping current RAID "/dev/md/openqa"
Mar 08 14:34:24 openqaworker14 systemd[1]: Finished Flush Journal to Persistent Storage.
Mar 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
Mar 08 14:34:24 openqaworker14 systemd[1]: Created slice Slice /system/rdma-load-modules.
Mar 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/iwarp.conf...
Mar 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/rdma.conf...
Mar 08 14:34:24 openqaworker14 kernel: ixgbe 0000:d8:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
Mar 08 14:34:24 openqaworker14 systemd[1]: Finished Load RDMA modules from /etc/rdma/modules/iwarp.conf.
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1559]: mdadm: stopped /dev/md/openqa
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: level=raid0 devices=1 ctime=Mon Mar 7 10:20:52 2022
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: unexpected failure opening /dev/md127
Mar 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Unable to create RAID, mdadm returned with non-zero code
Mar 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Failed to start Setup NVMe before mounting it.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for /var/lib/openqa.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #1.
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@1.service: Job openqa-worker-auto-restart@1.service/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for var-lib-openqa-share.automount.
Mar 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa-share.automount: Job var-lib-openqa-share.automount/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #3.
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@3.service: Job openqa-worker-auto-restart@3.service/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Prepare NVMe after mounting it.
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_prepare.service: Job openqa_nvme_prepare.service/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Local File Systems.
Mar 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #2.
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@2.service: Job openqa-worker-auto-restart@2.service/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #4.
Mar 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@4.service: Job openqa-worker-auto-restart@4.service/start failed with result 'dependency'.
Mar 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa.mount: Job var-lib-openqa.mount/start failed with result 'dependency'.
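In this excerpt only openqa_nvme_format.service fails on its own; everything else, including local-fs.target, fails with result 'dependency', and local-fs.target then triggers its OnFailure= dependencies (on a standard systemd setup this is emergency.target). To separate the root cause from the follow-up failures in a longer journal, a minimal Python sketch could look like this. It assumes the journal lines look like the excerpt above (English systemd messages, one entry per line); `summarize_failures` and the regexes are hypothetical helpers, not part of openQA or systemd.

```python
import re

# A unit that failed by itself, e.g.
# "systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'."
UNIT_FAILED = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Failed with result '(?P<result>[^']+)'"
)

# A start job that failed, e.g.
# "systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'."
JOB_FAILED = re.compile(
    r"systemd\[1\]: (?P<unit>\S+): Job (?P=unit)/start failed with result '(?P<result>[^']+)'"
)


def summarize_failures(journal_lines):
    """Split failing units into root causes and units that only failed via dependencies.

    Returns (root_causes, dependency_failures), where root_causes is a list of
    (unit, result) tuples and dependency_failures is a list of unit names,
    both in journal order.
    """
    root_causes, dependency_failures = [], []
    for line in journal_lines:
        m = UNIT_FAILED.search(line)
        if m:
            root_causes.append((m["unit"], m["result"]))
            continue
        m = JOB_FAILED.search(line)
        if m and m["result"] == "dependency":
            dependency_failures.append(m["unit"])
    return root_causes, dependency_failures
```

The lines could come, for example, from `journalctl -b -1 --no-pager` (the previous boot) split on newlines; the regexes only look at the part after `systemd[1]:`, so the localized month prefix does not matter.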
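In the failing boot, openqa-establish-nvme-setup stops the existing /dev/md/openqa array and then fails to recreate it on /dev/nvme0n1p3 ("mdadm: unexpected failure opening /dev/md127"), so openqa_nvme_format.service fails and every unit that depends on /var/lib/openqa, including local-fs.target, fails with it. The cause of the problem is probably the difference in hardware configuration of these workers. Our standard workers have one HDD for the OS and one NVMe SSD for /dev/md/openqa, whereas these workers have only a single NVMe SSD, which holds the EFI partition, the root filesystem and the RAID member partition for /dev/md/openqa.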
Configured as:
nvme0n1
├─nvme0n1p1 vfat FAT32 9AED-277B 506M 1% /boot/efi
├─nvme0n1p2 btrfs 5a405f4e-bd0c-46cb-a5ee-a0e976968be1 1016,5G 1% /
└─nvme0n1p3 linux_raid_member 1.2 openqaworker14:openqa 03972fdb-874d-cbec-4cb8-bca5412d90a2
  └─md127 ext2 1.0 4c30279b-d757-4a97-b636-539b18bc9e22 2,3T 0% /var/lib/openqa
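The difference between the two layouts is visible in the block-device tree: on these workers the root filesystem and the RAID member partition sit on the same disk. A minimal Python sketch for detecting that layout, assuming the tree is obtained with `lsblk --json -o NAME,TYPE,FSTYPE,MOUNTPOINT`. The function names are hypothetical and not part of openqa-establish-nvme-setup; the sketch only detects the layout and does not by itself fix the boot problem.

```python
import json


def _descendants(dev):
    """Yield a device and all of its children (partitions, md arrays, ...) recursively."""
    yield dev
    for child in dev.get("children", []):
        yield from _descendants(child)


def disks_shared_by_root_and_raid(lsblk_json):
    """Return names of disks that carry both the root filesystem and an md RAID member.

    Expects the JSON printed by `lsblk --json -o NAME,TYPE,FSTYPE,MOUNTPOINT`.
    If the MOUNTPOINTS column is requested instead, lsblk reports a
    "mountpoints" list, so both keys are checked.
    """
    shared = []
    for disk in json.loads(lsblk_json)["blockdevices"]:
        if disk.get("type") != "disk":
            continue
        nodes = list(_descendants(disk))
        has_root = any(
            n.get("mountpoint") == "/" or "/" in (n.get("mountpoints") or [])
            for n in nodes
        )
        has_raid_member = any(n.get("fstype") == "linux_raid_member" for n in nodes)
        if has_root and has_raid_member:
            shared.append(disk["name"])
    return shared
```

On a layout like the one above this should report nvme0n1, while on a standard worker (OS on a separate HDD) it should return an empty list.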
Related issues
History
#1
Updated by osukup 3 months ago
- Related to action #104970: Add two OSD workers (openqaworker14+openqaworker15) specifically for sap-application testing size:M added