action #109301
openqaworker14 + openqaworker15 sporadically get stuck on boot (closed)
Start date: 2022-03-31
Due date:
% Done: 0%
Estimated time:
Description
OBSERVATION
On reboot, these workers occasionally fail to boot correctly and end up in emergency mode:
bře 08 14:34:24 openqaworker14 kernel: Loading iSCSI transport class v2.0-870.
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Create Volatile Files and Directories.
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Security Auditing Service...
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: nvme0n1 259:0 0 3.5T 0 disk
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p1 259:1 0 512M 0 part
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: ├─nvme0n1p2 259:2 0 1T 0 part /
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─nvme0n1p3 259:3 0 2.5T 0 part
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1557]: └─md127 9:127 0 2.5T 0 raid0
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Stopping current RAID "/dev/md/openqa"
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Flush Journal to Persistent Storage.
bře 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
bře 08 14:34:24 openqaworker14 systemd[1]: Created slice Slice /system/rdma-load-modules.
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/iwarp.conf...
bře 08 14:34:24 openqaworker14 systemd[1]: Starting Load RDMA modules from /etc/rdma/modules/rdma.conf...
bře 08 14:34:24 openqaworker14 kernel: ixgbe 0000:d8:00.1: Multiqueue Enabled: Rx Queue count = 63, Tx Queue count = 63 XDP Queue count = 0
bře 08 14:34:24 openqaworker14 systemd[1]: Finished Load RDMA modules from /etc/rdma/modules/iwarp.conf.
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1559]: mdadm: stopped /dev/md/openqa
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Creating RAID0 "/dev/md/openqa" on: /dev/nvme0n1p3
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: /dev/nvme0n1p3 appears to be part of a raid array:
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: level=raid0 devices=1 ctime=Mon Mar 7 10:20:52 2022
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1574]: mdadm: unexpected failure opening /dev/md127
bře 08 14:34:24 openqaworker14 openqa-establish-nvme-setup[1552]: Unable to create RAID, mdadm returned with non-zero code
bře 08 14:34:24 openqaworker14 kernel: i40iw_open: i40iw_open completed
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Main process exited, code=exited, status=1/FAILURE
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.
bře 08 14:34:24 openqaworker14 systemd[1]: Failed to start Setup NVMe before mounting it.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for /var/lib/openqa.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #1.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@1.service: Job openqa-worker-auto-restart@1.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for var-lib-openqa-share.automount.
bře 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa-share.automount: Job var-lib-openqa-share.automount/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #3.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@3.service: Job openqa-worker-auto-restart@3.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Prepare NVMe after mounting it.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_prepare.service: Job openqa_nvme_prepare.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for Local File Systems.
bře 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Triggering OnFailure= dependencies.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #2.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@2.service: Job openqa-worker-auto-restart@2.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for openQA Worker #4.
bře 08 14:34:24 openqaworker14 systemd[1]: openqa-worker-auto-restart@4.service: Job openqa-worker-auto-restart@4.service/start failed with result 'dependency'.
bře 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa.mount: Job var-lib-openqa.mount/start failed with result 'dependency'.
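To see the failure chain in the excerpt above at a glance, the journal lines can be split into units that failed on their own and units that only failed because of a dependency. The following is a minimal sketch, not part of openQA; the function name `summarize_failures` and the regular expressions are assumptions based on the message wording seen in this log.

```python
import re

# Matches systemd lines such as
# "openqa_nvme_format.service: Failed with result 'exit-code'."
FAILED_RESULT = re.compile(r"systemd\[1\]: (\S+): Failed with result '([^']+)'")
# Matches systemd lines such as
# "local-fs.target: Job local-fs.target/start failed with result 'dependency'."
JOB_FAILED = re.compile(r"systemd\[1\]: (\S+): Job \S+ failed with result 'dependency'")


def summarize_failures(lines):
    """Return (root_failures, dependency_failures) found in journal lines."""
    root, dependent = [], []
    for line in lines:
        m = FAILED_RESULT.search(line)
        if m:
            # Unit that failed by itself, together with the reported result.
            root.append((m.group(1), m.group(2)))
            continue
        m = JOB_FAILED.search(line)
        if m:
            # Unit whose start job was cancelled because a dependency failed.
            dependent.append(m.group(1))
    return root, dependent


# A few lines copied from the excerpt above.
sample = [
    "bře 08 14:34:24 openqaworker14 systemd[1]: openqa_nvme_format.service: Failed with result 'exit-code'.",
    "bře 08 14:34:24 openqaworker14 systemd[1]: Dependency failed for /var/lib/openqa.",
    "bře 08 14:34:24 openqaworker14 systemd[1]: local-fs.target: Job local-fs.target/start failed with result 'dependency'.",
    "bře 08 14:34:24 openqaworker14 systemd[1]: var-lib-openqa.mount: Job var-lib-openqa.mount/start failed with result 'dependency'.",
]
root, dependent = summarize_failures(sample)
print(root)
print(dependent)
```

Applied to the full excerpt, this should point at openqa_nvme_format.service as the only unit that failed on its own, with local-fs.target among the dependency failures, which is consistent with the machine ending up in emergency mode.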
The cause is probably the different hardware configuration of these workers. Our standard workers have one HDD with the OS and one NVMe SSD holding /dev/md/openqa. These workers have only a single NVMe SSD.
Configured as:
nvme0n1
├─nvme0n1p1 vfat FAT32 9AED-277B 506M 1% /boot/efi
├─nvme0n1p2 btrfs 5a405f4e-bd0c-46cb-a5ee-a0e976968be1 1016,5G 1% /
└─nvme0n1p3 linux_raid_member 1.2 openqaworker14:openqa 03972fdb-874d-cbec-4cb8-bca5412d90a2
  └─md127 ext2 1.0 4c30279b-d757-4a97-b636-539b18bc9e22 2,3T 0% /var/lib/openqa
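The layout difference described above can be checked mechanically. The sketch below is only an illustration under assumptions: the nested dictionaries are hand-written to mirror the listing (they are not real tool output), and the helper names `walk` and `raid_shares_disk_with_root` are hypothetical. It only detects the layout; it does not prove that the layout causes the boot failure.

```python
def walk(node):
    """Yield a block device node and all of its descendants."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


def raid_shares_disk_with_root(disk):
    """True if one disk carries both the root filesystem and a RAID member."""
    nodes = list(walk(disk))
    has_root = any(n.get("mountpoint") == "/" for n in nodes)
    has_raid_member = any(n.get("fstype") == "linux_raid_member" for n in nodes)
    return has_root and has_raid_member


# Hypothetical data mirroring the listing above for openqaworker14.
worker14_disk = {
    "name": "nvme0n1",
    "children": [
        {"name": "nvme0n1p1", "fstype": "vfat", "mountpoint": "/boot/efi"},
        {"name": "nvme0n1p2", "fstype": "btrfs", "mountpoint": "/"},
        {
            "name": "nvme0n1p3",
            "fstype": "linux_raid_member",
            "children": [
                {"name": "md127", "fstype": "ext2", "mountpoint": "/var/lib/openqa"}
            ],
        },
    ],
}

# Hypothetical NVMe disk of a standard worker: the OS lives on a separate HDD.
# Device names and the whole-disk RAID member are assumed for illustration.
standard_nvme = {
    "name": "nvme0n1",
    "fstype": "linux_raid_member",
    "children": [
        {"name": "md127", "fstype": "ext2", "mountpoint": "/var/lib/openqa"}
    ],
}

print(raid_shares_disk_with_root(worker14_disk))
print(raid_shares_disk_with_root(standard_nvme))
```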
Updated by osukup over 2 years ago
- Related to action #104970: Add two OSD workers (openqaworker14+openqaworker15) specifically for sap-application testing size:M added
Updated by okurz over 2 years ago
- Status changed from New to Rejected
- Assignee set to okurz
- Target version set to Ready
This needs to be solved as part of #104970, as the two machines are not usable without it. Please solve it there.