action #88900

closed

openqaworker13 was unreachable

Added by Xiaojing_liu over 3 years ago. Updated over 3 years ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Target version:
Start date: 2021-02-22
Due date:
% Done: 0%
Estimated time:

Description

Observation

openqaworker13 was unreachable on 2021-02-22.

Acceptance criteria

  • AC1: machine is reboot-reliable, e.g. ensured by multiple reboots without failures after boot

Problem

Cannot connect to this machine; it does not respond to ping.

Connecting to the machine with ipmitool sol activate shows:

[FAILED] Failed to start save kernel crash dump.
See 'systemctl status kdump-save.service' for details.
         Starting Reload Configuration from the Real Root...
[  OK  ] Started Reload Configuration from the Real Root.
[  OK  ] Reached target Initrd File Systems.
[  OK  ] Reached target Initrd Default Target.
         Starting dracut pre-pivot and cleanup hook...
[  OK  ] Started dracut pre-pivot and cleanup hook.
         Starting Cleaning Up and Shutting Down Daemons...
[  OK  ] Stopped target Timers.
[  OK  ] Stopped dracut pre-pivot and cleanup hook.
[  OK  ] Stopped target Initrd Default Target.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Slices.
[  OK  ] Stopped target Initrd Root Device.
[  OK  ] Stopped target System Initialization.
[  OK  ] Stopped Apply Kernel Variables.
[  OK  ] Stopped Create Volatile Files and Directories.
[  OK  ] Stopped target Local File Systems.
         Unmounting /kdump/mnt1/var/crash...
         Unmounting /kdump/mnt0...
         Stopping udev Kernel Device Manager...
[  OK  ] Stopped target Sockets.
[  OK  ] Stopped Load Kernel Modules.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped Dispatch Password Requests to Console Directory Watch.
[  OK  ] Stopped target Swap.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Remote File Systems (Pre).
[  OK  ] Stopped dracut initqueue hook.
[  OK  ] Stopped udev Coldplug all Devices.
[  OK  ] Stopped udev Kernel Device Manager.
[  OK  ] Unmounted /kdump/mnt1/var/crash.
[  OK  ] Started Cleaning Up and Shutting Down Daemons.
[  OK  ] Stopped Create Static Device Nodes in /dev.
[  OK  ] Stopped Create list of required sta…vice nodes for the current kernel.
[  OK  ] Stopped dracut pre-udev hook.
[  OK  ] Stopped dracut cmdline hook.
[  OK  ] Stopped dracut ask for additional cmdline parameters.
[  OK  ] Closed udev Control Socket.
[  OK  ] Closed udev Kernel Socket.
         Starting Cleanup udevd DB...
[  OK  ] Started Cleanup udevd DB.
[  OK  ] Reached target Switch Root.
         Starting Switch Root...
[FAILED] Failed to start Switch Root.
See 'systemctl status initrd-switch-root.service' for details.
         Starting Setup Virtual Console...
[  OK  ] Unmounted /kdump/mnt0.
[  OK  ] Stopped File System Check on /dev/d…3d875-0e24-49a9-8dc8-ddf8cf83f04e.
[  OK  ] Started Setup Virtual Console.
[  OK  ] Started Emergency Shell.
[  OK  ] Reached target Emergency Mode.

Generating "/run/initramfs/rdsosreport.txt"


Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.


Give root password for maintenance
(or press Control-D to continue):
[716615.675131] BTRFS info (device sda2): disk space caching is enabled
[716615.675134] BTRFS info (device sda2): has skinny extents
[716614.923484] dracut-cmdline[499]: mv: cannot stat '/etc/multipath.conf.kdump': No such file or directory
Unable to ioctl(KDSETLED) -- are you not on the console? (Inappropriate ioctl for device)
Extracting dmesg
-------------------------------------------------------------------------------

The dmesg log is saved to /kdump/mnt1/var/crash/2021-02-22-04:16/dmesg.txt.
makedumpfile Completed.
-------------------------------------------------------------------------------
Saving dump using makedumpfile
-------------------------------------------------------------------------------
Copying data                                      : [100.0 %] /           eta: 0s

The dumpfile is saved to /kdump/mnt1/var/crash/2021-02-22-04:16/vmcore.

makedumpfile Completed.
-------------------------------------------------------------------------------

I then ran ipmitool power cycle. The workers on that host are working well for now.
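
The failures in the console output above can be picked out mechanically. The following is a minimal sketch, assuming the serial-over-LAN output has been saved as text; the function name summarize_console_log and the sample string are illustrative and not part of any existing tooling.

```python
import re

# Matches systemd status lines such as "[FAILED] Failed to start Switch Root."
FAILED_LINE = re.compile(r"^\[FAILED\]\s+(.*)$")


def summarize_console_log(text):
    """Return the [FAILED] messages and whether emergency mode was entered."""
    failed = []
    for line in text.splitlines():
        match = FAILED_LINE.match(line.strip())
        if match:
            failed.append(match.group(1))
    emergency = "Entering emergency mode" in text
    return failed, emergency


# Shortened excerpt of the console output quoted in this ticket.
sample = """\
[FAILED] Failed to start save kernel crash dump.
See 'systemctl status kdump-save.service' for details.
[  OK  ] Reached target Switch Root.
         Starting Switch Root...
[FAILED] Failed to start Switch Root.
Entering emergency mode. Exit the shell to continue.
"""

failed, emergency = summarize_console_log(sample)
print(failed)
print(emergency)
```

For this log it would report the two failed units (the kdump save and the switch root) and that the machine ended up in the emergency shell.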

Suggestions

  • Remove from salt
  • Identify problem
  • Ensure stability with multiple reboots
  • Add back to salt after stable
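
The "multiple reboots" check from AC1 and the suggestions above could be sketched roughly as follows. This is only an outline under stated assumptions: reboot and is_healthy are hypothetical callables supplied by the caller, and the default of 5 rounds is an arbitrary choice, not a value taken from this ticket.

```python
def is_reboot_reliable(reboot, is_healthy, rounds=5):
    """Reboot up to `rounds` times and stop at the first unhealthy boot.

    `reboot` and `is_healthy` are hypothetical, caller-supplied callables:
    `reboot()` triggers a reboot and blocks until the host is back (or a
    timeout expires); `is_healthy()` returns True when the host is reachable
    and reports no failures after boot.

    Returns (ok, reboots_performed).
    """
    for attempt in range(1, rounds + 1):
        reboot()
        if not is_healthy():
            # Stop early so the failed boot can be inspected.
            return False, attempt
    return True, rounds
```

In practice is_healthy might, for example, ping the host and check the output of systemctl --failed over SSH, but that wiring is left out here because it depends on the local setup.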

Workaround

Power cycle the machine over IPMI.
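
A minimal sketch of scripting this workaround, assuming the BMC is reachable over the lanplus interface. The host name and user below are placeholders, and the helper names are made up for illustration. The ticket's ipmitool power cycle is the short form of the chassis power cycle subcommand used here.

```python
import subprocess


def ipmi_power_cycle_command(bmc_host, user):
    """Build the ipmitool argv for a chassis power cycle over LAN.

    -E makes ipmitool read the password from the IPMI_PASSWORD environment
    variable, so it does not have to be passed on the command line.
    """
    return [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host, "-U", user, "-E",
        "chassis", "power", "cycle",
    ]


def power_cycle(bmc_host, user, run=subprocess.run):
    """Run the command; `run` is injectable so the call can be dry-run."""
    return run(ipmi_power_cycle_command(bmc_host, user), check=True)


# Dry run with placeholder values: print the command instead of executing it.
power_cycle(
    "worker-bmc.example.org",  # hypothetical BMC address
    "ADMIN",                   # hypothetical IPMI user
    run=lambda cmd, check: print(" ".join(cmd)),
)
```

Passing the default run (subprocess.run) executes the command for real and raises CalledProcessError if ipmitool exits with a non-zero status.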


Related issues 2 (0 open, 2 closed)

  • Related to openQA Infrastructure - action #89497: flaky Failed systemd services alert (except openqa.suse.de) | Resolved | nicksinger | 2021-03-04 | 2021-03-31
  • Related to openQA Infrastructure - action #89551: NFS mount fails after boot (reproducible on some OSD workers) | Resolved | mkittler | 2021-03-05 | 2021-03-31
