action #68305
failed systemd services on grenache-1 after bootup
Status: closed
Description
Observation
Grafana alerts about failed systemd services on https://stats.openqa-monitor.qa.suse.de/d/KToPYLEWz/failed-systemd-services
The specific failed unit:
# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● systemd-modules-load.service loaded failed failed Load Kernel Modules
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
1 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
grenache-1:/home/okurz # systemctl status systemd-modules-load.service
● systemd-modules-load.service - Load Kernel Modules
Loaded: loaded (/usr/lib/systemd/system/systemd-modules-load.service; static; vendor preset: disabled)
Active: failed (Result: exit-code) since Sun 2020-06-21 03:33:37 CEST; 1 day 10h ago
Docs: man:systemd-modules-load.service(8)
man:modules-load.d(5)
Main PID: 1264 (code=exited, status=1/FAILURE)
Jun 21 03:33:37 grenache-1 systemd-modules-load[1264]: Failed to insert 'kvm_hv': No such device
Jun 21 03:33:37 grenache-1 systemd[1]: systemd-modules-load.service: Main process exited, code=exited, status=>
Jun 21 03:33:37 grenache-1 systemd[1]: Failed to start Load Kernel Modules.
Jun 21 03:33:37 grenache-1 systemd[1]: systemd-modules-load.service: Unit entered failed state.
Jun 21 03:33:37 grenache-1 systemd[1]: systemd-modules-load.service: Failed with result 'exit-code'.
This is due to:
# cat /etc/modules-load.d/kvm.conf
kvm_hv
We need this module for kvm-on-power, but not on grenache-1, which is not running any KVM tests.
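For context (an assumption, not stated in the ticket): kvm_hv can only bind when the CPU runs in hypervisor mode, i.e. on bare-metal/PowerNV hosts; a PowerVM LPAR presents itself as a pSeries guest, so loading the module fails with exactly the "No such device" error seen in the journal above. A quick manual check could look like this:

# check whether the host is bare metal (PowerNV) or an LPAR (pSeries)
grep -m1 '^platform' /proc/cpuinfo
# loading the module by hand should reproduce the same "No such device" failure on an LPAR
modprobe kvm_hv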
Updated by okurz over 4 years ago
- Related to action #32563: [functional][u] fix salt for power added
Updated by okurz over 4 years ago
- Status changed from New to Feedback
- Assignee set to okurz
Originally this was introduced with #32563. Trying to distinguish grenache-1, which IIUC is a PowerVM LPAR, from the bare-metal installations, e.g. with
okurz@openqa:~> sudo salt -l error -C 'G@roles:worker and G@osarch:ppc64le' grains.item os os_family osarch virtual kernel host cpu_flags cpu_model cpuarch
QA-Power8-4-kvm.qa.suse.de:
    ----------
    cpu_flags:
    cpu_model:
        Unknown
    cpuarch:
        ppc64le
    host:
        QA-Power8-4-kvm
    kernel:
        Linux
    os:
        SUSE
    os_family:
        Suse
    osarch:
        ppc64le
    virtual:
        physical
grenache-1.qa.suse.de:
    ----------
    cpu_flags:
    cpu_model:
        Unknown
    cpuarch:
        ppc64le
    host:
        grenache-1
    kernel:
        Linux
    os:
        SUSE
    os_family:
        Suse
    osarch:
        ppc64le
    virtual:
        physical
The standard grains are identical for both hosts (both report virtual: physical), so they cannot be used to tell the LPAR apart from the bare-metal workers. Found an approach now with a custom grain to check on:
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/322
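A minimal sketch of how such a custom grain would be picked up and used; "is_powervm_lpar" below is just a placeholder name, not necessarily the grain the merge request introduces:

# sync custom grain modules from the salt fileserver to all minions
sudo salt '*' saltutil.sync_grains
# afterwards the grain can be queried and used for targeting
sudo salt -C 'G@roles:worker and G@osarch:ppc64le' grains.item is_powervm_lpar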
Updated by okurz over 4 years ago
- Status changed from Feedback to Resolved
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/323 as a follow-up because the custom grains were not properly synced. I manually deleted the kvm module-load file as cleanup and rebooted grenache-1.qa once, as the system was not executing any jobs at the time; no services failed on startup.
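Roughly what that manual cleanup amounts to, as a sketch based on the output above:

# remove the module-load entry that is not needed on this host
rm /etc/modules-load.d/kvm.conf
# clear the failed unit state without waiting for the next boot (redundant once rebooted)
systemctl reset-failed systemd-modules-load.service
# verify nothing is left in failed state
systemctl --failed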