action #104016
closed
Broken VirtualBox kernel module on x86_64 OSD workers
Added by livdywan almost 3 years ago.
Updated almost 3 years ago.
Description
Observation
x86_64 OSD workers seem to use the VirtualBox kernel module, but it is not working:
sudo journalctl -fu vboxdrv.service
-- Logs begin at Tue 2021-12-07 17:52:31 CET. --
Dez 15 07:23:11 openqaworker13 systemd[1]: Starting VirtualBox Linux kernel module...
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30813]: vboxdrv.sh: Starting VirtualBox services.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30818]: Starting VirtualBox services.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30823]: Sources for building host modules are not present,
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30823]: Use 'sudo zypper install virtualbox-host-source kernel-devel kernel-default-devel' to install them. Quitting ..
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30813]: vboxdrv.sh: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30830]: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 07:23:11 openqaworker13 systemd[1]: vboxdrv.service: Control process exited, code=exited, status=1/FAILURE
Dez 15 07:23:11 openqaworker13 systemd[1]: vboxdrv.service: Failed with result 'exit-code'.
Dez 15 07:23:11 openqaworker13 systemd[1]: Failed to start VirtualBox Linux kernel module.
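Since the same unit ends up failing on several hosts, it can help to triage the journal output by message. A minimal sketch in POSIX sh, assuming the journal (or `dmesg`) text is piped in on stdin. The function name `vboxdrv_failure_cause` and its labels are made up for illustration, and it only recognizes the messages quoted in this ticket:

```shell
#!/bin/sh
# Hypothetical triage helper: reads vboxdrv.service journal or dmesg text on
# stdin and prints a short label for the failure messages seen in this ticket.
vboxdrv_failure_cause() {
    vbox_log=$(cat)
    if printf '%s\n' "$vbox_log" | grep -qF 'Sources for building host modules are not present'; then
        # vboxdrv.sh could not rebuild the module (build sources/headers missing)
        echo missing-sources
    elif printf '%s\n' "$vbox_log" | grep -qE 'Key was rejected by service|Loading of module with unavailable key is rejected'; then
        # the module was built, but the kernel refused to load it (signature key)
        echo key-rejected
    else
        echo unknown
    fi
}

# Usage on a worker:
#   sudo journalctl -u vboxdrv.service -b | vboxdrv_failure_cause
```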
Acceptance criteria
- AC1: The VirtualBox setup is working like before the upgrade to Leap 15.3
Suggestions
- Target version deleted (future)
Unsetting since I did not set the target version
- Subject changed from Broken VirtualBox kernel module on openqaworker13 to Broken VirtualBox kernel module on openqaworker10 and 13
- Description updated
- Target version set to Ready
- Subject changed from Broken VirtualBox kernel module on openqaworker10 and 13 to Broken VirtualBox kernel module on various OSD workers
- Description updated
- Assignee set to mkittler
The unit is now failing on other worker hosts as well. Somebody is doing something without telling us.
- Assignee deleted (mkittler)
- Target version deleted (Ready)
Maybe the activation of the vboxdrv.service unit on the OSD workers is a result of the Leap 15.3 upgrade? I'm only suspecting this because it doesn't look like any user has played around with it (judging by the login history). The failing unit has been activated as a dependency:
martchus@openqaworker6:~> systemctl list-dependencies --reverse vboxdrv
vboxdrv.service
● └─multi-user.target
●   └─graphical.target
The dots of the dependencies are actually green ones. So I'm wondering why the graphical target is active here. Note that VirtualBox was actually always installed as part of our Salt recipes (for x86_64) and now for some reason activating the graphical target also activated its systemd unit which is now failing.
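Which units pull in vboxdrv.service can also be read from its `WantedBy` property. A rough sketch, assuming `systemctl show --property=WantedBy` prints a single `WantedBy=` line with space-separated unit names; the helper `units_wanting` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turns "WantedBy=a.target b.target" (as printed by
# `systemctl show --property=WantedBy UNIT`) into one unit name per line.
units_wanting() {
    sed -n 's/^WantedBy=//p' | tr ' ' '\n' | sed '/^$/d'
}

# Usage on a worker:
#   systemctl is-enabled vboxdrv.service
#   systemctl show --property=WantedBy vboxdrv.service | units_wanting
```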
- Status changed from New to Feedback
- Assignee set to mkittler
mkittler wrote:
The dots of the dependencies are actually green ones. So I'm wondering why the graphical target is active here. Note that VirtualBox was actually always installed as part of our Salt recipes (for x86_64) and now for some reason activating the graphical target also activated its systemd unit which is now failing.
Indeed I found it in worker.sls - but then why don't we have a class for it?
- Related to action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
I suppose the graphical target isn't that relevant. VirtualBox has been installed intentionally via https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/038b7a7e. When looking at a worker with enough journal history I learned that vboxdrv.service actually worked before, so I'm trying to fix it. The Leap 15.3 update most likely caused this. Possibly the kernel module simply compiles again after rebooting into the new Leap 15.3 kernel. If the kernel sources are still missing, we might need to install the packages providing them explicitly.
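vboxdrv.sh needs the headers of the running kernel to rebuild the module. One quick check, assuming the usual layout where the kernel devel packages provide `/lib/modules/<release>/build` as a link into the kernel build tree, is to test for that directory. The helper name and its optional arguments are hypothetical and exist only to make the check testable:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the build directory for a kernel release
# exists. Defaults to the running kernel and /lib/modules; both can be
# overridden (e.g. to point at a scratch directory in tests).
kernel_build_dir_present() {
    release=${1:-$(uname -r)}
    modroot=${2:-/lib/modules}
    # -d follows symlinks, so a dangling "build" link counts as missing
    [ -d "$modroot/$release/build" ]
}

# Usage on a worker:
#   kernel_build_dir_present || echo "build headers for $(uname -r) are missing"
```

If the check fails, the packages named in the vboxdrv.sh message (virtualbox-host-source, kernel-devel, kernel-default-devel) are the obvious candidates to install.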
Indeed I found it in worker.sls - but then why don't we have a class for it?
Because it is simply installed on all x86_64 workers, so there's nothing special about it. Also note that I've updated the ticket title: it isn't a worker-specific issue anymore.
- Subject changed from Broken VirtualBox kernel module on various OSD workers to Broken VirtualBox kernel module on x86_64 OSD workers
- Description updated
- Status changed from Feedback to In Progress
It doesn't work on openqaworker9 and openqaworker8:
Dez 15 14:27:35 openqaworker9 vboxdrv.sh[20404]: Kernel modules built correctly. They will now be installed.
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: modprobe: ERROR: could not insert 'vboxnetflt': Key was rejected by service
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: modprobe: ERROR: could not insert 'vboxnetadp': Key was rejected by service
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: insmod /lib/modules/5.3.18-59.37-default/weak-updates/extra/vboxdrv.ko
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: insmod /lib/modules/5.3.18-59.37-default/weak-updates/extra/vboxdrv.ko
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[20404]: Kernel modules are installed and loaded.
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[20395]: vboxdrv.sh: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 14:27:53 openqaworker9 systemd[1]: vboxdrv.service: Control process exited, code=exited, status=1/FAILURE
Dez 15 14:27:53 openqaworker9 systemd[1]: vboxdrv.service: Failed with result 'exit-code'.
Dez 15 14:27:53 openqaworker9 systemd[1]: Failed to start VirtualBox Linux kernel module.
[ 648.407380] vboxdrv: Loading of module with unavailable key is rejected
[ 648.422683] vboxdrv: Loading of module with unavailable key is rejected
[ 648.442035] vboxdrv: Loading of module with unavailable key is rejected
openqaworker9 has the same kernel installed and running as e.g. openqaworker5, and also the same version of virtualbox-kmp-default.
It is likely because SecureBoot is enabled on these machines:
martchus@openqaworker9:~> sudo modprobe vboxdrv
modprobe: ERROR: could not insert 'vboxdrv': Key was rejected by service
martchus@openqaworker9:~> sudo mokutil --sb-state
SecureBoot enabled
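To check the SecureBoot state on a worker in a script, the `mokutil --sb-state` output can be tested directly. A small sketch, assuming mokutil is installed and prints `SecureBoot enabled` as shown above; `secureboot_enabled` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: reads `mokutil --sb-state` output on stdin and
# succeeds if it reports SecureBoot as enabled.
secureboot_enabled() {
    grep -q '^SecureBoot enabled'
}

# Usage on a worker:
#   if sudo mokutil --sb-state | secureboot_enabled; then
#       echo "SecureBoot on: locally built vbox modules are likely rejected"
#   fi
# Overview across workers (output is prefixed per minion, so not piped here):
#   sudo salt -C 'G@roles:worker' cmd.run 'mokutil --sb-state'
```

With SecureBoot enabled the kernel likely only accepts modules signed with an enrolled key, which would fit the `Key was rejected by service` errors for the locally built vbox modules.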
- Status changed from In Progress to Feedback
The SR has been merged. I removed virtualbox from all workers:
sudo salt -C 'G@roles:worker' cmd.run 'zypper --non-interactive rm -u virtualbox'
There were actually more failing systemd units. I disabled postgresql, which was failing on grenache-1. I'm not sure why we need a PostgreSQL database on that host; it is likely a leftover. I removed the mcelog package on openqaworker3 because the corresponding systemd unit failed. The package is not installed on any other worker and there are no mentions of it in our Salt repos, so I assume it is also just a leftover.
With that we're back at 0 failing systemd units so the alert should turn off soon.
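To confirm a host is really back at 0 failing units, the failed units can be counted. A sketch, assuming `systemctl list-units --state=failed --no-legend --plain` prints one line per failed unit; `count_failed_units` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: counts failed units in the output of
#   systemctl list-units --state=failed --no-legend --plain
# (one unit per line; blank lines are ignored).
count_failed_units() {
    # grep -c prints 0 but exits 1 on no match, so keep the status clean
    grep -c . || true
}

# Usage on a worker:
#   systemctl list-units --state=failed --no-legend --plain | count_failed_units
# Across all workers (raw list per minion):
#   sudo salt -C 'G@roles:worker' cmd.run 'systemctl list-units --state=failed --no-legend --plain'
```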
- Status changed from Feedback to Resolved
The alert turned off again, so I suppose everything is good now (except for openqaworker3, which still needs updating, but that's part of the update ticket).