action #104016

closed

Broken VirtualBox kernel module on x86_64 OSD workers

Added by livdywan over 2 years ago. Updated over 2 years ago.

Status:
Resolved
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Start date:
2021-12-15
Due date:
% Done:

0%

Estimated time:

Description

Observation

x86_64 OSD workers seem to use the VirtualBox kernel module but it's not working:

sudo journalctl -fu vboxdrv.service
-- Logs begin at Tue 2021-12-07 17:52:31 CET. --
Dez 15 07:23:11 openqaworker13 systemd[1]: Starting VirtualBox Linux kernel module...
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30813]: vboxdrv.sh: Starting VirtualBox services.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30818]: Starting VirtualBox services.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30823]: Sources for building host modules are not present,
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30823]: Use 'sudo zypper install virtualbox-host-source kernel-devel kernel-default-devel' to install them. Quitting ..
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30813]: vboxdrv.sh: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 07:23:11 openqaworker13 vboxdrv.sh[30830]: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 07:23:11 openqaworker13 systemd[1]: vboxdrv.service: Control process exited, code=exited, status=1/FAILURE
Dez 15 07:23:11 openqaworker13 systemd[1]: vboxdrv.service: Failed with result 'exit-code'.
Dez 15 07:23:11 openqaworker13 systemd[1]: Failed to start VirtualBox Linux kernel module.
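
The log above names three packages that vboxdrv.sh expects for building the host modules. A minimal sketch for checking which of them are absent on a worker could look like this. The helper names `pkg_installed` and `missing_packages` are hypothetical and not part of the ticket; the sketch assumes an RPM-based host where `rpm -q` is available.

```shell
#!/bin/sh
# Sketch only: report which packages from the vboxdrv.sh hint are not installed.
# pkg_installed and missing_packages are hypothetical helper names.

# Succeeds if the given package is installed (assumes an RPM-based system).
pkg_installed() {
    rpm -q "$1" >/dev/null 2>&1
}

# Prints every package from the argument list that is not installed, one per line.
missing_packages() {
    for pkg in "$@"; do
        pkg_installed "$pkg" || printf '%s\n' "$pkg"
    done
}

# Example call, left commented out so that sourcing this file has no side effects:
# missing_packages virtualbox-host-source kernel-devel kernel-default-devel
```

An empty result would mean that all three packages are present and the build failure has a different cause.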

Acceptance criteria

  • AC1: The VirtualBox setup is working like before the upgrade to Leap 15.3

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure - action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M (Resolved, livdywan)

Actions #1

Updated by livdywan over 2 years ago

  • Target version deleted (future)

Unsetting since I did not set the target version

Actions #2

Updated by mkittler over 2 years ago

  • Subject changed from Broken VirtualBox kernel module on openqaworker13 to Broken VirtualBox kernel module on openqaworker10 and 13
  • Description updated (diff)
Actions #3

Updated by okurz over 2 years ago

  • Target version set to Ready
Actions #4

Updated by mkittler over 2 years ago

  • Subject changed from Broken VirtualBox kernel module on openqaworker10 and 13 to Broken VirtualBox kernel module on various OSD workers
  • Description updated (diff)
  • Assignee set to mkittler

The unit is now failing on other worker hosts as well. Somebody is doing something without telling us.

Actions #5

Updated by mkittler over 2 years ago

  • Assignee deleted (mkittler)
  • Target version deleted (Ready)

Maybe the activation of the vboxdrv.service unit on the OSD workers is a result of the Leap 15.3 upgrade? I suspect so because, judging by the login history, it doesn't look like any user played around with it. The failing unit has been activated as a dependency:

martchus@openqaworker6:~> systemctl list-dependencies --reverse vboxdrv
vboxdrv.service
● └─multi-user.target
●   └─graphical.target

The dots of the dependencies are actually green ones. So I'm wondering why the graphical target is active here. Note that VirtualBox was actually always installed as part of our Salt recipes (for x86_64) and now for some reason activating the graphical target also activated its systemd unit which is now failing.

Actions #6

Updated by mkittler over 2 years ago

  • Status changed from New to Feedback
  • Assignee set to mkittler
Actions #7

Updated by livdywan over 2 years ago

mkittler wrote:

The dots of the dependencies are actually green ones. So I'm wondering why the graphical target is active here. Note that VirtualBox was actually always installed as part of our Salt recipes (for x86_64) and now for some reason activating the graphical target also activated its systemd unit which is now failing.

Indeed I found it in worker.sls - but then why don't we have a class for it?

Actions #8

Updated by livdywan over 2 years ago

  • Related to action #99192: Upgrade osd workers and openqa-monitor to openSUSE Leap 15.3 size:M added
Actions #9

Updated by mkittler over 2 years ago

I suppose the graphical target isn't that relevant. VirtualBox has been installed intentionally via https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/038b7a7e. Looking at a worker with enough journal history, I learned that vboxdrv.service actually worked before, so I'm trying to fix that. The Leap 15.3 update most likely caused this. Possibly the kernel module simply compiles again after rebooting into the new Leap 15.3 kernel. If the kernel sources are still missing, we might need to install the package for them explicitly.

Actions #10

Updated by mkittler over 2 years ago

Indeed I found it in worker.sls - but then why don't we have a class for it?

Because it is simply installed on all x86_64 workers. So there's nothing special. Also note that I've been updating the ticket title. It isn't a worker-specific issue anymore.

Actions #11

Updated by mkittler over 2 years ago

  • Subject changed from Broken VirtualBox kernel module on various OSD workers to Broken VirtualBox kernel module on x86_64 OSD workers
  • Description updated (diff)
Actions #12

Updated by mkittler over 2 years ago

  • Status changed from Feedback to In Progress

Doesn't work on openqaworker9 and 8:

Dez 15 14:27:35 openqaworker9 vboxdrv.sh[20404]: Kernel modules built correctly. They will now be installed.
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: modprobe: ERROR: could not insert 'vboxnetflt': Key was rejected by service
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: modprobe: ERROR: could not insert 'vboxnetadp': Key was rejected by service
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: insmod /lib/modules/5.3.18-59.37-default/weak-updates/extra/vboxdrv.ko
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[27672]: insmod /lib/modules/5.3.18-59.37-default/weak-updates/extra/vboxdrv.ko
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[20404]: Kernel modules are installed and loaded.
Dez 15 14:27:53 openqaworker9 vboxdrv.sh[20395]: vboxdrv.sh: failed: modprobe vboxdrv failed. Please use 'dmesg' to find out why.
Dez 15 14:27:53 openqaworker9 systemd[1]: vboxdrv.service: Control process exited, code=exited, status=1/FAILURE
Dez 15 14:27:53 openqaworker9 systemd[1]: vboxdrv.service: Failed with result 'exit-code'.
Dez 15 14:27:53 openqaworker9 systemd[1]: Failed to start VirtualBox Linux kernel module.
[  648.407380] vboxdrv: Loading of module with unavailable key is rejected
[  648.422683] vboxdrv: Loading of module with unavailable key is rejected
[  648.442035] vboxdrv: Loading of module with unavailable key is rejected

It has the same kernel running and installed as e.g. openqaworker5 and also the same version of virtualbox-kmp-default.

It is likely because SecureBoot is enabled on these machines:

martchus@openqaworker9:~> sudo modprobe vboxdrv
modprobe: ERROR: could not insert 'vboxdrv': Key was rejected by service
martchus@openqaworker9:~> sudo mokutil --sb-state
SecureBoot enabled
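
The reasoning above (a "Key was rejected by service" error combined with "SecureBoot enabled") can be sketched as a small classifier. This is only an illustration: `diagnose_modprobe` is a hypothetical helper name, and both inputs are passed in as strings so the logic can be tried without root or without touching the kernel.

```shell
#!/bin/sh
# Sketch only: classify a modprobe failure given the Secure Boot state.
# diagnose_modprobe is a hypothetical helper name, not an existing tool.
diagnose_modprobe() {
    sb_state=$1      # e.g. the output of: mokutil --sb-state
    modprobe_err=$2  # e.g. the output of: modprobe vboxdrv 2>&1
    case $modprobe_err in
        *"Key was rejected by service"*)
            # The kernel refused the module because of its signature.
            case $sb_state in
                *"SecureBoot enabled"*)
                    echo "signature rejected, Secure Boot enabled" ;;
                *)
                    echo "signature rejected, Secure Boot not reported as enabled" ;;
            esac ;;
        "")
            echo "no modprobe error" ;;
        *)
            echo "other modprobe error" ;;
    esac
}
```

On a worker it could be called as `diagnose_modprobe "$(sudo mokutil --sb-state)" "$(sudo modprobe vboxdrv 2>&1)"`, which for the openqaworker9 output shown above would report the first case.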
Actions #13

Updated by mkittler over 2 years ago

The VirtualBox setup isn't used anymore, so I've created an SR to remove the package: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/626

Actions #14

Updated by mkittler over 2 years ago

  • Status changed from In Progress to Feedback

The SR has been merged. I removed virtualbox from all workers:

sudo salt -C 'G@roles:worker' cmd.run 'zypper --non-interactive rm -u virtualbox'

There were actually more failing systemd units. I disabled postgresql which was failing on grenache-1. Not sure why we need a PostgreSQL database on that host. It is likely a leftover. I removed the mcelog package on openqaworker3 because the corresponding systemd unit failed. The package is not installed on any other worker and there are no mentions of it in our Salt repos so I assume it is also just a leftover.


With that we're back at 0 failing systemd units so the alert should turn off soon.
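
For reference, the "0 failing systemd units" state can be checked per host with a sketch like the following. It assumes the unit list comes from `systemctl list-units --state=failed --no-legend --plain`, which prints one line per failed unit; `count_failed_units` is a hypothetical helper name, and how the alert itself counts units is not covered by this ticket.

```shell
#!/bin/sh
# Sketch only: count failed units from text read on stdin, e.g. the output of
#   systemctl list-units --state=failed --no-legend --plain
# count_failed_units is a hypothetical helper name.
count_failed_units() {
    # Count non-empty lines; each one describes a failed unit.
    # grep exits non-zero when the count is 0, so the status is neutralised.
    grep -c . || true
}
```

A call such as `systemctl list-units --state=failed --no-legend --plain | count_failed_units` should print the number of failed units on that host.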

Actions #15

Updated by mkittler over 2 years ago

  • Status changed from Feedback to Resolved

The alert turned off again, so I suppose everything is good now. (Except for openqaworker3, which still needs updating, but that's part of the updating ticket.)
