tickets #62204
closed
new VMs are "physical" for salt
100%
Description
Some newly created VMs get recognized as physical instead of kvm by salt. Besides being funny (and causing a slightly different setup, see pillar/virtual/*), this results in an error when trying to run a highstate:
highstate
local:
Data failed to compile:
----------
Pillar failed to render with the following messages:
----------
Specified SLS 'virt_cluster.atreju.physical' in environment 'production' is not available on the salt master
Affected VMs are:
# salt \* grains.get virtual | grep -B1 physical # output slightly beautified
nue-ns1.infra.opensuse.org:
physical
nue-ns2.infra.opensuse.org:
physical
jekyll.infra.opensuse.org:
physical
provo-gate.infra.opensuse.org:
physical
while all other VMs show kvm for grains.get virtual.
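The grep above can also be done programmatically. This is a minimal sketch, assuming the plain-text layout shown in this ticket (a minion ID line ending in a colon, followed by the grain value on the next line). The function name physical_minions and the second sample host are made up for illustration and are not part of salt.

```python
from typing import List


def physical_minions(salt_output: str) -> List[str]:
    """Return minion IDs whose 'virtual' grain is reported as 'physical'.

    Assumes the layout of `salt \\* grains.get virtual` shown in the ticket:
    one line with the minion ID plus ':', then one line with the value.
    """
    minions = []
    current = None
    for raw in salt_output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A new minion ID; remember it until its value line arrives.
            current = line[:-1]
        elif current is not None:
            if line == "physical":
                minions.append(current)
            current = None
    return minions


# Sample input: the first host is from the ticket, the second is invented.
sample = """\
nue-ns1.infra.opensuse.org:
    physical
example-vm.infra.opensuse.org:
    kvm
"""
print(physical_minions(sample))  # ['nue-ns1.infra.opensuse.org']
```

Unlike grep -B1, this only reports a host when the value line is exactly physical, so a hostname containing that word would not cause a false match.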
Lars, IIRC you mentioned that you used a slightly different libvirt config when setting up these VMs. Could this be the reason why the new VMs look like physical machines to salt?
(We'll probably end up reporting this as a salt bug, but let's first collect some information so that the salt devs can understand and fix this ;-)
Updated by lrupp over 4 years ago
~> salt 'jekyll*' grains.get manufacturer
jekyll.infra.opensuse.org:
QEMU
~> systemd-detect-virt
kvm
???
Updated by cboltz over 4 years ago
It seems some VMs (nue-ns1 and jekyll) fixed themselves (no idea why) - but nue-ns2 still thinks it's made of iron, and forum.i.o.o also joined the iron club.
However, grains.get manufacturer results in QEMU and systemd-detect-virt says kvm for both nue-ns2 and forum.
provo-gate currently results in "Minion did not return. [Not connected]" which is something you might want to check and fix ;-)
Updated by cboltz over 4 years ago
The code for setting the virtual grain is in /usr/lib/python3.6/site-packages/salt/grains/core.py in the _virtual function.
The default is grains = {'virtual': 'physical'} which is set at the beginning of the function.
Another default is _cmds = ['systemd-detect-virt', 'virt-what', 'dmidecode'] - that's the list of check commands.
However, if virt-what exists (which is only true on a few machines, including those mentioned in this ticket), then _cmds gets set to only virt-what:
if not salt.utils.platform.is_windows() and osdata['kernel'] not in skip_cmds:
    if salt.utils.path.which('virt-what'):
        _cmds = ['virt-what']
The machines that show up as physical have one thing in common - if you manually call virt-what, you get
virt-what: virt-what-cpuid-helper program not found in $PATH
Therefore it's not too surprising that the default grains[virtual] = physical doesn't get changed.
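To make the failure mode concrete, here is a simplified model of the flow described above. It is a sketch, not salt's actual _virtual code: the function names, the injected which/run helpers and the reduction of output parsing to a plain check for "kvm" are all assumptions made for illustration. A failing virt-what is assumed to exit non-zero without printing anything to stdout.

```python
import shutil
import subprocess


def run_command(cmd):
    """Run a detection command and return (exit code, stripped stdout)."""
    proc = subprocess.run([cmd], capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def detect_virtual(which=shutil.which, run=run_command):
    """Simplified model of the grain detection, NOT the real salt code."""
    virtual = "physical"  # default, only replaced if a command reports kvm
    cmds = ["systemd-detect-virt", "virt-what", "dmidecode"]
    if which("virt-what"):
        # As in the ticket: once virt-what exists, nothing else is tried.
        cmds = ["virt-what"]
    for cmd in cmds:
        if not which(cmd):
            continue
        returncode, output = run(cmd)
        # Real salt parses each command's output differently; this sketch
        # only recognises "kvm".
        if returncode == 0 and "kvm" in output.lower():
            virtual = "kvm"
            break
    return virtual


# Fake environment mirroring the broken machines: virt-what is installed
# but fails because it cannot find its cpuid helper.
def fake_which(cmd):
    return "/fake/" + cmd if cmd in ("virt-what", "systemd-detect-virt") else None


def broken_run(cmd):
    return (1, "") if cmd == "virt-what" else (0, "kvm")


print(detect_virtual(fake_which, broken_run))  # physical
```

Because the list of commands is narrowed to virt-what before anything is run, a working systemd-detect-virt never gets the chance to correct the default.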
(As a funny sidenote - after reading the source, I noticed that /v/l/salt/minion shows an error message about this ;-)
For completeness: Some other machines (daffy1, daffy2, matomo, jekyll, baloo, svn, os-rt, riesling3, mybackup-opensuse) have a working virt-what that returns kvm as expected.
Everything above means that the problem is in virt-what, not salt itself.
virt-what is a script which changes PATH to include /usr/lib/, so in theory, it should find /usr/lib/virt-what-cpuid-helper - but in practice which virt-what-cpuid-helper fails :-(
The reason for that is even more entertaining - I added a plain which virt-what-cpuid-helper to the script (without 2>/dev/null as the script originally does when trying to write the path into a variable), and the result is (line 90 is the which call I added):
/usr/sbin/virt-what: line 90: which: command not found
Unsurprisingly, the machines listed in this ticket also don't have which installed. There are some more machines without which (especially caasp*) - but those also don't have virt-what installed.
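The two lookups involved can be checked separately, which tells a missing which apart from a missing helper. The following is a rough sketch only: the function name diagnose and its messages are invented here, and appending /usr/lib to PATH is an assumption about the script's behaviour (the ticket only says that PATH is changed to include it). It uses Python's own shutil.which, so it does not depend on the external which command.

```python
import os
import shutil


def diagnose(lookup=shutil.which, path=None):
    """Return a list of problems that would break virt-what's helper lookup."""
    if path is None:
        path = os.environ.get("PATH", "")
    # Assumption: the search path is the normal PATH plus /usr/lib.
    extended = os.pathsep.join(p for p in (path, "/usr/lib") if p)
    problems = []
    if lookup("which", path=extended) is None:
        problems.append("'which' is not installed, so the lookup fails silently")
    if lookup("virt-what-cpuid-helper", path=extended) is None:
        problems.append("virt-what-cpuid-helper is not in the extended PATH")
    return problems


# Fake lookup mirroring the affected machines: the helper exists,
# but the 'which' command itself does not.
def fake_lookup(cmd, path=None):
    if cmd == "virt-what-cpuid-helper":
        return "/usr/lib/virt-what-cpuid-helper"
    return None


for problem in diagnose(fake_lookup, path="/usr/sbin:/usr/bin"):
    print(problem)
```

On a machine matching this ticket, only the first problem would be reported, pointing at the missing package rather than at virt-what's helper.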
After so much debugging, having some good news would be good, right?
One zypper in which (on forum.i.o.o) later, virt-what reports kvm, and the virtual grain now also says kvm :-) (salt-minion might need a restart (or time) to update the grains)
Oh, and to make things even more funny - the RPM changelog of which says it was split off util-linux in January 2013, but OTOH virt-what still requires util-linux. I'm quite surprised that nobody noticed this before...
I reported this as https://bugzilla.opensuse.org/show_bug.cgi?id=1161850 and submitted MR 320 on gitlab to get the package installed on all 15.x VMs.
Updated by lrupp over 4 years ago
- Status changed from New to In Progress
- Assignee changed from lrupp to cboltz
- % Done changed from 0 to 100
Updated by cboltz over 4 years ago
- Status changed from In Progress to Closed
The workaround is merged in salt and deployed to the affected machines (except provo-gate, which currently isn't reachable by the salt-master).
Therefore I'd say we can close this ticket as fixed, even if the "fix" is a workaround ;-)