action #94994
closedTracebacks in the journal for salt-minion size:M
0%
Description
The salt-minion on machines like backup.qa.suse.de contains quite some tracebacks - even though it doesn't seem to crash.
Jul 01 14:33:34 backup-vm systemd[1]: Started The Salt Minion.
Jul 01 14:33:36 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7fa1e9a4b598>, error:
Jul 01 14:33:36 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:36 backup-vm salt-minion[2390]: ret = funcs[key](**kwargs)
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 01 14:33:36 backup-vm salt-minion[2390]: internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 01 14:33:36 backup-vm salt-minion[2390]: addrs = __utils__["network.ip_addrs"](
Jul 01 14:33:36 backup-vm salt-minion[2390]: NameError: name '__utils__' is not defined
Jul 01 14:33:36 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7fa1fae79730>, error:
Jul 01 14:33:36 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:36 backup-vm salt-minion[2390]: ret = funcs[key](**kwargs)
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 01 14:33:36 backup-vm salt-minion[2390]: **kwargs).stdout
Jul 01 14:33:36 backup-vm salt-minion[2390]: File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 01 14:33:36 backup-vm salt-minion[2390]: with Popen(*popenargs, **kwargs) as process:
Jul 01 14:33:36 backup-vm salt-minion[2390]: TypeError: __init__() missing 1 required positional argument: 'args'
Jul 01 14:33:41 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7fa1e9a4b840>, error:
Jul 01 14:33:41 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:41 backup-vm salt-minion[2390]: ret = funcs[key](**kwargs)
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 01 14:33:41 backup-vm salt-minion[2390]: internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 01 14:33:41 backup-vm salt-minion[2390]: addrs = __utils__["network.ip_addrs"](
Jul 01 14:33:41 backup-vm salt-minion[2390]: NameError: name '__utils__' is not defined
Jul 01 14:33:41 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7fa1fae79730>, error:
Jul 01 14:33:41 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:41 backup-vm salt-minion[2390]: ret = funcs[key](**kwargs)
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 01 14:33:41 backup-vm salt-minion[2390]: **kwargs).stdout
Jul 01 14:33:41 backup-vm salt-minion[2390]: File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 01 14:33:41 backup-vm salt-minion[2390]: with Popen(*popenargs, **kwargs) as process:
Jul 01 14:33:41 backup-vm salt-minion[2390]: TypeError: __init__() missing 1 required positional argument: 'args'
Acceptance Criteria¶
- AC1: No critical errors or exceptions in salt minion
Acceptance tests¶
- AT1-1: Run
sudo salt \* cmd.run 'grep CRITICAL /var/log/salt/minion'
on osd
Suggestions¶
- Investigate the cause of these errors
- Try to fix the problems in the files from https://gitlab.suse.de/openqa/salt-states-openqa/-/tree/master/_grains
Updated by dheidler about 3 years ago
- Related to action #94555: backup.qa.suse.de needs to be upgraded to Leap 15.2 (size:S) added
Updated by okurz about 3 years ago
- Priority changed from Normal to High
- Target version set to Ready
Updated by okurz about 3 years ago
- Subject changed from Trackebacks in the journal for salt-minion to Tracebacks in the journal for salt-minion
Updated by livdywan about 3 years ago
- Description updated (diff)
- Status changed from New to Workable
Updated by okurz about 3 years ago
- Description updated (diff)
- Target version deleted (
Ready)
Updated by livdywan about 3 years ago
- Subject changed from Tracebacks in the journal for salt-minion to Tracebacks in the journal for salt-minion size:M
Updated by mkittler about 3 years ago
There are two different kinds of tracebacks and they're occurring on all hosts I've checked, including openqa
(web UI host) and grenache-1
and powerqaworker-qam-1
(PowerPC workers) and the monitoring and backup hosts.
Jul 07 13:59:11 openqa salt-minion[1332]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7f4f15382f28>, error:
Jul 07 13:59:11 openqa salt-minion[1332]: Traceback (most recent call last):
Jul 07 13:59:11 openqa salt-minion[1332]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 07 13:59:11 openqa salt-minion[1332]: ret = funcs[key](**kwargs)
Jul 07 13:59:11 openqa salt-minion[1332]: File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 07 13:59:11 openqa salt-minion[1332]: internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 07 13:59:11 openqa salt-minion[1332]: File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 07 13:59:11 openqa salt-minion[1332]: addrs = __utils__["network.ip_addrs"](
Jul 07 13:59:11 openqa salt-minion[1332]: NameError: name '__utils__' is not defined
Jul 07 13:59:11 openqa salt-minion[1332]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7f4f291e0730>, error:
Jul 07 13:59:11 openqa salt-minion[1332]: Traceback (most recent call last):
Jul 07 13:59:11 openqa salt-minion[1332]: File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 07 13:59:11 openqa salt-minion[1332]: ret = funcs[key](**kwargs)
Jul 07 13:59:11 openqa salt-minion[1332]: File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 07 13:59:11 openqa salt-minion[1332]: **kwargs).stdout
Jul 07 13:59:11 openqa salt-minion[1332]: File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 07 13:59:11 openqa salt-minion[1332]: with Popen(*popenargs, **kwargs) as process:
Jul 07 13:59:11 openqa salt-minion[1332]: TypeError: __init__() missing 1 required positional argument: 'args'
Updated by mkittler about 3 years ago
SR to fix the first backtrace: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/521 - MERGED, now only the 2nd backtrace is left within the log
SR to fix the second backtrace: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/522 - MERGED, salt -l debug … grains.items
looks good but the backtrace is still shown
Updated by mkittler about 3 years ago
- Status changed from Workable to In Progress
The 2nd backtrace is still shown when running e.g. salt -l debug backup.qa.suse.de saltutil.sync_grains
(or … state.apply
) but not when running salt -l debug backup.qa.suse.de grains.items
. grain.items
even shows the value of ppc_powervm
correctly¹.
¹ True
for VMs and False
otherwise; the grain is actually not PowerPC-specific apart from the fact that it is only used within a PowerPC-related condition.
Updated by mkittler about 3 years ago
Ah, one has to call salt-call --local saltutil.sync_grains
; otherwise … grain.items
won't reevaluate the grains and also won't show the error. SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/523
Updated by openqa_review about 3 years ago
- Due date set to 2021-07-22
Setting due date based on mean cycle time of SUSE QE Tools
Updated by mkittler about 3 years ago
Looks like it still doesn't work. My test with salt-call --local
on openqaworker12
was apparently not good enough - although I even added an extra item to the grain in incremented its value on every try to see whether the grain is actually re-evaluated.
Any idea how to test this like in production? I can of course call salt -l debug backup.qa.suse.de saltutil.sync_grains
on OSD and then I see the backtrace being logged on backup.qa.suse.de
. I'm just wondering what file on OSD I needed to change manually to test different versions of the Grain.
By the way, when searching about this error I found https://bugs.python.org/issue16520 but I'm not sure whether it is related.
Updated by okurz about 3 years ago
mkittler wrote:
Looks like it still doesn't work
How does it not work?
Any idea how to test this like in production? I can of course call
salt -l debug backup.qa.suse.de saltutil.sync_grains
on OSD and then I see the backtrace being logged onbackup.qa.suse.de
. I'm just wondering what file on OSD I needed to change manually to test different versions of the Grain.
AFAIU the files on osd in /srv/salt are synced to a cache directory on each minion. With your above call you should be able to sync grains. If the error is only triggered by doing something more than just syncing you can also combine calls, e.g. salt -l debug backup.qa.suse.de saltutil.sync_grains,state.apply
Updated by mkittler about 3 years ago
Looks like it still doesn't work
One still gets the backtrace.
AFAIU the files on osd in /srv/salt are synced to a cache directory on each minion.
That seems to be correct. When changing the grain within that directory manually it gets synced via sync_grains
. So this way I can test it.
Btw, my theory why the error occurs is that importing check_output
from the subprocess
module directly into the grain's own namespace/module/package (however Python calls this) causes the grain loader to think check_output
is also a grain function and invokes it (with inappropriate/missing arguments). But if that's really the only problem my fix should work and the backtrace disappear.
The weird thing is that when modifying the grain, e.g. to include an additional field, this is shown perfectly fine (as well as the actual field we're interested in; I only introduce a new field to test whether changes are visible at all). However the critical error remains:
[CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function
…
So it looks like it works besides this critical error. Maybe the new/updated grain is evaluated only in addition to the previous one?
Updated by mkittler about 3 years ago
Maybe the new/updated grain is evaluated only in addition to the previous one?
Looks like that's really it. If I disable the grain cache on backup.qa.suse.de
(within /etc/salt/minion
) and restart salt-minion
the error is no longer logged.
Updated by mkittler about 3 years ago
- Status changed from In Progress to Feedback
Apparently just restarting salt-minion
also works. Not sure whether it is a good idea to restart salt-minion via salt itself so I'd just wait until all machines are rebooted anyways (or the cache is somehow invalidated).
Updated by okurz about 3 years ago
Come on, don't be scared! :) I just did ssh osd "sudo salt \* cmd.run 'systemctl restart salt-minion'"
and it was turning out fine. I think salt is coping with the restart by triggering a command and then polling later for the result or waiting for the salt minion to talk back, regardless if it's a new process by then.
According to ssh osd "sudo salt \* cmd.run 'grep \"2021-07-09.*CRITICAL\" /var/log/salt/minion'"
we are good.