Project

General

Profile

action #94994

Tracebacks in the journal for salt-minion size:M

Added by dheidler 3 months ago. Updated 3 months ago.

Status:
Resolved
Priority:
High
Assignee:
Target version:
Start date:
2021-07-01
Due date:
2021-07-22
% Done:

0%

Estimated time:

Description

The salt-minion on machines like backup.qa.suse.de contains quite some tracebacks - even though it doesn't seem to crash.

Jul 01 14:33:34 backup-vm systemd[1]: Started The Salt Minion.
Jul 01 14:33:36 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7fa1e9a4b598>, error:
Jul 01 14:33:36 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:36 backup-vm salt-minion[2390]:     ret = funcs[key](**kwargs)
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 01 14:33:36 backup-vm salt-minion[2390]:     internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 01 14:33:36 backup-vm salt-minion[2390]:     addrs = __utils__["network.ip_addrs"](
Jul 01 14:33:36 backup-vm salt-minion[2390]: NameError: name '__utils__' is not defined
Jul 01 14:33:36 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7fa1fae79730>, error:
Jul 01 14:33:36 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:36 backup-vm salt-minion[2390]:     ret = funcs[key](**kwargs)
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 01 14:33:36 backup-vm salt-minion[2390]:     **kwargs).stdout
Jul 01 14:33:36 backup-vm salt-minion[2390]:   File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 01 14:33:36 backup-vm salt-minion[2390]:     with Popen(*popenargs, **kwargs) as process:
Jul 01 14:33:36 backup-vm salt-minion[2390]: TypeError: __init__() missing 1 required positional argument: 'args'
Jul 01 14:33:41 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7fa1e9a4b840>, error:
Jul 01 14:33:41 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:41 backup-vm salt-minion[2390]:     ret = funcs[key](**kwargs)
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 01 14:33:41 backup-vm salt-minion[2390]:     internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 01 14:33:41 backup-vm salt-minion[2390]:     addrs = __utils__["network.ip_addrs"](
Jul 01 14:33:41 backup-vm salt-minion[2390]: NameError: name '__utils__' is not defined
Jul 01 14:33:41 backup-vm salt-minion[2390]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7fa1fae79730>, error:
Jul 01 14:33:41 backup-vm salt-minion[2390]: Traceback (most recent call last):
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 01 14:33:41 backup-vm salt-minion[2390]:     ret = funcs[key](**kwargs)
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 01 14:33:41 backup-vm salt-minion[2390]:     **kwargs).stdout
Jul 01 14:33:41 backup-vm salt-minion[2390]:   File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 01 14:33:41 backup-vm salt-minion[2390]:     with Popen(*popenargs, **kwargs) as process:
Jul 01 14:33:41 backup-vm salt-minion[2390]: TypeError: __init__() missing 1 required positional argument: 'args'

Acceptance Criteria

  • AC1: No critical errors or exceptions in salt minion

Acceptance tests

  • AT1-1: Run sudo salt \* cmd.run 'grep CRITICAL /var/log/salt/minion' on osd

Suggestions


Related issues

Related to openQA Infrastructure - action #94555: backup.qa.suse.de needs to be upgraded to Leap 15.2 (size:S)Resolved2021-06-23

History

#1 Updated by dheidler 3 months ago

  • Related to action #94555: backup.qa.suse.de needs to be upgraded to Leap 15.2 (size:S) added

#2 Updated by okurz 3 months ago

  • Priority changed from Normal to High
  • Target version set to Ready

#3 Updated by okurz 3 months ago

  • Subject changed from Trackebacks in the journal for salt-minion to Tracebacks in the journal for salt-minion

#4 Updated by cdywan 3 months ago

  • Description updated (diff)
  • Status changed from New to Workable

#5 Updated by okurz 3 months ago

  • Description updated (diff)
  • Target version deleted (Ready)

#6 Updated by okurz 3 months ago

  • Target version set to Ready

#7 Updated by cdywan 3 months ago

  • Subject changed from Tracebacks in the journal for salt-minion to Tracebacks in the journal for salt-minion size:M

#8 Updated by mkittler 3 months ago

  • Assignee set to mkittler

#9 Updated by mkittler 3 months ago

There are two different kinds of tracebacks and they're occurring on all hosts I've checked, including openqa (web UI host) and grenache-1 and powerqaworker-qam-1 (PowerPC workers) and the monitoring and backup hosts.

Jul 07 13:59:11 openqa salt-minion[1332]: [CRITICAL] Failed to load grains defined in grain file default_interface.default_interface in function <function default_interface at 0x7f4f15382f28>, error:
Jul 07 13:59:11 openqa salt-minion[1332]: Traceback (most recent call last):
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 07 13:59:11 openqa salt-minion[1332]:     ret = funcs[key](**kwargs)
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/var/cache/salt/minion/extmods/grains/default_interface.py", line 10, in default_interface
Jul 07 13:59:11 openqa salt-minion[1332]:     internalip = salt.modules.network.ip_addrs(cidr='10.160.0.0/13')[0]
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/usr/lib/python3.6/site-packages/salt/modules/network.py", line 1278, in ip_addrs
Jul 07 13:59:11 openqa salt-minion[1332]:     addrs = __utils__["network.ip_addrs"](
Jul 07 13:59:11 openqa salt-minion[1332]: NameError: name '__utils__' is not defined
Jul 07 13:59:11 openqa salt-minion[1332]: [CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function <function check_output at 0x7f4f291e0730>, error:
Jul 07 13:59:11 openqa salt-minion[1332]: Traceback (most recent call last):
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/usr/lib/python3.6/site-packages/salt/loader.py", line 853, in grains
Jul 07 13:59:11 openqa salt-minion[1332]:     ret = funcs[key](**kwargs)
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
Jul 07 13:59:11 openqa salt-minion[1332]:     **kwargs).stdout
Jul 07 13:59:11 openqa salt-minion[1332]:   File "/usr/lib64/python3.6/subprocess.py", line 423, in run
Jul 07 13:59:11 openqa salt-minion[1332]:     with Popen(*popenargs, **kwargs) as process:
Jul 07 13:59:11 openqa salt-minion[1332]: TypeError: __init__() missing 1 required positional argument: 'args'

#10 Updated by mkittler 3 months ago

SR to fix the first backtrace: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/521 - MERGED, now only the 2nd backtrace is left within the log
SR to fix the second backtrace: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/522 - MERGED, salt -l debug … grains.items looks good but the backtrace is still shown

#11 Updated by mkittler 3 months ago

  • Status changed from Workable to In Progress

The 2nd backtrace is still shown when running e.g. salt -l debug backup.qa.suse.de saltutil.sync_grains (or … state.apply) but not when running salt -l debug backup.qa.suse.de grains.items. grain.items even shows the value of ppc_powervm correctly¹.


¹ True for VMs and False otherwise; the grain is actually not PowerPC-specific apart from the fact that it is only used within a PowerPC-related condition.

#12 Updated by mkittler 3 months ago

Ah, one has to call salt-call --local saltutil.sync_grains; otherwise … grain.items won't reevaluate the grains and also won't show the error. SR: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/523

#13 Updated by openqa_review 3 months ago

  • Due date set to 2021-07-22

Setting due date based on mean cycle time of SUSE QE Tools

#14 Updated by mkittler 3 months ago

Looks like it still doesn't work. My test with salt-call --local on openqaworker12 was apparently not good enough - although I even added an extra item to the grain in incremented its value on every try to see whether the grain is actually re-evaluated.

Any idea how to test this like in production? I can of course call salt -l debug backup.qa.suse.de saltutil.sync_grains on OSD and then I see the backtrace being logged on backup.qa.suse.de. I'm just wondering what file on OSD I needed to change manually to test different versions of the Grain.

By the way, when searching about this error I found https://bugs.python.org/issue16520 but I'm not sure whether it is related.

#15 Updated by okurz 3 months ago

mkittler wrote:

Looks like it still doesn't work

How does it not work?

Any idea how to test this like in production? I can of course call salt -l debug backup.qa.suse.de saltutil.sync_grains on OSD and then I see the backtrace being logged on backup.qa.suse.de. I'm just wondering what file on OSD I needed to change manually to test different versions of the Grain.

AFAIU the files on osd in /srv/salt are synced to a cache directory on each minion. With your above call you should be able to sync grains. If the error is only triggered by doing something more than just syncing you can also combine calls, e.g. salt -l debug backup.qa.suse.de saltutil.sync_grains,state.apply

#16 Updated by mkittler 3 months ago

Looks like it still doesn't work

One still gets the backtrace.

AFAIU the files on osd in /srv/salt are synced to a cache directory on each minion.

That seems to be correct. When changing the grain within that directory manually it gets synced via sync_grains. So this way I can test it.

Btw, my theory why the error occurs is that importing check_output from the subprocess module directly into the grain's own namespace/module/package (however Python calls this) causes the grain loader to think check_output is also a grain function and invokes it (with inappropriate/missing arguments). But if that's really the only problem my fix should work and the backtrace disappear.

The weird thing is that when modifying the grain, e.g. to include an additional field, this is shown perfectly fine (as well as the actual field we're interested in; I only introduce a new field to test whether changes are visible at all). However the critical error remains:

[CRITICAL] Failed to load grains defined in grain file ppc_powervm.check_output in function
…

So it looks like it works besides this critical error. Maybe the new/updated grain is evaluated only in addition to the previous one?

#17 Updated by mkittler 3 months ago

Maybe the new/updated grain is evaluated only in addition to the previous one?

Looks like that's really it. If I disable the grain cache on backup.qa.suse.de (within /etc/salt/minion) and restart salt-minion the error is no longer logged.

#18 Updated by mkittler 3 months ago

  • Status changed from In Progress to Feedback

Apparently just restarting salt-minion also works. Not sure whether it is a good idea to restart salt-minion via salt itself so I'd just wait until all machines are rebooted anyways (or the cache is somehow invalidated).

#19 Updated by okurz 3 months ago

Come on, don't be scared! :) I just did ssh osd "sudo salt \* cmd.run 'systemctl restart salt-minion'" and it was turning out fine. I think salt is coping with the restart by triggering a command and then polling later for the result or waiting for the salt minion to talk back, regardless if it's a new process by then.

According to ssh osd "sudo salt \* cmd.run 'grep \"2021-07-09.*CRITICAL\" /var/log/salt/minion'" we are good.

#20 Updated by mkittler 3 months ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF