action #130835
closed
salt high state fails after recent merge requests in salt pillars size:M
0%
Description
Observation
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1629763 fails. From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1629763/raw:
.worker3.oqa.suse.de:
Data failed to compile:
----------
Rendering SLS 'base:openqa.openvswitch' failed: Jinja error: argument of type 'StrictUndefined' is not iterable
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/utils/templates.py", line 502, in render_jinja_tmpl
output = template.render(**decoded_context)
File "/usr/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
return original_render(self, *args, **kwargs)
File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
return self.environment.handle_exception(exc_info, True)
File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
reraise(exc_type, exc_value, tb)
File "/usr/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
raise value.with_traceback(tb)
File "<template>", line 119, in top-level template code
TypeError: argument of type 'StrictUndefined' is not iterable
; line 119
---
[...]
{% if 'bridge_ip' in remote_conf %}
{% set remote_ip=remote_conf['bridge_ip'] %}
{% elif 'bridge_iface' in remote_conf %}
{% set remote_interfaces = salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values()|list %}
{% set remote_bridge_interface = remote_conf['bridge_iface'] %}
{% if remote_bridge_interface in remote_interfaces[0] %} <======================
{% set remote_ip = remote_interfaces[0][remote_bridge_interface][0] %}
{% endif %}
{% endif %}
- ovs-vsctl --may-exist add-port $bridge gre{{- loop.index }} -- set interface gre{{- loop.index }} type=gre options:remote_ip={{ remote_ip }}
{% endfor %}
[...]
---
With ssh osd 'sudo salt --no-color --state-output=changes \* state.apply' we can reproduce that.
Suggestions
- Check on one worker and fix the error
- Consider improving CI checks e.g. using https://salt-lint.readthedocs.io/en/latest/
Updated by okurz over 1 year ago
I just installed salt-lint locally in a python venv and found only one minor issue within salt-pillars which is fixed by https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/543
sudo salt -l error --no-color --state-output=changes 'worker2*' state.apply
showed no failures.
worker3 does, related to openvswitch.
salt-call --local -l error --no-color --state-output=changes slsutil.renderer /srv/pillar/openqa/workerconf.sls 'jinja'
shows what looks like a sane document. I compared the documents before and after the two recent merge requests and found only correct changes in public cloud credentials, not related to openvswitch, so I assume something else changed that is now causing problems. I assume some host is now not providing the necessary interfaces to construct all GRE tunnels.
I think the relevant code is:
{% set remote_interfaces = salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values()|list %}
{% set remote_bridge_interface = remote_conf['bridge_iface'] %}
{% if remote_bridge_interface in remote_interfaces[0] %}
I assume remote_interfaces[0] is invalid because remote_interfaces is empty, which means that salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values() is empty.
salt -l error --no-color --state-output=changes 'worker3*' slsutil.renderer /srv/salt/openqa/openvswitch.sls 'jinja'
fails with jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'bridge_ip'. I don't know how to handle that.
salt -l error --no-color --state-output=changes 'worker3*' state.sls openqa.openvswitch
can reproduce the problem quicker.
I tried to change the jinja code to handle a missing ip4_interface gracefully but then it hit me: a machine is missing on purpose, worker6, see #129484. So likely our salt code struggles whenever a worker configured for multi-machine is removed from the salt keys but is still present in the salt pillars. So I commented out the entry in /srv/pillar/openqa/workerconf.sls on osd, and then the above command renders the sls file nicely and proceeds with execution.
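For illustration, a guard of the kind mentioned above could look roughly like the following. This is only a sketch, not the change that was applied on osd. It assumes it sits inside the existing loop of openvswitch.sls where remote, remote_conf and loop are already defined, and it only adds an emptiness check plus a final check on remote_ip to the excerpt quoted in the description.

```jinja
{# Sketch only: skip remotes without mine data instead of failing to render #}
{% set remote_ip = none %}
{% if 'bridge_ip' in remote_conf %}
{% set remote_ip = remote_conf['bridge_ip'] %}
{% elif 'bridge_iface' in remote_conf %}
{% set remote_interfaces = salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values()|list %}
{% set remote_bridge_interface = remote_conf['bridge_iface'] %}
{# remote_interfaces is an empty list when the remote minion has no mine data, e.g. because its key was removed #}
{% if remote_interfaces and remote_bridge_interface in remote_interfaces[0] %}
{% set remote_ip = remote_interfaces[0][remote_bridge_interface][0] %}
{% endif %}
{% endif %}
{# Only emit the GRE port command when an IP could be determined #}
{% if remote_ip %}
  - ovs-vsctl --may-exist add-port $bridge gre{{- loop.index }} -- set interface gre{{- loop.index }} type=gre options:remote_ip={{ remote_ip }}
{% endif %}
```

With such a guard a worker that is still listed in the pillars but has no mine data would be skipped silently, so the gre port numbering derived from loop.index would have gaps. Whether silently skipping is preferable to failing loudly is a design choice; removing or commenting out the stale pillar entry, as done here, keeps the mismatch visible.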
Updated by okurz over 1 year ago
- Subject changed from salt high state fails after recent merge requests in salt pillars to salt high state fails after recent merge requests in salt pillars size:M
Updated by okurz over 1 year ago
Yet another improvement is https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/545 to use consistent jinja commenting.
Regarding salt-lint there is https://software.opensuse.org/package/salt-lint?search_term=salt-lint in a 1.5y old version 0.8.0, upstream is 0.9.2. dheidler will create a SR for the new version to https://build.opensuse.org/package/show/systemsmanagement:saltstack/salt-lint and then to oS:Fctry and then maybe Leap.
Still monitoring the deploy jobs that are still running.
EDIT: They seem to have succeeded in the end.
Updated by openqa_review over 1 year ago
- Due date set to 2023-06-29
Setting due date based on mean cycle time of SUSE QE Tools
Updated by okurz over 1 year ago
- Status changed from In Progress to Blocked
waiting for https://build.opensuse.org/request/show/1093128 first
Updated by okurz over 1 year ago
- Due date changed from 2023-06-29 to 2023-07-07
- Status changed from Blocked to Feedback
SR accepted, now in Factory, since yesterday. Trying to build a container already that has salt-lint https://build.opensuse.org/package/show/home:okurz:container/salt-lint
Updated by okurz over 1 year ago
Still "unresolvable". I don't know if I could retrigger, so I am waiting another couple of days until a new Tumbleweed snapshot should have salt-lint installable.
Updated by okurz over 1 year ago
- Status changed from Feedback to Workable
https://build.opensuse.org/package/show/home:okurz:container/salt-lint is built now. podman run --rm -it registry.opensuse.org/home/okurz/container/containers/tumbleweed:salt-lint salt-lint --help works fine now, so we can integrate that into the CI pipeline. I added an entry point in the Dockerfile so we can just call podman run --rm -it registry.opensuse.org/home/okurz/container/containers/tumbleweed:salt-lint to call salt-lint.
Updated by okurz over 1 year ago
- Due date deleted (2023-07-07)
- Status changed from Workable to Feedback
Updated by okurz over 1 year ago
- Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Updated by okurz over 1 year ago
- Status changed from Feedback to Resolved
Both MRs are merged, so now we have static YAML and salt syntax and lint checks for every merge request, even before merge, for both salt states and salt pillars.