action #130835

closed

salt high state fails after recent merge requests in salt pillars size:M

Added by okurz over 1 year ago. Updated over 1 year ago.

Status: Resolved
Priority: High
Assignee:
Category: -
Start date: 2023-06-14
Due date:
% Done: 0%
Estimated time:
Tags:

Description

Observation

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1629763 fails. From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1629763/raw

.worker3.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja error: argument of type 'StrictUndefined' is not iterable
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/salt/utils/templates.py", line 502, in render_jinja_tmpl
    output = template.render(**decoded_context)
  File "/usr/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
    return original_render(self, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
    return self.environment.handle_exception(exc_info, True)
  File "/usr/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
    raise value.with_traceback(tb)
  File "<template>", line 119, in top-level template code
TypeError: argument of type 'StrictUndefined' is not iterable

; line 119

---
[...]
     {%     if 'bridge_ip' in remote_conf %}
     {%         set remote_ip=remote_conf['bridge_ip'] %}
     {%     elif 'bridge_iface' in remote_conf %}
     {%         set remote_interfaces = salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values()|list %}
     {%         set remote_bridge_interface = remote_conf['bridge_iface'] %}
     {%         if remote_bridge_interface in remote_interfaces[0] %}    <======================
     {%             set remote_ip = remote_interfaces[0][remote_bridge_interface][0] %}
     {%         endif %}
     {%     endif %}
      - ovs-vsctl --may-exist add-port $bridge gre{{- loop.index }} -- set interface gre{{- loop.index }} type=gre options:remote_ip={{ remote_ip }}
     {% endfor %}
[...]
---

The failure can be reproduced with ssh osd 'sudo salt --no-color --state-output=changes \* state.apply'.

Suggestions


Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M | Resolved | okurz | 2023-06-22

Actions #1

Updated by okurz over 1 year ago

I just installed salt-lint locally in a python venv and found only one minor issue within salt-pillars which is fixed by https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/543

sudo salt -l error --no-color --state-output=changes 'worker2*' state.apply showed no failures.

worker3 does, related to openvswitch.

salt-call --local -l error --no-color --state-output=changes slsutil.renderer /srv/pillar/openqa/workerconf.sls 'jinja' shows what looks like a sane document. I compared the rendered documents before and after the two recent merge requests and found only correct changes to public cloud credentials, unrelated to openvswitch. So I assume something else changed that is now causing problems, most likely a host that no longer provides the interfaces needed to construct all GRE tunnels.

I think the relevant code is:

     {%         set remote_interfaces = salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values()|list %}
     {%         set remote_bridge_interface = remote_conf['bridge_iface'] %}
     {%         if remote_bridge_interface in remote_interfaces[0] %}

I assume remote_interfaces[0] is invalid because remote_interfaces is empty, which would mean that salt['mine.get']("nodename:" + remote, 'ip4_interfaces', tgt_type='grain').values() returns nothing.
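To make that assumption concrete, here is a minimal Python sketch that mirrors the template branch quoted above and adds a guard for the empty case. It is not the actual fix applied in the salt states. The function name resolve_remote_ip and the mine_result parameter (a stand-in for whatever salt['mine.get'](...) returns) are hypothetical, and so are the addresses in the usage note.

```python
from typing import Optional


def resolve_remote_ip(remote_conf: dict, mine_result: dict) -> Optional[str]:
    """Mirror the bridge_ip / bridge_iface branches of the template.

    mine_result is a hypothetical stand-in for the dict returned by
    salt['mine.get'](..., 'ip4_interfaces', ...): minion id -> interfaces.
    Returns None instead of failing when no address can be determined.
    """
    if "bridge_ip" in remote_conf:
        return remote_conf["bridge_ip"]
    if "bridge_iface" in remote_conf:
        remote_interfaces = list(mine_result.values())
        # Guard: an empty mine result means there is no element [0] to inspect.
        if not remote_interfaces:
            return None
        addresses = remote_interfaces[0].get(remote_conf["bridge_iface"], [])
        return addresses[0] if addresses else None
    return None
```

With an empty mine result, resolve_remote_ip({"bridge_iface": "eth0"}, {}) returns None rather than raising. The unguarded template instead evaluates remote_interfaces[0] on an empty list, which would be consistent with the "argument of type 'StrictUndefined' is not iterable" error in the traceback.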

salt -l error --no-color --state-output=changes 'worker3*' slsutil.renderer /srv/salt/openqa/openvswitch.sls 'jinja' fails with jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'bridge_ip'. Don't know how to handle that.

salt -l error --no-color --state-output=changes 'worker3*' state.sls openqa.openvswitch reproduces the problem more quickly.

I tried to change the jinja code to handle a missing ip4_interfaces entry gracefully, but then it hit me: a machine is missing on purpose, worker6, see #129484. So our salt code likely struggles whenever a worker configured for multi-machine tests is removed from the salt keys but still present in the salt pillars. I commented out the entry in /srv/pillar/openqa/workerconf.sls on osd, and then the above command renders the sls file nicely and proceeds with execution.
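A check along the following lines could flag this kind of mismatch before it breaks rendering. This is only a sketch: the function name and both input lists are hypothetical, and how the worker names would be extracted from workerconf.sls and from the salt mine is left open here.

```python
def workers_missing_from_mine(pillar_workers, mine_minions):
    """Return workers that are configured in the pillar but have no mine data.

    Both arguments are plain iterables of host names (hypothetical inputs);
    the result is sorted for stable output.
    """
    return sorted(set(pillar_workers) - set(mine_minions))
```

For the situation described above, workers_missing_from_mine(["worker2", "worker3", "worker6"], ["worker2", "worker3"]) reports worker6 as the stale pillar entry.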

Actions #3

Updated by okurz over 1 year ago

  • Subject changed from salt high state fails after recent merge requests in salt pillars to salt high state fails after recent merge requests in salt pillars size:M
Actions #4

Updated by okurz over 1 year ago

Yet another improvement is https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/545 to use consistent jinja commenting.

Regarding salt-lint: https://software.opensuse.org/package/salt-lint?search_term=salt-lint offers version 0.8.0, which is about 1.5 years old, while upstream is at 0.9.2. dheidler will create an SR for the new version to https://build.opensuse.org/package/show/systemsmanagement:saltstack/salt-lint, then to openSUSE:Factory and then maybe Leap.

Still monitoring the deploy jobs that are still running.

EDIT: They seem to have succeeded in the end.

Actions #5

Updated by openqa_review over 1 year ago

  • Due date set to 2023-06-29

Setting due date based on mean cycle time of SUSE QE Tools

Actions #6

Updated by okurz over 1 year ago

  • Status changed from In Progress to Blocked
Actions #7

Updated by okurz over 1 year ago

  • Due date changed from 2023-06-29 to 2023-07-07
  • Status changed from Blocked to Feedback

The SR was accepted and salt-lint has been in Factory since yesterday. I am already trying to build a container that includes salt-lint: https://build.opensuse.org/package/show/home:okurz:container/salt-lint

Actions #8

Updated by okurz over 1 year ago

The container build is still "unresolvable" and I don't know if I could retrigger it. Waiting another couple of days until a new Tumbleweed snapshot should have salt-lint installable.

Actions #9

Updated by okurz over 1 year ago

  • Status changed from Feedback to Workable

https://build.opensuse.org/package/show/home:okurz:container/salt-lint is built now. podman run --rm -it registry.opensuse.org/home/okurz/container/containers/tumbleweed:salt-lint salt-lint --help works fine now. So we can integrate that into the CI pipeline. I added an entry point in the Dockerfile so we can just call podman run --rm -it registry.opensuse.org/home/okurz/container/containers/tumbleweed:salt-lint to call salt-lint.

Actions #11

Updated by okurz over 1 year ago

  • Related to action #131249: [alert][ci][deployment] OSD deployment failed, grenache-1, worker5, worker2 salt-minion does not return, error message "No response" size:M added
Actions #12

Updated by okurz over 1 year ago

  • Status changed from Feedback to Resolved

Both MRs are merged, so we now have static yaml and salt syntax and lint checks for every merge request, even before merge, for both salt states and salt pillars.
