action #120163

Use salt grains instead of manually specifying IPs in "bridge_ip" size:M

Added by okurz 3 months ago. Updated 2 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date:
Due date: 2022-11-29
% Done: 0%
Estimated time:

Description

Motivation

See parent #116623

Acceptance criteria

  • AC1: No IP addresses are hard-coded in salt-pillars

Suggestions

  • This is probably only testable in production, but it is easy to revert if needed. Follow the README.md in salt-states.
  • Avoid using any extra scripts here and rely on grains

History

#2 Updated by mkittler 3 months ago

  • Assignee set to mkittler

I've already been creating a draft so I'll assign myself: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/770

Maybe it makes sense to test/apply this after the security zone migration to avoid doing too many things at once.

#3 Updated by cdywan 3 months ago

  • Subject changed from Use salt grains or something fancy instead of manually specifying IPs in "bridge_ip" to Use salt grains instead of manually specifying IPs in "bridge_ip" size:M
  • Description updated (diff)
  • Status changed from New to Workable

#4 Updated by okurz 3 months ago

mkittler wrote:

I've already been creating a draft so I'll assign myself: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/770

Maybe it makes sense to test/apply this after the security zone migration to avoid doing too many things at once.

mkittler well, I like that you want to prevent disruptions, but the challenge is that we need to update the addresses for the migration anyway wherever that has not been done already. If we had FQDNs for the hosts that still need migration, we would get that part "for free".

#5 Updated by mkittler 2 months ago

  • Status changed from Workable to In Progress

Ok, I'll try it out now on OSD then.


EDIT: The change generally doesn't break things, e.g. sudo salt --no-color --state-output=changes -C 'G@roles:worker' cmd.run 'grep remote_ip /etc/wicked/scripts/gre_tunnel_preup.sh' still shows the IPs as expected. The next step would be removing bridge_ip from the pillars (in a few places as a start) to see whether the fallback to FQDNs then works.

EDIT: It doesn't work. I cannot add anything to the salt mine and don't know how to continue. It appears that my change to mine.sls is completely ignored, even though the change is now also in the pillars repo.

#6 Updated by openqa_review 2 months ago

  • Due date set to 2022-11-29

Setting due date based on mean cycle time of SUSE QE Tools

#7 Updated by mkittler 2 months ago

I'm stuck adding new information to the salt mine. I'll try to research what I might be missing or ask Nick tomorrow.

#8 Updated by okurz 2 months ago

In the meantime mkittler was able to fix the issue with help from nsinger, highly appreciated. https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/770 merged. Will you now prepare a corresponding change to our salt pillars that removes the bridge_ip settings?

#9 Updated by okurz 2 months ago

https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/460 merged 32 minutes ago. Please do a sanity check, e.g. with salt cmd.run

#10 Updated by mkittler 2 months ago

  • Status changed from In Progress to Resolved

The deployment pipeline has passed, the salt mine changes are effective and the config still looks sane.

#11 Updated by mkittler 2 months ago

  • Status changed from Resolved to Feedback

It looks like there are now some template rendering issues in the relevant code:

QA-Power8-5-kvm.qa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
malbec.arch.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker11.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker3.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker6.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker5.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker8.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker9.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker12.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0
worker10.oqa.suse.de:
    Data failed to compile:
----------
    Rendering SLS 'base:openqa.openvswitch' failed: Jinja variable list object has no element 0

(from https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1252150)

#12 Updated by mkittler 2 months ago

On OSD I now get only two occurrences of "Data failed to compile" and no "Rendering SLS" error:

martchus@openqa:~> sudo salt -l error --state-output=changes \* state.apply
worker5.oqa.suse.de:
    Data failed to compile:
----------
    The function "state.highstate" is running as PID 32342 and was started at 2022, Nov 22 15:40:32.201112 with jid 20221122154032201112
grenache-1.qa.suse.de:
    Data failed to compile:
----------
    The function "state.highstate" is running as PID 585787 and was started at 2022, Nov 22 15:39:52.397112 with jid 20221122153952397112
…
Summary for openqaworker-arm-2.suse.de
--------------
Succeeded: 397 (changed=4)
Failed:      0
--------------
Total states run:     397
Total run time:    82.397 s
…
ERROR: Minions returned with non-zero exit code

When I tried it again I got:

martchus@openqa:~> sudo salt -l error --state-output=changes \* state.apply
openqaworker-arm-2.suse.de:
    Data failed to compile:
----------
    The function "state.highstate" is running as PID 48302 and was started at 2022, Nov 22 15:44:10.570942 with jid 20221122154410570942
…
Summary for worker5.oqa.suse.de
--------------
Succeeded: 498 (changed=4)
Failed:      0
--------------
Total states run:     498
Total run time:    50.702 s
…
Summary for grenache-1.qa.suse.de
--------------
Succeeded: 577 (changed=4)
Failed:      0
--------------
Total states run:     577
Total run time:    50.056 s
…
ERROR: Minions returned with non-zero exit code

So the result is the same, except that this time a completely different worker runs into the error (and the ones that previously ran into it no longer do). I doubt that this issue is the same as the "Rendering SLS" one, which I wanted to reproduce (but apparently cannot). I've nevertheless looked into the issue but only found old bug reports that are likely not relevant (e.g. https://github.com/saltstack/salt/issues/16432 and https://github.com/saltstack/salt/issues/34362).

#13 Updated by mkittler 2 months ago

  • Status changed from Feedback to Resolved

We've discussed that in the unblock meeting.

About the first issue (#120163#note-11): It only happened once and could not be reproduced. It is likely a general issue with the mine, which apparently was not fully populated at some point. So we can likely close this ticket for now (it only introduced yet another use of the mine) but keep it in mind should we see the problem again.
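
A minimal sketch of the failure mode suspected for the first issue. The template itself is not quoted in this ticket, so this assumes the error comes from taking element 0 of a list looked up in the mine while that list is still empty. The function name, the argument names and the 192.0.2.x addresses are hypothetical placeholders, and plain Python stands in for the Jinja logic:

```python
from typing import Mapping, Optional, Sequence


def resolve_remote_ip(
    host: str,
    pillar_bridge_ips: Mapping[str, str],
    mine_addresses: Mapping[str, Sequence[str]],
) -> Optional[str]:
    """Return the tunnel endpoint IP for `host`, or None if it is unknown."""
    # An explicit pillar entry wins (the old, hard-coded behaviour).
    explicit = pillar_bridge_ips.get(host)
    if explicit:
        return explicit
    # Otherwise fall back to whatever the mine currently holds for this host.
    # A missing or empty entry models a mine that is not populated yet.
    addresses = mine_addresses.get(host) or []
    # Guard the lookup instead of indexing blindly with addresses[0].
    return addresses[0] if addresses else None


# Placeholder data: worker6 has an entry in the mine, but it is still empty.
mine = {"worker5.oqa.suse.de": ["192.0.2.5"], "worker6.oqa.suse.de": []}
print(resolve_remote_ip("worker5.oqa.suse.de", {}, mine))
print(resolve_remote_ip("worker6.oqa.suse.de", {}, mine))
```

With a guard like this, a host whose mine entry is missing or empty yields None and could be skipped, rather than aborting the rendering of the whole state.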

About the second issue (#120163#note-12): It is really unrelated and shouldn't block this ticket from being resolved. It is likely happening because the minion is still busy at the time one attempts to apply states again. I don't really understand it, though, because e.g. arm-2 succeeds on the previous run (no timeout or anything) and then runs into the problem on the next run.
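
To tell this busy-minion case apart from real rendering failures when scanning the output, the message can be matched by its wording. This is a rough sketch based only on the two message formats quoted in the comments above. The function name is hypothetical, and other Salt versions may word the message differently:

```python
import re
from typing import Optional, Tuple

# Matches the "already running" message as it appears in the output above.
BUSY_RE = re.compile(
    r'The function "(?P<fun>[\w.]+)" is running as PID (?P<pid>\d+) '
    r"and was started at .* with jid (?P<jid>\d+)"
)


def parse_busy_minion(message: str) -> Optional[Tuple[str, int, str]]:
    """Return (function, pid, jid) if the message reports a running job."""
    match = BUSY_RE.search(message)
    if match is None:
        # Anything else, e.g. a "Rendering SLS ... failed" error.
        return None
    return match.group("fun"), int(match.group("pid")), match.group("jid")


busy = (
    'The function "state.highstate" is running as PID 32342 and was started '
    "at 2022, Nov 22 15:40:32.201112 with jid 20221122154032201112"
)
render_error = (
    "Rendering SLS 'base:openqa.openvswitch' failed: "
    "Jinja variable list object has no element 0"
)
print(parse_busy_minion(busy))
print(parse_busy_minion(render_error))
```

A caller could then retry only the minions for which a running job is reported and treat everything else as a genuine failure.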

#14 Updated by mkittler 2 months ago

  • Related to action #120921: [alert] Salt states fail to compile with "Rendering SLS 'base:openqa.openvswitch' failed: Jinja error: argument of type 'NoneType' is not iterable" size:M added

#15 Updated by mkittler 2 months ago

There's yet another problem with rendering that template. However, it now fails even earlier, so I've created a separate ticket (and added it as a related ticket).

#16 Updated by mkittler 2 months ago

The issue #120921 was really not related at all.

#17 Updated by mkittler 2 months ago

  • Related to deleted (action #120921: [alert] Salt states fail to compile with "Rendering SLS 'base:openqa.openvswitch' failed: Jinja error: argument of type 'NoneType' is not iterable" size:M)
