action #175851
coordination #161414 (closed): [epic] Improved salt based infrastructure management
Prevent re-evaluation of "stop_and_disable_all_not_configured_workers" state on every run size:S
Description
Observation
Running salt --state-output=changes -C "G@roles:worker" state.apply
always executes the state "stop_and_disable_all_not_configured_workers" and lists it as changed. For state.apply to be idempotent, the state should not run when there is nothing to stop or disable.
Acceptance criteria
- AC1: Running state.apply on multiple OSD hosts repeatedly shows no changed states
Acceptance tests
- AT1-1: Run
ssh openqa.suse.de "sudo nice env runs=30 count-fail-ratio salt --state-output=changes -C '*' state.apply queue=True | grep -v 'Result.*Clean'"
and look for "Succeeded: $big_number" without "(changed=1)"
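The manual check in AT1-1 could be automated with a small filter along these lines. This is only a sketch: the function name count_changed_summaries is hypothetical, and it assumes the salt summary prints lines of the form "Succeeded: N (changed=M)" as quoted in the acceptance test.

```shell
# Hypothetical helper, not part of salt or salt-states-openqa.
# Reads salt output on stdin and prints how many summary lines
# report at least one changed state, i.e. "Succeeded: N (changed=M)".
count_changed_summaries() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -cE 'Succeeded: +[0-9]+ +\(changed=[0-9]+\)' || true
}
```

Piping the output of the salt command from AT1-1 into count_changed_summaries should print 0 once the state is idempotent on all hosts.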
Suggestions
- Look into the already existing "unless" in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/worker.sls?ref_type=heads#L220 and figure out what broke. Maybe related to "masked instances"?
Updated by okurz about 1 month ago
- Subject changed from Prevent re-evaluation of "stop_and_disable_all_not_configured_workers" state on every run to Prevent re-evaluation of "stop_and_disable_all_not_configured_workers" state on every run size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by jbaier_cz about 1 month ago
- Status changed from Workable to In Progress
Updated by jbaier_cz about 1 month ago
Maybe it is just as simple as https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1363
Updated by okurz about 1 month ago
merged and deployed. From the deployment job in https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3739365#L301 I still see stop_and_disable_all_not_configured_workers executed for sapworker1 but maybe somebody really applied some changes since the last run. At least diesel+petrol's output was clean. I suggest you run like 2-3 cycles of the salt state apply on at least one machine to verify and then resolve.
Updated by jbaier_cz about 1 month ago
okurz wrote in #note-6:
merged and deployed. From the deployment job in https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3739365#L301 I still see stop_and_disable_all_not_configured_workers executed for sapworker1 but maybe somebody really applied some changes since the last run. At least diesel+petrol's output was clean. I suggest you run like 2-3 cycles of the salt state apply on at least one machine to verify and then resolve.
Ah, no. That's a different problem in the same condition. We can actually only disable services (i.e. we can only make the first number lower). This is a proper fix (unless of course we want to support enabling the services): https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1364
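One reading of this comment can be sketched in shell. This is a hypothetical illustration, not the condition from worker.sls: needs_disable, active and configured are made-up names. Since the state can only stop and disable instances, it only has work to do when more instances are active than configured. A guard based on exact equality would also fire when fewer instances are active, a situation the state cannot fix, so it would be reported as changed on every run.

```shell
# Hypothetical guard, assuming the condition compares two counts:
#   $1 = number of currently active worker instances
#   $2 = number of configured worker instances
# Returns 0 (state needs to run) only when there are surplus instances.
needs_disable() {
    active=$1
    configured=$2
    [ "$active" -gt "$configured" ]
}
```

Salt skips a state when its "unless" command exits with 0, so an "unless" built on this idea would test the opposite: active less than or equal to configured.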
Updated by openqa_review about 1 month ago
- Due date set to 2025-02-15
Setting due date based on mean cycle time of SUSE QE Tools