action #161426
opencoordination #161414: [epic] Improved salt based infrastructure management
incomplete config files on OSD due to salt - introduce post-deploy monitoring steps like in osd-deployment but in salt-states-openqa
0%
Description
Motivation
See #161324. Why did the salt states pipelines end with success when the salt high state was never reported as successfully applied to the openqa.suse.de salt minion? openqa.suse.de is not mentioned in the list of minions where the state was applied, but the pipeline still ended successfully. We do not know the cause yet, but post-deploy monitoring should help us spot errors quicker if similar problems return. Maybe the problem is related to how we run salt over ssh from the minion openqa.suse.de: potentially the exit code from salt was never propagated and the command in bash just ended prematurely? Therefore, introduce post-deploy monitoring steps like in osd-deployment, but in salt-states-openqa.
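As a complement to alert monitoring, the pipeline could also refuse to pass when an expected minion is missing from the salt results. The following is only a minimal sketch of that idea, not what the pipeline currently does. It assumes the salt output has already been parsed into a mapping of minion id to per-state results, each carrying a boolean `result` field; the function name `unverified_minions` and the minion names are hypothetical.

```python
from typing import Dict, Iterable, List


def unverified_minions(expected: Iterable[str], results: Dict[str, object]) -> List[str]:
    """Return the expected minions that did not report a fully successful high state.

    `results` is assumed to map minion id -> {state id -> {"result": bool, ...}}.
    """
    problems = []
    for minion in expected:
        states = results.get(minion)
        # A minion that never reported anything counts as a failure, not a success.
        if not isinstance(states, dict) or not states:
            problems.append(minion)
            continue
        # Every state must explicitly report result == True.
        if not all(isinstance(s, dict) and s.get("result") is True for s in states.values()):
            problems.append(minion)
    return problems


if __name__ == "__main__":
    # Hypothetical data: openqa.suse.de is missing from the results entirely.
    parsed = {
        "worker1": {"some_state": {"result": True}},
        "worker2": {"some_state": {"result": False}},
    }
    bad = unverified_minions(["openqa.suse.de", "worker1", "worker2"], parsed)
    print(bad)
    # A CI job would exit non-zero here if `bad` is not empty.
```

With a check like this, a silently skipped minion fails the job instead of depending on the exit code of the remote command.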
Acceptance criteria
- AC1: salt-states-openqa CI pipelines have post-deploy monitoring steps for grafana alerts like in osd-deployment
Suggestions
- Implement monitoring checks like https://gitlab.suse.de/openqa/osd-deployment/-/blob/master/.gitlab-ci.yml?ref_type=heads#L247 but in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/.gitlab-ci.yml?ref_type=heads
- Probably best done by moving the commonly needed code to https://gitlab.suse.de/openqa/ci/ ?
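The shape of such a post-deploy monitoring step could look roughly like the sketch below. This is not the code from osd-deployment; it only illustrates the general pattern of polling for firing alerts over a fixed window after deployment and failing as soon as one shows up. The function `watch_alerts` and the callable `fetch_firing` are hypothetical; how the firing Grafana alerts are actually queried is deliberately left to the caller.

```python
import time
from typing import Callable, List


def watch_alerts(
    fetch_firing: Callable[[], List[str]],
    duration_s: float,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll for firing alerts for about `duration_s` seconds after a deployment.

    Returns the firing alerts as soon as any are seen, or an empty list
    if the whole monitoring window passed without alerts.
    """
    elapsed = 0.0
    while True:
        firing = fetch_firing()
        if firing:
            return firing
        if elapsed >= duration_s:
            return []
        sleep(interval_s)
        elapsed += interval_s


if __name__ == "__main__":
    # Hypothetical fetcher: a real one would query the monitoring system.
    responses = iter([[], [], ["example alert"]])
    alerts = watch_alerts(lambda: next(responses), duration_s=10, interval_s=1,
                          sleep=lambda _s: None)
    print(alerts)
    # A CI job would exit non-zero here if `alerts` is not empty.
```

Keeping the fetcher as a parameter would make it easier to share the loop between osd-deployment and salt-states-openqa, for example from a common CI repository.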
Updated by okurz 7 months ago
- Related to action #161324: Conduct "lessons learned" with Five Why analysis for "osd not accessible, 502 Bad Gateway" added