action #177366
closed coordination #161414: [epic] Improved salt based infrastructure management
osd deployment "test.ping" check runs into gitlab CI timeout
Status: Resolved
Priority: Low
Assignee:
Category: Regressions/Crashes
Target version:
Start date:
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
From https://gitlab.suse.de/openqa/osd-deployment/-/jobs/3824667#L36
++ retry ssh openqa.suse.de 'sudo salt \* test.ping'
$ retry ssh $TARGET "sudo salt \* test.ping"
...[…]..Terminated
.Retrying up to 3 more times after sleeping 3s …
.Terminated
+++ kill %1
WARNING: step_script could not run to completion because the timeout was exceeded. For more control over job and script timeouts see: https://docs.gitlab.com/ee/ci/runners/configure_runners.html#set-script-and-after_script-timeouts
ERROR: Job failed: execution took longer than 2h0m0s seconds
In #175407 we raised the global salt timeout on the salt master to a much higher value. As a result, even a simple test.ping waits for that very long timeout when a minion does not reply. Combined with up to 3 retries, the job exceeds the 2h GitLab CI timeout and gives no feedback on which hosts did not respond.
Acceptance criteria
- AC1: test.ping returns reasonably fast
- AC2: state.apply still uses the much longer timeout
Suggestions
- We could explicitly apply either a shorter timeout on test.ping, e.g. salt -t 5, or a longer timeout on state.apply. Maybe it's also possible to use custom timeouts per command?
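A minimal sketch of the per-command idea, assuming the salt CLI option -t (--timeout) is passed per invocation. The helper names salt_timeout_for and salt_cmd are hypothetical, and the 5 second value is only the example from the suggestion above. The helper prints the command instead of running it, so it can be checked without a salt master:

```shell
#!/bin/sh
# Hypothetical helper: pick a per-command salt timeout in seconds.
# An empty result means "use the timeout configured on the salt master".
salt_timeout_for() {
    case "$1" in
        test.ping) printf '%s\n' 5 ;;  # assumption: 5s is enough for a ping
        *) printf '%s\n' "" ;;         # e.g. state.apply keeps the long global timeout
    esac
}

# Hypothetical helper: print the salt command line for a given function.
salt_cmd() {
    fn=$1
    t=$(salt_timeout_for "$fn")
    if [ -n "$t" ]; then
        printf '%s\n' "sudo salt -t $t \\* $fn"
    else
        printf '%s\n' "sudo salt \\* $fn"
    fi
}

salt_cmd test.ping    # prints: sudo salt -t 5 \* test.ping
salt_cmd state.apply  # prints: sudo salt \* state.apply
```

In the deployment job the printed string would take the place of the quoted command in the existing retry ssh $TARGET "..." call. Whether 5 seconds is long enough for all minions would need to be verified.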
Updated by okurz 14 days ago
- Copied from action #175407: salt state for machine monitor.qe.nue2.suse.org was broken for almost 2 months, nothing was alerting us size:S added