action #137984
salt "refresh" job full of errors but CI job passes size:M
0%
Description
Observation
https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948 shows a lot of errors, e.g.
s390zl13.oqa.prg2.suse.org:
    ----------
    mine.update:
        True
    saltutil.refresh_grains:
        True
    saltutil.refresh_pillar:
        True
    saltutil.sync_grains:
        Traceback (most recent call last):
          File "/usr/lib/python3.6/site-packages/salt/modules/saltutil.py", line 79, in _get_top_file_envs
            return __context__["saltutil._top_file_envs"]
          File "/usr/lib/python3.6/site-packages/salt/loader/context.py", line 78, in __getitem__
            return self.value()[item]
        KeyError: 'saltutil._top_file_envs'
        During handling of the above exception, another exception occurred:
        Traceback (most recent call last):
          File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2110, in _thread_multi_return
...
but in the end the CI job passes instead of failing
Steps to reproduce
I assume that, as long as https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019 is not yet fully effective, the problem can be reproduced by rerunning the CI job. The error message itself can be reproduced on osd with
salt --no-color 's390zl12*' saltutil.sync_grains
Acceptance criteria
- AC1: Obvious errors visible in the log of the "refresh" CI job should fail the CI job
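One way AC1 could be approached is sketched below, assuming that a Python traceback in the output counts as an "obvious error". The helper name `run_and_check` is hypothetical and not taken from the CI configuration; it prints the command output unchanged, keeps a non-zero exit code if there is one, and otherwise fails when the output contains a traceback.

```shell
# Hypothetical helper (not part of salt or the CI job): run a command,
# print its combined output unchanged, and fail if either the command
# itself fails or its output contains a Python traceback.
run_and_check() {
    check_rc=0
    # Capture stdout and stderr; remember a non-zero exit code.
    check_out=$("$@" 2>&1) || check_rc=$?
    printf '%s\n' "$check_out"
    # A non-zero exit code from the command always wins.
    if [ "$check_rc" -ne 0 ]; then
        return "$check_rc"
    fi
    # Assumption: this marker is what makes an error "obvious".
    case "$check_out" in
        *'Traceback (most recent call last)'*) return 1 ;;
    esac
    return 0
}
```

It could be called as `run_and_check salt --no-color 's390zl12*' saltutil.sync_grains`. Matching on log text is only a fallback for cases where the exit code is unreliable.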
Suggestions
- DONE Crosscheck if the salt command itself provides a non-zero exit code when the problem reproduces -> the command on osd
salt --no-color 's390zl12*' saltutil.sync_grains; echo $?
yields an exit code of "1". So the problem is likely that, in the CI instructions, the command executed over ssh does not properly honor the exit code of the inner command
- Ensure that the CI job honors the exit code or error condition accordingly
- Make sure the exit code is still evaluated regardless of shown error messages
Problem
- The problem seems to be related to the compound statement.
salt \* saltutil.sync_grains,saltutil.refresh_grains
yields a 0 exit code, whereas
salt \* saltutil.sync_grains
yields 1
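A possible workaround, sketched here under the assumption that the CI job can afford one salt invocation per function, is to avoid the compound call and keep the highest exit code of the individual calls. This is not necessarily what the linked merge requests do. `run_each` and the `SALT_CMD` override are hypothetical names introduced only for illustration.

```shell
# Hypothetical helper: call each salt function in its own invocation
# instead of one compound call, and return the highest exit code seen.
# SALT_CMD is an assumed override hook; the default is "salt --no-color".
run_each() {
    each_target=$1
    shift
    each_rc=0
    for each_fn in "$@"; do
        each_status=0
        # Word splitting of the command string is intentional here.
        ${SALT_CMD:-salt --no-color} "$each_target" "$each_fn" || each_status=$?
        if [ "$each_status" -gt "$each_rc" ]; then
            each_rc=$each_status
        fi
    done
    return "$each_rc"
}
```

For example, `run_each 's390zl12*' saltutil.sync_grains saltutil.refresh_grains; echo $?` would print a non-zero value whenever any single call fails.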
Updated by okurz about 1 year ago
- Description updated (diff)
- Status changed from New to In Progress
- Assignee set to okurz
Updated by okurz about 1 year ago
cool, even salt --no-color --failhard --hard-crash
is quite "graceful" :D
Updated by okurz about 1 year ago
- Due date set to 2023-10-27
- Status changed from In Progress to Feedback
I found two upstream issues and one related pull request:
- https://github.com/saltstack/salt/issues/9746 "Compound commands don't work."
- https://github.com/saltstack/salt/issues/42814 "Compound command execution doesn't handle multiple calls of same command"
- https://github.com/saltstack/salt/pull/47544 "Multi-command salt calls return highest exit"
The PR was merged 5 years ago, but I am not sure our package actually includes it. I found no solution in the two issues.
https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1020
Updated by okurz about 1 year ago
- Status changed from Feedback to In Progress
reverted https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1021, it was a bad idea to hide the actual Traceback with "-q".
Updated by okurz about 1 year ago
- Status changed from In Progress to Feedback
Updated by okurz about 1 year ago
- Subject changed from salt "refresh" job full of errors but CI job passes to salt "refresh" job full of errors but CI job passes size:M
- Description updated (diff)
Updated by okurz about 1 year ago
- Due date deleted (2023-10-27)
- Status changed from Feedback to Resolved