action #137984
Updated by okurz 11 months ago
## Observation
https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948 shows a lot of errors, e.g.
```
s390zl13.oqa.prg2.suse.org:
----------
mine.update:
True
saltutil.refresh_grains:
True
saltutil.refresh_pillar:
True
saltutil.sync_grains:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/modules/saltutil.py", line 79, in _get_top_file_envs
return __context__["saltutil._top_file_envs"]
File "/usr/lib/python3.6/site-packages/salt/loader/context.py", line 78, in __getitem__
return self.value()[item]
KeyError: 'saltutil._top_file_envs'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2110, in _thread_multi_return
...
```
but in the end the CI job passes instead of failing.
## Steps to reproduce
I assume that, as long as https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019 is not fully effective yet, the problem can be reproduced by rerunning the CI job. The error message itself can be reproduced on osd with
```
salt --no-color 's390zl12*' saltutil.sync_grains
```
## Acceptance criteria
* **AC1:** Obvious errors visible in the log of the "refresh" CI job should fail the CI job
## Suggestions
* *DONE* Crosscheck if the salt command itself provides a non-zero exit code when the problem reproduces -> on osd, `salt --no-color 's390zl12*' saltutil.sync_grains; echo $?` prints "1", so the salt command itself does fail. So likely the problem is that the CI instructions, which execute the command over ssh, do not properly evaluate the exit code of the inner command
* Ensure that the CI job evaluates the exit code or error condition accordingly
## Problem
* The problem seems to be related to the compound statement: `salt \* saltutil.sync_grains,saltutil.refresh_grains ,` yields exit code 0, whereas `salt \* saltutil.sync_grains` alone yields 1
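
One possible workaround, sketched here under assumptions rather than taken from the actual CI configuration, is to call each salt function in its own invocation and combine the exit codes by hand. The helper name `run_separately` and the `SALT` override variable are hypothetical. The sketch relies on the observation above that a single-function call returns non-zero on error.

```shell
#!/bin/sh
# Hypothetical helper: run each salt function as a separate invocation so
# that a failure in any one of them is reflected in the overall result.
# SALT can be overridden to dry-run the sketch without a salt master.
: "${SALT:=salt}"

run_separately() {
    target=$1
    shift
    rc=0
    for fn in "$@"; do
        # Assumption: a single-function call exits non-zero on error,
        # as observed for saltutil.sync_grains on osd.
        "$SALT" --no-color "$target" "$fn" || rc=1
    done
    return "$rc"
}
```

In the CI script this could be called as `run_separately '*' saltutil.sync_grains saltutil.refresh_grains || exit 1`. All functions still run, but the job fails if any of them reported an error.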