action #137984: salt "refresh" job full of errors but CI job passes size:M - openQA Infrastructure (public) - openSUSE Project Management Tool

## Observation 
 https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948 shows a lot of errors, e.g. 

 ``` 
 s390zl13.oqa.prg2.suse.org: 
     ---------- 
     mine.update: 
         True 
     saltutil.refresh_grains: 
         True 
     saltutil.refresh_pillar: 
         True 
     saltutil.sync_grains: 
         Traceback (most recent call last): 
           File "/usr/lib/python3.6/site-packages/salt/modules/saltutil.py", line 79, in _get_top_file_envs 
             return __context__["saltutil._top_file_envs"] 
           File "/usr/lib/python3.6/site-packages/salt/loader/context.py", line 78, in __getitem__ 
             return self.value()[item] 
         KeyError: 'saltutil._top_file_envs' 
        
         During handling of the above exception, another exception occurred: 
        
         Traceback (most recent call last): 
           File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2110, in _thread_multi_return 
 ... 
 ``` 

 Despite these errors, the CI job passes in the end instead of failing.

 ## Steps to reproduce 
 I assume that, as long as https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019 is not yet fully effective, the problem can be reproduced by rerunning the CI job. The error message itself can be reproduced on osd with:

 ``` 
 salt --no-color 's390zl12*' saltutil.sync_grains 
 ``` 

 ## Acceptance criteria 
 * **AC1:** Obvious errors visible in the log of the "refresh" CI job should fail the CI job 

 ## Suggestions 
 * *DONE* Crosscheck whether the salt command itself returns a non-zero exit code when the problem reproduces -> on osd, `salt --no-color 's390zl12*' saltutil.sync_grains; echo $?` prints "1", so the command does return a non-zero exit code. The likely problem is that the CI instructions, which execute the command over ssh, do not properly evaluate the exit code of the inner command 
 * Ensure that the CI job evaluates the exit code or error condition accordingly 
 * Make sure the exit code is still evaluated regardless of any error messages shown 
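
 The last two suggestions could be combined in a small wrapper. The following is only a sketch, not the actual CI configuration: `run_checked` is a hypothetical helper name, and treating the string `Traceback (most recent call last)` as an "obvious error" is an assumption based on the log excerpt in the observation.

```shell
#!/bin/sh
# Hypothetical helper (not part of salt-states-openqa): run a command,
# print its combined output, and fail if either
#   - the command itself exits non-zero, or
#   - the output contains a Python traceback although the exit code was 0.
run_checked() {
    status=0
    # Capture stdout and stderr; remember the exit code without tripping "set -e".
    output=$("$@" 2>&1) || status=$?
    printf '%s\n' "$output"
    if [ "$status" -ne 0 ]; then
        echo "command failed with exit code $status" >&2
        return "$status"
    fi
    # Assumption: a traceback in the output counts as an obvious error.
    if printf '%s\n' "$output" | grep -q 'Traceback (most recent call last)'; then
        echo "traceback found in output despite exit code 0" >&2
        return 1
    fi
    return 0
}
```

 In a CI script the wrapper would be put in front of the existing call, for example `run_checked ssh "$host" "salt --no-color \* saltutil.sync_grains"`, where `$host` is a placeholder for whatever target the job already uses. This relies on ssh exiting with the exit status of the remote command, so the remote command must not be followed by anything (such as `; echo $?` or a pipe) that replaces its exit status.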

 ## Problem 
 * The problem seems to be related to the compound command: `salt \* saltutil.sync_grains,saltutil.refresh_grains ,` yields exit code 0, whereas `salt \* saltutil.sync_grains` on its own yields 1
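
 One possible workaround, given the observation above, is to call each function in its own `salt` invocation and collect the exit codes. This is a minimal sketch under assumptions: `refresh_each` is a hypothetical helper name, the `SALT` variable is an assumed override hook so the sketch can be tried without a salt master, and it relies on single-function calls returning a non-zero exit code on error as observed on osd.

```shell
#!/bin/sh
# Hypothetical helper: run every given salt function separately against one
# target so that each exit code is evaluated on its own, instead of relying
# on the exit code of a single compound call.
refresh_each() {
    target=$1
    shift
    rc=0
    for fn in "$@"; do
        # SALT is an assumed override for testing; it defaults to the real "salt".
        if ! ${SALT:-salt} --no-color "$target" "$fn"; then
            echo "$fn failed on target $target" >&2
            rc=1
        fi
    done
    # Non-zero if any single function failed.
    return "$rc"
}
```

 A call such as `refresh_each '*' saltutil.sync_grains saltutil.refresh_grains` would then fail as soon as one of the functions reports an error, at the cost of one salt invocation per function.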
