Project

General

Profile

action #178078

Updated by mkittler about 2 months ago

## Observation 
 From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598 : https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598: 

 ``` 
 petrol.qe.nue2.suse.org: 
     The minion function caused an exception: Traceback (most recent call last): 
       File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1912, in _thread_return 
         function_name, function_args, executors, opts, data 
       File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1870, in _execute_job_function 
         return_data = self.executors[fname](opts, data, func, args, kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__ 
         return self.loader.run(run_func, *args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run 
         return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run 
         return callable(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as 
         ret = _func_or_method(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute 
         return func(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__ 
         return self.loader.run(run_func, *args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run 
         return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run 
         return callable(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as 
         ret = _func_or_method(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/modules/state.py", line 1161, in highstate 
         initial_pillar=_get_initial_pillar(opts), 
       File "/usr/lib/python3.6/site-packages/salt/state.py", line 4953, in __init__ 
         initial_pillar=initial_pillar, 
       File "/usr/lib/python3.6/site-packages/salt/state.py", line 774, in __init__ 
         self.opts["pillar"] = self._gather_pillar() 
       File "/usr/lib/python3.6/site-packages/salt/state.py", line 877, in _gather_pillar 
         return pillar.compile_pillar() 
       File "/usr/lib/python3.6/site-packages/salt/pillar/__init__.py", line 360, in compile_pillar 
         dictkey="pillar", 
       File "/usr/lib/python3.6/site-packages/salt/utils/asynchronous.py", line 112, in wrap 
         lambda: getattr(self.obj, key)(*args, **kwargs) 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync 
         return future_cell[0].result() 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result 
         raise_exc_info(self._exc_info) 
       File "<string>", line 4, in raise_exc_info 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run 
         yielded = self.gen.throw(*exc_info) 
       File "/usr/lib/python3.6/site-packages/salt/channel/client.py", line 172, in crypted_transfer_decode_dictentry 
         timeout=timeout, 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run 
         value = future.result() 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result 
         raise_exc_info(self._exc_info) 
       File "<string>", line 4, in raise_exc_info 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run 
         yielded = self.gen.throw(*exc_info) 
       File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send 
         ret = yield self.message_client.send(load, timeout=timeout) 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run 
         value = future.result() 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result 
         raise_exc_info(self._exc_info) 
       File "<string>", line 4, in raise_exc_info 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run 
         yielded = self.gen.throw(*exc_info) 
       File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 630, in send 
         recv = yield future 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run 
         value = future.result() 
       File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result 
         raise_exc_info(self._exc_info) 
       File "<string>", line 4, in raise_exc_info 
     salt.exceptions.SaltReqTimeoutError: Message timed out 
 ``` 

 We now since recently call salt with an explicit timeout value with `salt -t 3600` (60 minutes). Most salt nodes return after around 40s but from petrol we get a SaltReqTimeoutError. The complete pipeline ran for (only) 3m 54s. So after which time did the timeout actually happen and why? 

 ## Acceptance Criteria 
 * **AC1:** salt states on petrol don't time out 

 ## Suggestions 
 * This seems to be a generic symptom of a timeout e.g. [many things could file like so](https://github.com/saltstack/salt/issues?q=is%3Aissue%20state%3Aopen%20SaltReqTimeoutError) 
 * There's no specific module visible in tracebacks? 
 * Come up with a reproducer? 
     * Happens very often but only petrol because the machine is slow? 
     * Maybe it's the module timeout? 
 * Enable debugging that would help narrow down where this originates 
 * Can we split up the state execution e.g. apply networking, packages separately?

Back