action #178078
Updated by mkittler about 2 months ago
## Observation From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598 : https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598: ``` petrol.qe.nue2.suse.org: The minion function caused an exception: Traceback (most recent call last): File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1912, in _thread_return function_name, function_args, executors, opts, data File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1870, in _execute_job_function return_data = self.executors[fname](opts, data, func, args, kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__ return self.loader.run(run_func, *args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs) File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run return callable(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as ret = _func_or_method(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute return func(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__ return self.loader.run(run_func, *args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs) File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run return callable(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as ret = _func_or_method(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/modules/state.py", line 1161, in highstate initial_pillar=_get_initial_pillar(opts), File "/usr/lib/python3.6/site-packages/salt/state.py", line 4953, in __init__ initial_pillar=initial_pillar, File "/usr/lib/python3.6/site-packages/salt/state.py", line 774, in __init__ self.opts["pillar"] = self._gather_pillar() File "/usr/lib/python3.6/site-packages/salt/state.py", line 877, in _gather_pillar return pillar.compile_pillar() File "/usr/lib/python3.6/site-packages/salt/pillar/__init__.py", line 360, in compile_pillar dictkey="pillar", File "/usr/lib/python3.6/site-packages/salt/utils/asynchronous.py", line 112, in wrap lambda: getattr(self.obj, key)(*args, **kwargs) File "/usr/lib/python3.6/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync return future_cell[0].result() File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result raise_exc_info(self._exc_info) File "<string>", line 4, in raise_exc_info File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run yielded = self.gen.throw(*exc_info) File "/usr/lib/python3.6/site-packages/salt/channel/client.py", line 172, in crypted_transfer_decode_dictentry timeout=timeout, File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run value = future.result() File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result raise_exc_info(self._exc_info) File "<string>", line 4, in raise_exc_info File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run yielded = self.gen.throw(*exc_info) File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send ret = yield self.message_client.send(load, timeout=timeout) File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run value = future.result() File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result raise_exc_info(self._exc_info) File "<string>", line 4, in raise_exc_info File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run yielded = self.gen.throw(*exc_info) File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 630, in send recv = yield future File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run value = future.result() File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result raise_exc_info(self._exc_info) File "<string>", line 4, in raise_exc_info salt.exceptions.SaltReqTimeoutError: Message timed out ``` We now since recently call salt with an explicit timeout value with `salt -t 3600` (60 minutes). Most salt nodes return after around 40s but from petrol we get a SaltReqTimeoutError. The complete pipeline ran for (only) 3m 54s. So after which time did the timeout actually happen and why? ## Acceptance Criteria * **AC1:** salt states on petrol don't time out ## Suggestions * This seems to be a generic symptom of a timeout e.g. [many things could file like so](https://github.com/saltstack/salt/issues?q=is%3Aissue%20state%3Aopen%20SaltReqTimeoutError) * There's no specific module visible in tracebacks? * Come up with a reproducer? * Happens very often but only petrol because the machine is slow? * Maybe it's the module timeout? * Enable debugging that would help narrow down where this originates * Can we split up the state execution e.g. apply networking, packages separately?