Project

General

Profile

Actions

action #178078

closed

openQA Infrastructure (public) - coordination #161414: [epic] Improved salt based infrastructure management

salt pipeline deployment fails on master, SaltReqTimeoutError: Message timed out for petrol size:S

Added by okurz about 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
High
Assignee:
Category:
Regressions/Crashes
Target version:
Start date:
2025-01-24
Due date:
% Done:

0%

Estimated time:

Description

Observation

From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598 :

petrol.qe.nue2.suse.org:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1912, in _thread_return
        function_name, function_args, executors, opts, data
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1870, in _execute_job_function
        return_data = self.executors[fname](opts, data, func, args, kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
        return func(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/modules/state.py", line 1161, in highstate
        initial_pillar=_get_initial_pillar(opts),
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 4953, in __init__
        initial_pillar=initial_pillar,
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 774, in __init__
        self.opts["pillar"] = self._gather_pillar()
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 877, in _gather_pillar
        return pillar.compile_pillar()
      File "/usr/lib/python3.6/site-packages/salt/pillar/__init__.py", line 360, in compile_pillar
        dictkey="pillar",
      File "/usr/lib/python3.6/site-packages/salt/utils/asynchronous.py", line 112, in wrap
        lambda: getattr(self.obj, key)(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
        return future_cell[0].result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/channel/client.py", line 172, in crypted_transfer_decode_dictentry
        timeout=timeout,
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send
        ret = yield self.message_client.send(load, timeout=timeout)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 630, in send
        recv = yield future
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
    salt.exceptions.SaltReqTimeoutError: Message timed out

We now since recently call salt with an explicit timeout value with salt -t 3600 (60 minutes). Most salt nodes return after around 40s but from petrol we get a SaltReqTimeoutError. The complete pipeline ran for (only) 3m 54s. So after which time did the timeout actually happen and why?

Acceptance Criteria

  • AC1: salt states on petrol don't time out

Suggestions

  • This seems to be a generic symptom of a timeout e.g. many things could file like so
  • There's no specific module visible in tracebacks?
  • Come up with a reproducer?
    • Happens very often but only petrol because the machine is slow?
    • Maybe it's the module timeout?
  • Enable debugging that would help narrow down where this originates
  • Can we split up the state execution e.g. apply networking, packages separately?

Related issues 1 (0 open1 closed)

Copied from openQA Project (public) - action #176121: salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed outResolvedokurz2025-01-24

Actions
Actions

Also available in: Atom PDF