action #176121

closed

openQA Infrastructure (public) - coordination #161414: [epic] Improved salt based infrastructure management

salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed out

Added by robert.richardson about 1 month ago. Updated about 1 month ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2025-01-24
Due date:
% Done: 0%
Estimated time:

Description

The deploy stage has been failing since 2025-01-23 at 21:58 (triggered by https://gitlab.suse.de/openqa/salt-states-openqa/-/commit/ab6442055afd6a6b8b59580c168992e1161573fa).

The first run failed with xml.parsers.expat.ExpatError: no element found: line 1, column 0:

          ID: python3-augeas
    Function: pkg.installed
      Result: False
     Comment: Attempt 1: Returned a result of "False", with the following comment: "An exception occurred in this state: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
                  *cdata["args"], **cdata["kwargs"]
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                ...
                File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 223, in parseString
                  parser.Parse(string, True)
              xml.parsers.expat.ExpatError: no element found: line 1, column 0
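The traceback ends in expatbuilder.parseString. The small Python check below is an illustration, not code taken from salt. It shows that minidom reports exactly "no element found: line 1, column 0" when it is given empty input. That suggests the package state received an empty XML document, presumably the zypper output that salt's package module parses. The empty-output cause is an assumption and is not confirmed in this ticket. The helper name describe_parse and the sample XML are hypothetical.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def describe_parse(text: str) -> str:
    """Hypothetical helper: parse text with minidom, report the root tag or the expat error."""
    try:
        doc = minidom.parseString(text)
    except ExpatError as exc:
        return f"ExpatError: {exc}"
    return f"root element: {doc.documentElement.tagName}"


# An empty reply reproduces the message from the failed run.
print(describe_parse(""))  # prints "ExpatError: no element found: line 1, column 0"
# A well-formed (made-up) XML reply parses without error.
print(describe_parse("<stream><message>ok</message></stream>"))  # prints "root element: stream"
```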

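The rerun failed with salt.exceptions.SaltReqTimeoutError: Message timed out. The traceback shows the minion timing out in compile_pillar while waiting for the master to answer its pillar request: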

monitor.qe.nue2.suse.org:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1912, in _thread_return
        function_name, function_args, executors, opts, data
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1870, in _execute_job_function
        return_data = self.executors[fname](opts, data, func, args, kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
        return func(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/modules/state.py", line 1161, in highstate
        initial_pillar=_get_initial_pillar(opts),
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 4953, in __init__
        initial_pillar=initial_pillar,
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 774, in __init__
        self.opts["pillar"] = self._gather_pillar()
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 877, in _gather_pillar
        return pillar.compile_pillar()
      File "/usr/lib/python3.6/site-packages/salt/pillar/__init__.py", line 360, in compile_pillar
        dictkey="pillar",
      File "/usr/lib/python3.6/site-packages/salt/utils/asynchronous.py", line 112, in wrap
        lambda: getattr(self.obj, key)(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
        return future_cell[0].result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/channel/client.py", line 172, in crypted_transfer_decode_dictentry
        timeout=timeout,
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send
        ret = yield self.message_client.send(load, timeout=timeout)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 630, in send
        recv = yield future
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
    salt.exceptions.SaltReqTimeoutError: Message timed out

Workaround

  • Retrigger the failed deploy job

Related issues: 3 (1 open, 2 closed)

Related to openQA Infrastructure (public) - action #175695: salt states sporadically fail in applying the security sensor repo with "xml.parsers.expat.ExpatError: syntax error: line 1, column 0", didn't we remove the security sensor repo? (Resolved, okurz, 2025-01-17)
Related to openQA Infrastructure (public) - action #175989: Too big logfiles causing failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12) size:S (Resolved, jbaier_cz, 2025-01-22)
Copied to openQA Project (public) - action #178078: salt pipeline deploy fails on master, SaltReqTimeoutError: Message timed out for petrol (New, 2025-01-24)

#1

Updated by robert.richardson about 1 month ago

  • Tags changed from alert, reactive work to alert, reactive work, infra
  • Priority changed from Normal to Urgent
  • Target version set to Ready

#2

Updated by robert.richardson about 1 month ago

  • Related to action #175695: salt states sporadically fail in applying the security sensor repo with "xml.parsers.expat.ExpatError: syntax error: line 1, column 0", didn't we remove the security sensor repo? added

#4

Updated by okurz about 1 month ago · Edited

  • Description updated (diff)
  • Assignee set to okurz
  • Priority changed from Urgent to High

I found one upstream issue, https://github.com/saltstack/salt/issues/53147 "Salt Tornado API: salt.exceptions.SaltReqTimeoutError: Message timed out". Newer Python and salt versions might help, but of course that is not easy to achieve on Leap. I will look into the issue and consider retrying in the corresponding stage. Apparently robert.richardson already retriggered again, as the third run of the deploy stage succeeded in https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3705753
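The idea of retrying in the corresponding stage could look roughly like the Python sketch below. It is hypothetical and is not the change that was merged. The name run_with_retry, the attempt count and the delay are all assumptions. The sketch retries only when the output contains the SaltReqTimeoutError message seen above. It also assumes that the wrapped command exits non-zero when the deployment fails.

```python
import subprocess
import time
from typing import Callable, Sequence

# Message from the failed rerun in this ticket; only this failure is treated as transient.
TRANSIENT_MARKER = "SaltReqTimeoutError: Message timed out"


def run_with_retry(
    cmd: Sequence[str],
    attempts: int = 3,
    delay: float = 30.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run cmd and retry it only while it fails with a salt request timeout.

    Hypothetical helper: assumes cmd exits non-zero when the deployment fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        output = (result.stdout or "") + (result.stderr or "")
        # Success, or a failure a retry is unlikely to fix (e.g. the ExpatError run).
        if result.returncode == 0 or TRANSIENT_MARKER not in output:
            return result
        if attempt < attempts:
            time.sleep(delay)  # give the master some time before the next attempt
    return result
```

A call would look something like run_with_retry(["salt", "*", "state.apply"]), but that command is also an assumption; the real one belongs to the pipeline definition. The helper retries only on the known timeout message, so genuine state failures such as the ExpatError run still fail right away and are not hidden.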

#5

Updated by okurz about 1 month ago

  • Parent task set to #161414

#6

Updated by okurz about 1 month ago

  • Due date set to 2025-02-11
  • Status changed from New to Feedback

#7

Updated by okurz about 1 month ago

  • Due date deleted (2025-02-11)
  • Status changed from Feedback to Resolved

Merged and deployed. Looks good for now; we will have to see in production whether more related problems come up.

#8

Updated by jbaier_cz 27 days ago

  • Related to action #175989: Too big logfiles causing failed systemd services alert: logrotate (monitor, openqaw5-xen, s390zl12) size:S added

#9

Updated by okurz 3 days ago

  • Copied to action #178078: salt pipeline deploy fails on master, SaltReqTimeoutError: Message timed out for petrol added