action #178078

closed

openQA Infrastructure (public) - coordination #161414: [epic] Improved salt based infrastructure management

salt pipeline deployment fails on master, SaltReqTimeoutError: Message timed out for petrol size:S

Added by okurz about 2 months ago. Updated about 2 months ago.

Status: Resolved
Priority: High
Assignee:
Category: Regressions/Crashes
Target version:
Start date: 2025-01-24
Due date:
% Done: 0%
Estimated time:
Description

Observation

From https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3907598 :

petrol.qe.nue2.suse.org:
    The minion function caused an exception: Traceback (most recent call last):
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1912, in _thread_return
        function_name, function_args, executors, opts, data
      File "/usr/lib/python3.6/site-packages/salt/minion.py", line 1870, in _execute_job_function
        return_data = self.executors[fname](opts, data, func, args, kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/executors/direct_call.py", line 10, in execute
        return func(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
        return self.loader.run(run_func, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
        return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
      File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
        return callable(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
        ret = _func_or_method(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/modules/state.py", line 1161, in highstate
        initial_pillar=_get_initial_pillar(opts),
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 4953, in __init__
        initial_pillar=initial_pillar,
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 774, in __init__
        self.opts["pillar"] = self._gather_pillar()
      File "/usr/lib/python3.6/site-packages/salt/state.py", line 877, in _gather_pillar
        return pillar.compile_pillar()
      File "/usr/lib/python3.6/site-packages/salt/pillar/__init__.py", line 360, in compile_pillar
        dictkey="pillar",
      File "/usr/lib/python3.6/site-packages/salt/utils/asynchronous.py", line 112, in wrap
        lambda: getattr(self.obj, key)(*args, **kwargs)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/ioloop.py", line 459, in run_sync
        return future_cell[0].result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/channel/client.py", line 172, in crypted_transfer_decode_dictentry
        timeout=timeout,
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send
        ret = yield self.message_client.send(load, timeout=timeout)
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1064, in run
        yielded = self.gen.throw(*exc_info)
      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 630, in send
        recv = yield future
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/gen.py", line 1056, in run
        value = future.result()
      File "/usr/lib/python3.6/site-packages/salt/ext/tornado/concurrent.py", line 249, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 4, in raise_exc_info
    salt.exceptions.SaltReqTimeoutError: Message timed out

Since recently we call salt with an explicit timeout value, salt -t 3600 (60 minutes). Most salt nodes return after around 40s, but from petrol we get a SaltReqTimeoutError. The complete pipeline ran for (only) 3m 54s. So after which time did the timeout actually happen and why?

Acceptance Criteria

  • AC1: salt states on petrol don't time out

Suggestions

  • This seems to be a generic symptom of a timeout, e.g. many things could fail like this
  • There's no specific module visible in tracebacks?
  • Come up with a reproducer?
    • Happens very often but only petrol because the machine is slow?
    • Maybe it's the module timeout?
  • Enable debugging that would help narrow down where this originates
  • Can we split up the state execution e.g. apply networking, packages separately?

Related issues 1 (0 open, 1 closed)

Copied from openQA Project (public) - action #176121: salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed out (Resolved, okurz, 2025-01-24)

Actions #1

Updated by okurz about 2 months ago

  • Copied from action #176121: salt-states-openqa pipeline deploy fails on master, SaltReqTimeoutError: Message timed out added
Actions #2

Updated by mkittler about 2 months ago

  • Description updated (diff)
  • Status changed from New to In Progress
  • Assignee set to mkittler
Actions #3

Updated by mkittler about 2 months ago

Actions #4

Updated by tinita about 2 months ago

  • Subject changed from salt pipeline deploy fails on master, SaltReqTimeoutError: Message timed out for petrol to salt pipeline deployment fails on master, SaltReqTimeoutError: Message timed out for petrol size:S
  • Description updated (diff)
Actions #5

Updated by mkittler about 2 months ago · Edited

A few random notes:


We are on Salt 3006.0 (from Apr 19, 2023). The current release is 3006.9 (from Jul 30, 2024). So we are not that much behind. Maybe an update will help but considering the other findings below I don't think this is likely.


Judging by this comment it might help to bump gather_job_timeout - although the issue in that ticket looks a bit different. Not sure whether it makes sense to test that considering the problem didn't persist.


Relevant log messages on the minion from that day:

Feb 28 10:03:53 petrol salt-minion[127222]: [WARNING ] The minion function caused an exception
…
Feb 28 20:26:24 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:29 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:34 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:34 petrol salt-minion[127222]: [WARNING ] Unable to send mine data to master.

So it was probably about sending mine data to the master.


It looks like this problem did not persist. In fact, the one failing deployment job mentioned in the ticket description is still the last one.


To answer the question from the ticket description:

So after which time did the timeout actually happen and why?

After not more than 3 minutes and 54 seconds, because that is when the CI job terminated. This is most likely a timeout in Salt's internal transport mechanism (ZeroMQ in this case). I couldn't find documentation on what the default for this timeout is or how one could configure it. This "transport timeout" definitely doesn't correspond to the --timeout CLI option. Maybe it corresponds to one of the timeouts mentioned in the SUMA documentation. (The official documentation also mentions those timeouts.)


Looks like the timeout is passed here:

      File "/usr/lib/python3.6/site-packages/salt/transport/zeromq.py", line 920, in send
        ret = yield self.message_client.send(load, timeout=timeout)
917     @salt.ext.tornado.gen.coroutine
918     def send(self, load, timeout=60):
919         self.connect()
920         ret = yield self.message_client.send(load, timeout=timeout)
921         raise salt.ext.tornado.gen.Return(ret)

So the timeout would be 60 seconds unless overridden. I'm not sure whether it is overridden here because the real caller of the function is not clear. In general the timeout seems to be specified depending on the actual command being sent and is often left at 60 seconds. Judging by the code, I don't think this kind of timeout is something we can specify in a config file.
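
To illustrate the default-versus-override behaviour of such a signature, here is a minimal sketch that uses plain asyncio instead of Salt's bundled Tornado. The names ReqTimeoutError, slow_master, send and try_send are made up for this example and are not Salt APIs; only the timeout=60 default mirrors the quoted code.

```python
import asyncio


class ReqTimeoutError(Exception):
    """Hypothetical stand-in for salt.exceptions.SaltReqTimeoutError."""


async def slow_master(delay):
    # Pretend the master needs `delay` seconds to answer.
    await asyncio.sleep(delay)
    return "pillar data"


async def send(load, timeout=60):
    # Same shape as the quoted send(): callers that pass nothing get 60 s.
    try:
        return await asyncio.wait_for(slow_master(load["delay"]), timeout)
    except asyncio.TimeoutError:
        raise ReqTimeoutError("Message timed out") from None


def try_send(load, **kwargs):
    # Run a single request and return either the reply or the error text.
    try:
        return asyncio.run(send(load, **kwargs))
    except ReqTimeoutError as exc:
        return str(exc)


if __name__ == "__main__":
    # Default timeout: far longer than the simulated reply time.
    print(try_send({"delay": 0.01}))
    # Caller-supplied timeout that is shorter than the reply time.
    print(try_send({"delay": 0.2}, timeout=0.05))
```

The point of the sketch is only that the effective timeout is decided by whoever calls send(), which is why a config option alone would not necessarily change it.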

Considering the CI job ran for less than 4 minutes in total, we know that whatever timeout this specific call used was shorter than that, probably just the default of 60 seconds or even just 5 seconds considering the log messages above. It would perhaps make sense to increase this timeout for workers like petrol that are known to be slow, but I wouldn't know how without changing the code (and even that seems tricky without getting deeply involved in the codebase). Maybe petrol is not at fault here at all and it was really the master being too slow at the time.
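
The 5-second figure can be read off the journal lines quoted earlier in this comment. A small sketch that computes the gaps between consecutive "Failed to send msg" entries follows; the helper name retry_gaps is made up, and the year 2025 is assumed from the ticket dates because the journal lines do not carry one. Whether the 5 seconds are a timeout or a pause between retries cannot be told from the log alone.

```python
from datetime import datetime

# Lines copied from the minion journal quoted earlier in this comment.
LOG = """\
Feb 28 20:26:24 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:29 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:34 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Feb 28 20:26:34 petrol salt-minion[127222]: [WARNING ] Unable to send mine data to master.
"""


def retry_gaps(log, needle="Failed to send msg", year="2025"):
    """Return the seconds between consecutive log lines containing `needle`."""
    stamps = [
        # The first 15 characters are the syslog timestamp, e.g. "Feb 28 20:26:24".
        datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        for line in log.splitlines()
        if needle in line
    ]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(retry_gaps(LOG))  # [5.0, 5.0]
```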


The worker wasn't actually super busy when the deployment was running: https://stats.openqa-monitor.qa.suse.de/d/WDpetrol/worker-dashboard-petrol?orgId=1&from=2025-02-27T23%3A24%3A12.663Z&to=2025-02-28T23%3A08%3A54.787Z&timezone=browser&var-datasource=000000001

Actions #6

Updated by mkittler about 2 months ago

  • Description updated (diff)
Actions #7

Updated by mkittler about 2 months ago

  • Status changed from In Progress to Feedback

I gathered quite a lot of information but don't know what we can/should do to improve this.

Actions #8

Updated by mkittler about 2 months ago

I found https://github.com/saltstack/salt/issues/62881 which is closed but it is unclear whether the problem was actually resolved. If it was resolved then updating Salt might help.

Since this is probably about updating the Salt mine it could be that triggering the update in https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/140fe93417dd1022435654c904da0b881d868654/deploy.yml#L66 makes things worse.

I could try to shut down Wireguard while applying the highstate manually and see how it behaves.

Actions #9

Updated by mkittler about 2 months ago

  • Status changed from Feedback to In Progress

We saw the problem again today:

Mar 05 09:22:59 petrol salt-minion[127222]: [ERROR   ] A command in 'cmd.run' had a problem: Specified cwd 'is' either not absolute or does not exist
Mar 05 09:25:57 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:25:57 petrol salt-minion[127222]: [ERROR   ] stdout: 0
Mar 05 09:25:57 petrol salt-minion[127222]: [ERROR   ] retcode: 1
Mar 05 09:25:57 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:25:57 petrol salt-minion[127222]: [ERROR   ] output: 0
Mar 05 09:28:16 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:28:16 petrol salt-minion[127222]: [ERROR   ] stdout: 0
Mar 05 09:28:16 petrol salt-minion[127222]: [ERROR   ] retcode: 1
Mar 05 09:28:16 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:28:16 petrol salt-minion[127222]: [ERROR   ] output: 0
Mar 05 09:28:41 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:28:41 petrol salt-minion[127222]: [ERROR   ] retcode: 1
Mar 05 09:28:41 petrol salt-minion[127222]: [ERROR   ] Command 'journalctl' failed with return code: 1
Mar 05 09:28:41 petrol salt-minion[127222]: [ERROR   ] output:
Mar 05 09:46:39 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Mar 05 09:46:44 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Mar 05 09:46:49 petrol salt-minion[127222]: [ERROR   ] Failed to send msg SaltReqTimeoutError('Message timed out',)
Mar 05 09:46:49 petrol salt-minion[127222]: [WARNING ] Unable to send mine data to master.
Mar 05 10:03:22 petrol useradd[58938]: new user: name=fszekely, UID=1107, GID=100, home=/home/fszekely, shell=/bin/bash, from=none

I haven't seen a corresponding failing CI job in the states, pillars and osd-deployment repos. Maybe this happened in another context. Maybe these error messages are actually independent of the SaltReqTimeoutError that caused the pipeline failure.

Actions #10

Updated by mkittler about 2 months ago · Edited

If I break the Wireguard tunnel while the state is being applied, I just get:

martchus@openqa:~> sudo salt 'petrol.qe.nue2.suse.org' state.apply
petrol.qe.nue2.suse.org:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20250305141406188847

This is the same message one gets when the Wireguard tunnel is not available from the start. I didn't get any errors in the local journal either.

When invoking

martchus@petrol:~> sudo salt-call --local mine.update
local:
    None

or

martchus@petrol:~> sudo salt-call mine.update
local:
    True

on Petrol while the Wireguard tunnel is offline I also don't get any errors.


I'm now running 100 updates in a loop to see how high the error rate is. So far I was only able to provoke an expat error we most likely already know:

Mar 05 15:05:42 petrol salt-minion[19067]: [ERROR   ] Unable to manage file: name '__env__' is not defined
Mar 05 15:05:50 petrol salt-minion[19067]: [ERROR   ] An exception occurred in this state: Traceback (most recent call last):
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
Mar 05 15:05:50 petrol salt-minion[19067]:     *cdata["args"], **cdata["kwargs"]
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 15:05:50 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 15:05:50 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 15:05:50 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 15:05:50 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1285, in wrapper
Mar 05 15:05:50 petrol salt-minion[19067]:     return f(*args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 2659, in latest
Mar 05 15:05:50 petrol salt-minion[19067]:     *desired_pkgs, fromrepo=fromrepo, refresh=refresh, **kwargs
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 15:05:50 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 15:05:50 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 15:05:50 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 15:05:50 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 828, in latest_version
Mar 05 15:05:50 petrol salt-minion[19067]:     package_info = info_available(*names, **kwargs)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 752, in info_available
Mar 05 15:05:50 petrol salt-minion[19067]:     "info", "-t", "package", *batch[:batch_size]
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 439, in __call
Mar 05 15:05:50 petrol salt-minion[19067]:     salt.utils.stringutils.to_str(self.__call_result["stdout"])
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/minidom.py", line 1968, in parseString
Mar 05 15:05:50 petrol salt-minion[19067]:     return expatbuilder.parseString(string)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 925, in parseString
Mar 05 15:05:50 petrol salt-minion[19067]:     return builder.parseString(string)
Mar 05 15:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 223, in parseString
Mar 05 15:05:50 petrol salt-minion[19067]:     parser.Parse(string, True)
Mar 05 15:05:50 petrol salt-minion[19067]: xml.parsers.expat.ExpatError: syntax error: line 1, column 0

This traceback didn't lead to a failure visible on the master-side.
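
A loop like the one mentioned above could be sketched as follows. This is not the script that was actually used; the function names are made up, and the failure check is an assumption: it treats a non-zero exit code or one of the two messages seen in this ticket as a failed run.

```python
import subprocess


def run_highstate(minion):
    # Assumption: the same command that is invoked manually in this ticket.
    proc = subprocess.run(
        ["sudo", "salt", minion, "state.apply"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return proc.returncode, proc.stdout


def is_failure(returncode, output):
    # Assumed heuristic: exit code plus the two messages quoted in this ticket.
    return (
        returncode != 0
        or "Minion did not return" in output
        or "SaltReqTimeoutError" in output
    )


def measure_error_rate(runs, runner):
    """Call `runner` `runs` times; return (failed run numbers, error rate)."""
    failed = []
    for number in range(1, runs + 1):
        print(f"run {number:03d}")
        returncode, output = runner()
        if is_failure(returncode, output):
            failed.append(number)
    return failed, len(failed) / runs


# Usage on the master (not executed here):
# failed, rate = measure_error_rate(
#     100, lambda: run_highstate("petrol.qe.nue2.suse.org"))
```

Passing the runner in as a callable keeps the counting logic testable without a Salt master.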


The expat error is actually showing up quite a lot:

Mar 05 16:05:44 petrol salt-minion[19067]: AttributeError: 'str' object has no attribute 'getElementsByTagName'
Mar 05 16:05:50 petrol salt-minion[19067]: [ERROR   ] An exception occurred in this state: Traceback (most recent call last):
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
Mar 05 16:05:50 petrol salt-minion[19067]:     *cdata["args"], **cdata["kwargs"]
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 16:05:50 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 16:05:50 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 16:05:50 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 16:05:50 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1285, in wrapper
Mar 05 16:05:50 petrol salt-minion[19067]:     return f(*args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 2659, in latest
Mar 05 16:05:50 petrol salt-minion[19067]:     *desired_pkgs, fromrepo=fromrepo, refresh=refresh, **kwargs
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 16:05:50 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 16:05:50 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 16:05:50 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 16:05:50 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 828, in latest_version
Mar 05 16:05:50 petrol salt-minion[19067]:     package_info = info_available(*names, **kwargs)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 752, in info_available
Mar 05 16:05:50 petrol salt-minion[19067]:     "info", "-t", "package", *batch[:batch_size]
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 439, in __call
Mar 05 16:05:50 petrol salt-minion[19067]:     salt.utils.stringutils.to_str(self.__call_result["stdout"])
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/minidom.py", line 1968, in parseString
Mar 05 16:05:50 petrol salt-minion[19067]:     return expatbuilder.parseString(string)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 925, in parseString
Mar 05 16:05:50 petrol salt-minion[19067]:     return builder.parseString(string)
Mar 05 16:05:50 petrol salt-minion[19067]:   File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 223, in parseString
Mar 05 16:05:50 petrol salt-minion[19067]:     parser.Parse(string, True)
Mar 05 16:05:50 petrol salt-minion[19067]: xml.parsers.expat.ExpatError: syntax error: line 1, column 0
Mar 05 16:06:21 petrol salt-minion[19067]: [ERROR   ] An exception occurred in this state: Traceback (most recent call last):
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
Mar 05 16:06:21 petrol salt-minion[19067]:     *cdata["args"], **cdata["kwargs"]
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 16:06:21 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 16:06:21 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 16:06:21 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 16:06:21 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1285, in wrapper
Mar 05 16:06:21 petrol salt-minion[19067]:     return f(*args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 1686, in installed
Mar 05 16:06:21 petrol salt-minion[19067]:     pkgs, refresh = _resolve_capabilities(pkgs, refresh=refresh, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 990, in _resolve_capabilities
Mar 05 16:06:21 petrol salt-minion[19067]:     ret = __salt__["pkg.resolve_capabilities"](pkgs, refresh=refresh, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 05 16:06:21 petrol salt-minion[19067]:     return self.loader.run(run_func, *args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 05 16:06:21 petrol salt-minion[19067]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 05 16:06:21 petrol salt-minion[19067]:     return callable(*args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 05 16:06:21 petrol salt-minion[19067]:     ret = _func_or_method(*args, **kwargs)
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 3280, in resolve_capabilities
Mar 05 16:06:21 petrol salt-minion[19067]:     search(name, root=root, match="exact")
Mar 05 16:06:21 petrol salt-minion[19067]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 2887, in search
Mar 05 16:06:21 petrol salt-minion[19067]:     .nolock.noraise.xml.call(*cmd)
Mar 05 16:06:21 petrol salt-minion[19067]: AttributeError: 'str' object has no attribute 'getElementsByTagName'
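
For context, the ExpatError in the tracebacks above is what Python's minidom raises when the string it is given is not well-formed XML. The following sketch reproduces that with made-up input; the helper name parse_zypper_xml and both sample strings are hypothetical and were not captured from petrol, so this says nothing about what zypper actually printed there.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_zypper_xml(stdout):
    """Return a DOM document, or None if `stdout` is not well-formed XML."""
    try:
        return minidom.parseString(stdout)
    except ExpatError as exc:
        # Show the parser error together with the start of the offending text.
        print(f"not XML ({exc}): {stdout[:40]!r}")
        return None


if __name__ == "__main__":
    # Hypothetical outputs for illustration only.
    good = '<?xml version="1.0"?><stream><message>ok</message></stream>'
    bad = "plain text instead of XML"

    document = parse_zypper_xml(good)
    print(document.getElementsByTagName("message")[0].firstChild.data)
    print(parse_zypper_xml(bad))
```

Logging the first characters of the unparsable output, as the sketch does, would be one way to find out what the command returned in the failing runs.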

I also once ran into a "no response" error (even though petrol and osd both seemed to be online the whole time; maybe the Wireguard tunnel is unstable):

run 086
petrol.qe.nue2.suse.org:
    Minion did not return. [No response]
    The minions may not have all finished running and any remaining minions will return upon completion. To look up the return data for this job later, run the following command:

    salt-run jobs.lookup_jid 20250305160505018753

I haven't encountered any failure after 50 runs. It is probably best to just add another retry: https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1395
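
As a generic illustration of the retry idea (not the content of the linked merge request, which is not reproduced here), a bounded retry around a flaky step might look like this. The function name with_retries and its parameters are made up for the example.

```python
import time


def with_retries(action, attempts=3, delay=0.0, sleep=time.sleep):
    """Call `action` until it succeeds or `attempts` calls have failed."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # deliberately broad for this sketch
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next attempt
    # Every attempt failed: surface the last error to the caller.
    raise last_error


if __name__ == "__main__":
    calls = []

    def flaky():
        # Fails twice, then succeeds, like a sporadic transport timeout.
        calls.append(1)
        if len(calls) < 3:
            raise RuntimeError("Message timed out")
        return "deployed"

    print(with_retries(flaky, attempts=3))
```

A retry hides sporadic failures such as the ones seen here, but it would not help if the timeout were systematic.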

Actions #11

Updated by mkittler about 2 months ago

  • Status changed from In Progress to Resolved

Looks like the deployment still works after adding the retry (https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3936427/raw). I don't know how else to improve this without digging too deep into the upstream Salt code, so I'm considering this ticket resolved.
