action #176949

open

coordination #161414: [epic] Improved salt based infrastructure management

[timeboxed:10h][research] understand/prevent sporadic zypper related stack traces in salt size:S

Added by okurz about 2 months ago. Updated 16 days ago.

Status:
Workable
Priority:
Low
Assignee:
-
Category:
Regressions/Crashes
Target version:
Start date:
Due date:
% Done:

0%

Estimated time:

Description

Motivation

Sometimes our salt pipelines fail with errors like in #155182, e.g.

----------
    Function: pkg.latest
        Name: velociraptor-client
      Result: False
     Comment: An exception occurred in this state: Traceback (most recent call last):
                File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
                  *cdata["args"], **cdata["kwargs"]
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
                  ret = _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1285, in wrapper
                  return f(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 2659, in latest
                  *desired_pkgs, fromrepo=fromrepo, refresh=refresh, **kwargs
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
                  return self.loader.run(run_func, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
                  return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
                File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
                  return callable(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
                  ret = _func_or_method(*args, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 828, in latest_version
                  package_info = info_available(*names, **kwargs)
                File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 752, in info_available
                  "info", "-t", "package", *batch[:batch_size]
                File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 439, in __call
                  salt.utils.stringutils.to_str(self.__call_result["stdout"])
                File "/usr/lib64/python3.6/xml/dom/minidom.py", line 1968, in parseString
                  return expatbuilder.parseString(string)
                File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 925, in parseString
                  return builder.parseString(string)
                File "/usr/lib64/python3.6/xml/dom/expatbuilder.py", line 223, in parseString
                  parser.Parse(string, True)
              xml.parsers.expat.ExpatError: syntax error: line 1, column 0
     Started: 14:56:34.269887
    Duration: 1873.979 ms
     Changes:   

Also see https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/3726297
The issue happened more often with more "unstable" repositories (e.g. the security-sensor repo) but still happens occasionally, e.g. on our monitor host:
https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/3764576#L853

Goals

  • G1: Prevent zypper+salt failures when interacting with development repositories

Suggestions

  • zypper --no-refresh info -t package grafana
  • This occurs with multiple packages including velociraptor and grafana
  • The error suggests that the pkg.latest state tries to parse XML output from zypper but fails at the very first character ("line 1, column 0"). The stack trace shows a call to zypper info -t package; maybe this can be used to reproduce our problems?
    • Research if there is another way to specify repos/packages in salt to avoid this symptom
    • Look up known issues
    • Find or file a zypper issue upstream
    • Consider changing zypper settings, e.g. disable auto-refresh
    • Find or file an OBS issue upstream
  • Is maybe https://github.com/saltstack/salt/issues/46954 related?
  • Come up with a reproducer that doesn't depend on a salt pipeline run
    • For example by calling zypper info -t package grafana as visible in the stack trace
  • Make the underlying error message visible, e.g. in case the "syntax error" is actually an error fetching the data

Related issues 1 (1 open, 0 closed)

Copied from openQA Infrastructure (public) - action #176325: sporadic: zypper related stack traces in salt pipeline (New)

Actions #1

Updated by okurz about 2 months ago

  • Copied from action #176325: sporadic: zypper related stack traces in salt pipeline added
Actions #2

Updated by okurz about 2 months ago

  • Priority changed from Normal to Low

Another CI job failed due to a missing retry on the grafana package install. Fixed in https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1378 . Also be aware that I removed the custom repo for security-sensor as part of #159060, so the problem becomes more theoretical.

Actions #3

Updated by okurz about 1 month ago

  • Parent task changed from #155182 to #161414
Actions #4

Updated by okurz 23 days ago

  • Target version changed from Ready to future

No recent interest by the team, so moving this out of the backlog due to other priorities.

Actions #5

Updated by nicksinger 16 days ago

Recent issue on monitor:

Mar 19 12:40:41 monitor salt-minion[17632]: [ERROR   ] An exception occurred in this state: Traceback (most recent call last):
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/state.py", line 2402, in call
Mar 19 12:40:41 monitor salt-minion[17632]:     *cdata["args"], **cdata["kwargs"]
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 19 12:40:41 monitor salt-minion[17632]:     return self.loader.run(run_func, *args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 19 12:40:41 monitor salt-minion[17632]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 19 12:40:41 monitor salt-minion[17632]:     return callable(*args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 19 12:40:41 monitor salt-minion[17632]:     ret = _func_or_method(*args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1285, in wrapper
Mar 19 12:40:41 monitor salt-minion[17632]:     return f(*args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 1686, in installed
Mar 19 12:40:41 monitor salt-minion[17632]:     pkgs, refresh = _resolve_capabilities(pkgs, refresh=refresh, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/states/pkg.py", line 990, in _resolve_capabilities
Mar 19 12:40:41 monitor salt-minion[17632]:     ret = __salt__["pkg.resolve_capabilities"](pkgs, refresh=refresh, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 149, in __call__
Mar 19 12:40:41 monitor salt-minion[17632]:     return self.loader.run(run_func, *args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1234, in run
Mar 19 12:40:41 monitor salt-minion[17632]:     return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/contextvars/__init__.py", line 38, in run
Mar 19 12:40:41 monitor salt-minion[17632]:     return callable(*args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/loader/lazy.py", line 1249, in _run_as
Mar 19 12:40:41 monitor salt-minion[17632]:     ret = _func_or_method(*args, **kwargs)
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 3280, in resolve_capabilities
Mar 19 12:40:41 monitor salt-minion[17632]:     search(name, root=root, match="exact")
Mar 19 12:40:41 monitor salt-minion[17632]:   File "/usr/lib/python3.6/site-packages/salt/modules/zypperpkg.py", line 2887, in search
Mar 19 12:40:41 monitor salt-minion[17632]:     .nolock.noraise.xml.call(*cmd)
Mar 19 12:40:41 monitor salt-minion[17632]: AttributeError: 'str' object has no attribute 'getElementsByTagName'

The issue happens at a different place (apparently in a "zypper search" call) but is again parsing-related. I had a look at https://gitlab.cc-asp.fraunhofer.de/janniswarnat/salt/-/blob/v3002.8/salt/modules/zypperpkg.py?ref_type=tags#L2491-2496 (I didn't find this file on GitHub; is this a downstream module developed by SUSE?). According to the stack trace the zypper call should output XML, but the parser returns a str object (I would assume because it fails to parse).
https://gitlab.cc-asp.fraunhofer.de/janniswarnat/salt/-/blob/v3002.8/salt/modules/zypperpkg.py?ref_type=tags#L297 describes a little better how zypper is called. I think we're mainly interested in the output of e.g. zypper --xmlout search grafana and why Python's minidom fails to parse it.
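Since the failure is sporadic, one way to get hold of that output might be to run the search in a loop and keep only the results minidom rejects. The following is a rough sketch under assumptions: the function names zypper_search and collect_unparsable are hypothetical, and --match-exact is a guess at what match="exact" in the stack trace translates to on the zypper command line.

```python
import subprocess
import time
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def zypper_search(package):
    """Return the raw stdout of an exact-match zypper search in XML mode."""
    cmd = ["zypper", "--xmlout", "--non-interactive",
           "search", "--match-exact", package]
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # works on Python 3.6
    )
    return proc.stdout


def collect_unparsable(run, attempts=50, delay=0.0):
    """Call run() repeatedly and return (attempt, raw_output, error) tuples
    for every output that minidom cannot parse."""
    failures = []
    for attempt in range(1, attempts + 1):
        raw = run()
        try:
            minidom.parseString(raw)
        except ExpatError as err:
            failures.append((attempt, raw, str(err)))
        time.sleep(delay)
    return failures
```

On the monitor host this could be started as collect_unparsable(lambda: zypper_search("grafana"), attempts=200, delay=5). Any tuple it returns contains the exact text zypper printed when parsing failed, which is the piece of information missing from both stack traces so far.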
