openSUSE Project Management Tool: Issues
https://progress.opensuse.org/
2023-12-21T16:45:31Z
Redmine
openQA Infrastructure - action #152857 (Resolved): [tools] alert ping between hosts timeout prox...
https://progress.opensuse.org/issues/152857
2023-12-21T16:45:31Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1</a></p>
<p>It looks like proxy.scc.suse.de is down.</p>
<p><a href="https://suse.slack.com/archives/C029APBKLGK/p1703170652751919" class="external">https://suse.slack.com/archives/C029APBKLGK/p1703170652751919</a></p>
<p>Q: Who is responsible for proxy.scc.suse.de, and where is it running?</p>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Remove silence "alertname=Packet loss between worker hosts and other hosts alert" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
</ul>
openQA Infrastructure - action #152827 (Resolved): [tools] cron service updating clamav database ...
https://progress.opensuse.org/issues/152827
2023-12-21T09:17:42Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From /var/spool/mail/cvdupdate on both instances of the osd and o3 web UI:</p>
<pre><code>From cvdupdate@localhost Thu Dec 21 10:00:01 2023
Return-Path: <cvdupdate@localhost>
X-Original-To: cvdupdate
Delivered-To: cvdupdate@localhost
Received: by localhost (Postfix, from userid 17307)
id BC86134590; Thu, 21 Dec 2023 10:00:01 +0100 (CET)
From: "(Cron Daemon)" <cvdupdate@localhost>
To: cvdupdate@localhost
Subject: Cron <cvdupdate@openqa> /home/cvdupdate/.local/bin/cvdupdate update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=c21238>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/17307>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/17307/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/cvdupdate>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=cvdupdate>
X-Cron-Env: <USER=cvdupdate>
Message-Id: <20231221090001.BC86134590@localhost>
Date: Thu, 21 Dec 2023 10:00:01 +0100 (CET)
Traceback (most recent call last):
File "/home/cvdupdate/.local/bin/cvdupdate", line 11, in <module>
sys.exit(cli())
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 259, in update_alias
ctx.forward(db_update)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 628, in forward
return self.invoke(cmd, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 100, in db_update
m = CVDUpdate(config=config, verbose=verbose)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 119, in __init__
nameserver)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 184, in _read_config
self.config = json.load(config_file)
File "/usr/lib64/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
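<p>The last frame of the traceback is what Python's <code>json</code> module reports when its input is empty or does not start with a JSON value. The following is a minimal sketch, assuming the cvdupdate config file was empty or truncated (the ticket does not confirm this). <code>load_config</code> and its <code>default</code> argument are hypothetical and not part of cvdupdate.</p>

```python
import json
import os
import tempfile


def load_config(path, default=None):
    """Return parsed JSON from `path`, or `default` if the content is not valid JSON.

    Hypothetical helper for illustration only; not part of cvdupdate.
    """
    try:
        with open(path, encoding="utf-8") as config_file:
            return json.load(config_file)
    except json.JSONDecodeError as err:
        # For an empty file this prints:
        # "Expecting value: line 1 column 1 (char 0)"
        print(f"invalid config {path}: {err}")
        return default


# Reproduce the situation with a zero-byte file (assumed cause, see above).
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    empty_path = tmp.name  # nothing is written, so the file stays empty

try:
    result = load_config(empty_path, default={})
finally:
    os.unlink(empty_path)
```

<p>Falling back to a default is only one option. Depending on the tool, recreating the config file may be the more appropriate fix.</p>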
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Clamav is no longer raising exceptions on o3</li>
<li><strong>AC2:</strong> Clamav is no longer raising exceptions on osd</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove clamav service - install on o3, remove from salt on osd</li>
</ul>
openQA Infrastructure - action #152741 (Resolved): [tools] gitlab CI - openqa_review failed with ...
https://progress.opensuse.org/issues/152741
2023-12-18T15:46:57Z
osukup
<p>It looks like osd was unable to answer an API request within 30 seconds. Could this be a random network problem or an overly complicated query?</p>
<p><a href="https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520" class="external">https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520</a></p>
<pre><code>usr/bin/openqa-review --host https://openqa.suse.de -n -r -T --query-issue-status --no-empty-sections --include-softfails --running-threshold=2 --exclude-job-groups '^(Released|Development|old|EOL)' --reminder-comment-on-issues --save --save-dir /tmp/tmp.1LmmaKoNx7 --job-groups '^SLE.*15.*(Functional)'
..............................WARNING:urllib3.connectionpool:Retrying (Retry(total=6, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..................................WARNING:urllib3.connectionpool:Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
......................................WARNING:urllib3.connectionpool:Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
...................................................................................................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................WARNING:openqa_review.browser:Request to https://openqa.suse.de/api/v1/parent_groups was not successful after 7 retries: HTTPSConnectionPool(host='openqa.suse.de', port=443): Max retries exceeded with url: /api/v1/parent_groups (Caused by ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)"))
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
conn.connect()
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 642, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 782, in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 470, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 514, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
self.do_handshake()
File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
self._sslobj.do_handshake()
TimeoutError: _ssl.c:989: The handshake operation timed out
The above exception was the direct cause of the following exception:
</code></pre>
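<p>Reading the log above, the read timeout is 30 s and the request was abandoned after 7 retries. Assuming every attempt waits out the full timeout (the last attempt in the traceback actually failed during the TLS handshake) and ignoring any backoff pauses, a rough lower bound on how long the job was blocked on this one endpoint can be worked out as follows. The function name is made up for this sketch.</p>

```python
def worst_case_wait(read_timeout_s, retries):
    """Lower bound on total wait if every attempt hits the read timeout.

    Counts one initial attempt plus `retries` retries; sleeps between
    attempts are not included.
    """
    attempts = 1 + retries
    return attempts * read_timeout_s


# Values taken from the log: "read timeout=30" and "after 7 retries".
total = worst_case_wait(read_timeout_s=30, retries=7)
print(f"at least {total} s before giving up")
```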
openQA Infrastructure - action #135335 (Resolved): [tools] gitlabci salt-pillars-openqa deploy f...
https://progress.opensuse.org/issues/135335
2023-09-07T08:04:46Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907</a></p>
<p>from log:</p>
<pre><code> ID: wicked ifup br1
Function: cmd.run
Result: False
Comment: Command "wicked ifup br1" run
Started: 07:07:10.803006
Duration: 30119.464 ms
Changes:
----------
pid:
16955
retcode:
157
stderr:
stdout:
br1 no-device
Name: /etc/sysconfig/network/ifcfg-tap0 - Function: file.managed - Result: Clean Started: - 07:07:40.934647 Duration: 7.021 ms
Name: /etc/sysconfig/network/ifcfg-tap64 - Function: file.managed - Result: Clean Started: - 07:07:40.945434 Duration: 5.152 ms
Name: /etc/sysconfig/network/ifcfg-tap128 - Function: file.managed - Result: Clean Started: - 07:07:40.954239 Duration: 5.0 ms
Name: /etc/sysconfig/network/ifcfg-tap1 - Function: file.managed - Result: Clean Started: - 07:07:40.962885 Duration: 5.013 ms
Name: /etc/sysconfig/network/ifcfg-tap65 - Function: file.managed - Result: Clean Started: - 07:07:40.971581 Duration: 5.002 ms
Name: /etc/sysconfig/network/ifcfg-tap129 - Function: file.managed - Result: Clean Started: - 07:07:40.980234 Duration: 4.984 ms
Name: /etc/sysconfig/network/ifcfg-tap2 - Function: file.managed - Result: Clean Started: - 07:07:40.988915 Duration: 5.037 ms
Name: /etc/sysconfig/network/ifcfg-tap66 - Function: file.managed - Result: Clean Started: - 07:07:40.997624 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap130 - Function: file.managed - Result: Clean Started: - 07:07:41.006362 Duration: 4.946 ms
Name: /etc/sysconfig/network/ifcfg-tap3 - Function: file.managed - Result: Clean Started: - 07:07:41.015198 Duration: 5.266 ms
Name: /etc/sysconfig/network/ifcfg-tap67 - Function: file.managed - Result: Clean Started: - 07:07:41.024299 Duration: 5.178 ms
Name: /etc/sysconfig/network/ifcfg-tap131 - Function: file.managed - Result: Clean Started: - 07:07:41.033154 Duration: 4.992 ms
Name: /etc/sysconfig/network/ifcfg-tap4 - Function: file.managed - Result: Clean Started: - 07:07:41.041806 Duration: 4.955 ms
Name: /etc/sysconfig/network/ifcfg-tap68 - Function: file.managed - Result: Clean Started: - 07:07:41.050532 Duration: 5.287 ms
Name: /etc/sysconfig/network/ifcfg-tap132 - Function: file.managed - Result: Clean Started: - 07:07:41.059443 Duration: 4.926 ms
Name: /etc/sysconfig/network/ifcfg-tap5 - Function: file.managed - Result: Clean Started: - 07:07:41.068081 Duration: 4.993 ms
Name: /etc/sysconfig/network/ifcfg-tap69 - Function: file.managed - Result: Clean Started: - 07:07:41.076758 Duration: 4.93 ms
Name: /etc/sysconfig/network/ifcfg-tap133 - Function: file.managed - Result: Clean Started: - 07:07:41.085353 Duration: 4.942 ms
Name: /etc/sysconfig/network/ifcfg-tap6 - Function: file.managed - Result: Clean Started: - 07:07:41.093943 Duration: 5.056 ms
Name: /etc/sysconfig/network/ifcfg-tap70 - Function: file.managed - Result: Clean Started: - 07:07:41.102645 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap134 - Function: file.managed - Result: Clean Started: - 07:07:41.111287 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap7 - Function: file.managed - Result: Clean Started: - 07:07:41.119942 Duration: 4.913 ms
Name: /etc/sysconfig/network/ifcfg-tap71 - Function: file.managed - Result: Clean Started: - 07:07:41.128614 Duration: 4.959 ms
Name: /etc/sysconfig/network/ifcfg-tap135 - Function: file.managed - Result: Clean Started: - 07:07:41.137410 Duration: 4.953 ms
Name: /etc/sysconfig/network/ifcfg-tap8 - Function: file.managed - Result: Clean Started: - 07:07:41.146176 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap72 - Function: file.managed - Result: Clean Started: - 07:07:41.154807 Duration: 5.035 ms
Name: /etc/sysconfig/network/ifcfg-tap136 - Function: file.managed - Result: Clean Started: - 07:07:41.163660 Duration: 4.937 ms
Name: /etc/sysconfig/network/ifcfg-tap9 - Function: file.managed - Result: Clean Started: - 07:07:41.172266 Duration: 4.954 ms
Name: /etc/sysconfig/network/ifcfg-tap73 - Function: file.managed - Result: Clean Started: - 07:07:41.181001 Duration: 4.95 ms
Name: /etc/sysconfig/network/ifcfg-tap137 - Function: file.managed - Result: Clean Started: - 07:07:41.189605 Duration: 5.503 ms
</code></pre>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Salt states apply successfully on imageworker</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate if one of the service defs is missing a "requires" or similar</li>
<li>Commands were re-run - consider persistent mitigations if this is causing other pipelines to fail</li>
<li>This seems to affect imagetester, openqaworker17.qa.suse.cz, openqaworker16.qa.suse.cz and openqaworker18.qa.suse.cz so far</li>
</ul>
openQA Infrastructure - action #135206 (Rejected): [tools] GitlabCI telegraf step on salt-states-...
https://progress.opensuse.org/issues/135206
2023-09-05T20:20:38Z
osukup
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107</a></p>
<p>From log:</p>
<pre><code>openqaworker16.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
worker30.oqa.prg2.suse.org:
telegraf is fine
openqaworker17.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
openqaworker18.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
</code></pre>
<p>It looks like some hosts have DNS problems:</p>
<p>openqaworker16.qa.suse.cz<br>
openqaworker17.qa.suse.cz<br>
openqaworker18.qa.suse.cz<br>
openqaworker14.qa.suse.cz<br>
qesapworker-prg4.qa.suse.cz<br>
qesapworker-prg5.qa.suse.cz<br>
qesapworker-prg7.qa.suse.cz<br>
qesapworker-prg6.qa.suse.cz<br>
openqa-monitor.qa.suse.de </p>
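<p>The "Name or service not known" errors above are resolver failures. A minimal sketch of a check that reports which names fail to resolve on a given host might look like this. The <code>unresolvable</code> helper, the injectable <code>resolve</code> parameter, and the <code>.test</code>/<code>.example</code> names are assumptions made for illustration; they are not part of telegraf or the pipeline.</p>

```python
import socket


def unresolvable(hostnames, resolve=socket.getaddrinfo):
    """Return the subset of `hostnames` that cannot be resolved.

    `resolve` is injectable so the logic can be exercised without a network.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name, None)
        except socket.gaierror:
            failed.append(name)
    return failed


def fake_resolve(name, port):
    """Stand-in resolver: names ending in .example do not resolve."""
    if name.endswith(".example"):
        raise socket.gaierror(socket.EAI_NONAME, "Name or service not known")
    return []


broken = unresolvable(["ok.test", "missing.example"], resolve=fake_resolve)
print(broken)
```

<p>On an affected worker, calling <code>unresolvable</code> with the real ping targets and the default resolver would show whether the failure is specific to those names.</p>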
<p>AC1: The pipeline passes</p>
openQA Infrastructure - action #134816 (Resolved): [tools] grafana dashboard for `OpenQA Jobs tes...
https://progress.opensuse.org/issues/134816
2023-08-30T08:46:38Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Dashboard <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1</a></p>
<p>Data is missing from the graphs showing running tests since yesterday's migration.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No missing data for osd on Grafana</li>
<li><strong>AC2:</strong> Alerts related to affected panels are functioning</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>In the salt states, in monitoring/telegraf/telegraf-webui.conf, use something like <code>grains.get('primary_webui_domain', grains.get('fqdn'))</code> instead of <code>grains['fqdn']</code>. Alternatively, we could use the "id" in place of the FQDN</li>
<li>If the above does not work then use an OR expression since we already have data with different domains in the db (or implement that to cover the data from 2023-08-29 to today)</li>
<li>Also check whether alerts need to be covered</li>
<li>As an alternative, can we change the FQDN of osd to point to openqa.suse.de again?
<ul>
<li>Apparently a bad idea according to mcaj (not sure why)</li>
</ul></li>
<li>See existing MR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953</a></li>
</ul>
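<p>The fallback lookup suggested in the first bullet can be illustrated with plain dictionaries standing in for Salt grains. This is only a sketch of the lookup semantics; the hostnames are made-up examples and the function name is hypothetical.</p>

```python
def webui_domain(grains):
    """Prefer an explicit primary_webui_domain grain, else fall back to fqdn."""
    return grains.get("primary_webui_domain", grains.get("fqdn"))


# Plain dicts stand in for Salt grains; both hostnames are illustrative.
with_override = {
    "fqdn": "host.internal.example",
    "primary_webui_domain": "webui.example",
}
without_override = {"fqdn": "host.internal.example"}

print(webui_domain(with_override))     # uses the explicit grain
print(webui_domain(without_override))  # falls back to the FQDN
```

<p>With this pattern, hosts that do not define the extra grain keep reporting under their FQDN, so only the web UI host needs a change.</p>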
openQA Infrastructure - action #133154 (Resolved): osd-deployment failed because unreachable workers
https://progress.opensuse.org/issues/133154
2023-07-21T08:58:16Z
osukup
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743</a></p>
<p>from logs:</p>
<pre><code>sapworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
openqaworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
+++ kill %1
</code></pre>
<p>I tried to ping and ssh to these hosts, and none of them is reachable.<br>
IPMI does not respond either, and these hosts have corresponding host-up alerts in Grafana.</p>
openQA Infrastructure - action #133127 (Resolved): Frankencampus network broken + GitlabCi failed...
https://progress.opensuse.org/issues/133127
2023-07-20T17:34:02Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Job <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816</a></p>
<p>The job itself passed, but the upload of artifacts failed.</p>
<p>from logs:</p>
<pre><code>WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument
Cleaning up project directory and file based variables
00:01
ERROR: Job failed: exit code 1
</code></pre>
openQA Infrastructure - action #133097 (Resolved): cron on OSD (date; fetch_openqa_bugs /etc/open...
https://progress.opensuse.org/issues/133097
2023-07-20T07:45:15Z
osukup
<pre><code>Exception occured while fetching boo#1115169
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 55, in <module>
client.openqa_request("PUT", "bugs/%s" % bug_dbid, data=issue.get_dict())
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 298, in openqa_request
return self.do_request(req, retries=retries, wait=wait, parse=True)
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 238, in do_request
raise err
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 213, in do_request
request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('PUT', 'https://openqa.opensuse.org/api/v1/bugs/1021', 403)
</code></pre>
<p>Could this be caused by the broken IDP login service? See <a href="https://suse.slack.com/archives/C029APBKLGK/p1689838423782549" class="external">https://suse.slack.com/archives/C029APBKLGK/p1689838423782549</a></p>
openQA Infrastructure - action #132926 (Workable): OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_op...
https://progress.opensuse.org/issues/132926
2023-07-18T07:56:34Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log failed:</p>
<p>from traceback:</p>
<pre><code>requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/SUSE/ha-sap-terraform-deployments/issues/857 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f7439e43b38>, 'Connection to api.github.com timed out. (connect timeout=10)'))
</code></pre>
<p>fetch_openqa_bugs failed while fetching issues from GitHub.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> It is understood why the error occurred</li>
<li><strong>AC2:</strong> The error does not persist</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Make sure you can login, see <a href="https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11" class="external">https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11</a> or ask dheidler/mkittler to do that for you</li>
<li>Assuming "host unavailable", check how long the scripts retried
<ul>
<li>Re-try more often?</li>
<li>Wait longer between attempts?</li>
</ul></li>
<li><a href="https://github.com/os-autoinst/openqa_bugfetcher" class="external">https://github.com/os-autoinst/openqa_bugfetcher</a></li>
</ul>
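<p>The two retry ideas above (retry more often, wait longer between attempts) can be combined in an exponential backoff. The following is a generic sketch and not the implementation used by openqa_bugfetcher; the <code>retry</code> helper, its parameters, and the simulated failure are all assumptions made for illustration.</p>

```python
import time


def retry(func, attempts=5, first_delay=1.0, factor=2.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Call `func` until it succeeds or `attempts` calls have failed.

    The pause between attempts starts at `first_delay` seconds and is
    multiplied by `factor` after each failure. The last error is re-raised.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= factor


# Demo: a call that times out twice and then succeeds.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated connect timeout")
    return "ok"


delays = []  # record the pauses instead of really sleeping
outcome = retry(flaky, exceptions=(TimeoutError,), sleep=delays.append)
print(outcome, delays)
```

<p>Passing <code>sleep</code> as a parameter keeps the demo instant; in a real script the default <code>time.sleep</code> would apply.</p>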
openQA Infrastructure - action #130132 (Resolved): jenkins.qa.suse.de seems down
https://progress.opensuse.org/issues/130132
2023-05-31T11:17:23Z
osukup
<p>Jenkins got stuck in emergency mode again. @nsinger booted the system using Ctrl-D.</p>
openQA Infrastructure - action #125132 (Resolved): [alert] logrotate failed on OSD
https://progress.opensuse.org/issues/125132
2023-02-28T09:54:59Z
osukup
<p>from journalctl:</p>
<pre><code>Feb 15 00:00:07 openqa logrotate[12569]: logrotate does not support parallel execution on the same set of logfiles.
Feb 15 00:00:07 openqa logrotate[12569]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 15 00:00:00 openqa systemd[1]: Starting Rotate log files...
</code></pre>
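<p>The "already locked" message indicates that a second logrotate run started while another one still held the state file. A minimal sketch of that behaviour, assuming an flock-style advisory lock on Linux (an assumption about the mechanism, not taken from the ticket), with a temporary file standing in for the state file and a hypothetical <code>try_lock</code> helper:</p>

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Take an exclusive, non-blocking lock on `path`.

    Returns the open file object on success, or None if another holder
    already has the lock.
    """
    handle = open(path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


state_dir = tempfile.mkdtemp()
state_file = os.path.join(state_dir, "demo.status")  # stand-in state file

first = try_lock(state_file)   # first "run" gets the lock
second = try_lock(state_file)  # overlapping "run" is refused
first.close()                  # closing the file releases the lock
third = try_lock(state_file)   # a later run succeeds again
third.close()

os.remove(state_file)
os.rmdir(state_dir)
```

<p>If this is what happened, finding out what triggered the overlapping run (for example a manual invocation or a second timer) would be the next step.</p>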
openQA Infrastructure - action #114908 (Resolved): [tools] https://stats.openqa-monitor.qa.suse.d...
https://progress.opensuse.org/issues/114908
2022-08-02T12:17:54Z
osukup
<p>The Grafana overview page isn't responding.</p>
openQA Infrastructure - action #106594 (Resolved): [tools] openqaworker-arm-3 periodically fails ...
https://progress.opensuse.org/issues/106594
2022-02-10T11:36:16Z
osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <code>journalctl -xe -u os-autoinst-openvswitch</code>:</p>
<pre><code>úno 09 21:56:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 300s left ...
úno 09 21:56:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 299s left ...
....
úno 09 22:01:20 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 3s left ...
úno 09 22:01:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 2s left ...
úno 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: can't parse bridge local port IP at /usr/lib/os-autoinst/os-autoinst-openvswitch line 43.
úno 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 1s left ...
úno 09 22:01:22 openqaworker-arm-3 systemd[1]: os-autoinst-openvswitch.service: Main process exited, code=exited, status=255/EXCEPTION
</code></pre>
<p>The default timeout is 60 seconds. On openqaworker-arm-3 it is now 5 minutes, but that is still not enough after a system reboot.</p>
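<p>The countdown in the log is a poll-until-deadline loop. The following generic sketch shows that pattern; it is not the actual os-autoinst-openvswitch code (which is a Perl script), and <code>wait_for</code>, <code>bridge_ip</code>, and the address used are made up for illustration.</p>

```python
import time


def wait_for(condition, timeout_s, interval_s=1.0):
    """Poll `condition` until it returns a truthy value or the timeout expires.

    Returns the truthy value, or None if the deadline passed first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval_s)


# Demo: the "address" only appears on the third poll.
polls = {"count": 0}


def bridge_ip():
    polls["count"] += 1
    return "10.0.2.2" if polls["count"] >= 3 else None


found = wait_for(bridge_ip, timeout_s=2.0, interval_s=0.01)
missing = wait_for(lambda: None, timeout_s=0.05, interval_s=0.01)
print(found, missing)
```

<p>Raising the timeout only helps if the address eventually appears. If the bridge never gets an IP after a reboot, the cause lies elsewhere.</p>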
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li>Unpause alert "Failed systemd services alert (except openqa.suse.de)"</li>
</ul>
openQA Infrastructure - action #106365 (Resolved): Improve security for OSD worker credentials br...
https://progress.opensuse.org/issues/106365
2022-02-09T10:25:15Z
osukup
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://progress.opensuse.org/issues/105405" class="external">https://progress.opensuse.org/issues/105405</a>: the changed visibility of salt-pillars-openqa broke the <code>deploy</code> stage of CI.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Working salt-states+salt-pillars pipelines in gitlab</li>
<li><strong>AC2:</strong> salt-pillars repo stays non-public</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Try out deploy tokens on OSD to fetch the git repo</li>
</ul>