openSUSE Project Management Tool: Issues (https://progress.opensuse.org/), updated 2023-12-21T16:45:31Z
openQA Infrastructure - action #152857 (Resolved): [tools] alert ping between hosts timeout prox... (https://progress.opensuse.org/issues/152857, 2023-12-21T16:45:31Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1</a></p>
<p>It looks like proxy.scc.suse.de is down.</p>
<p><a href="https://suse.slack.com/archives/C029APBKLGK/p1703170652751919" class="external">https://suse.slack.com/archives/C029APBKLGK/p1703170652751919</a></p>
<p>Q: Who is responsible for proxy.scc.suse.de, and where is it running?</p>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Remove silence "alertname=Packet loss between worker hosts and other hosts alert" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
</ul>
openQA Infrastructure - action #152827 (Resolved): [tools] cron service updating clamav database ... (https://progress.opensuse.org/issues/152827, 2023-12-21T09:17:42Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From /var/spool/mail/cvdupdate on both the osd and the o3 web UI hosts:</p>
<pre><code>From cvdupdate@localhost Thu Dec 21 10:00:01 2023
Return-Path: <cvdupdate@localhost>
X-Original-To: cvdupdate
Delivered-To: cvdupdate@localhost
Received: by localhost (Postfix, from userid 17307)
id BC86134590; Thu, 21 Dec 2023 10:00:01 +0100 (CET)
From: "(Cron Daemon)" <cvdupdate@localhost>
To: cvdupdate@localhost
Subject: Cron <cvdupdate@openqa> /home/cvdupdate/.local/bin/cvdupdate update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=c21238>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/17307>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/17307/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/cvdupdate>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=cvdupdate>
X-Cron-Env: <USER=cvdupdate>
Message-Id: <20231221090001.BC86134590@localhost>
Date: Thu, 21 Dec 2023 10:00:01 +0100 (CET)
Traceback (most recent call last):
  File "/home/cvdupdate/.local/bin/cvdupdate", line 11, in <module>
    sys.exit(cli())
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 259, in update_alias
    ctx.forward(db_update)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 628, in forward
    return self.invoke(cmd, **kwargs)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 100, in db_update
    m = CVDUpdate(config=config, verbose=verbose)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 119, in __init__
    nameserver)
  File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 184, in _read_config
    self.config = json.load(config_file)
  File "/usr/lib64/python3.6/json/__init__.py", line 299, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> ClamAV no longer raises exceptions on o3</li>
<li><strong>AC2:</strong> ClamAV no longer raises exceptions on osd</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove the ClamAV service: uninstall it on o3 and remove it from salt on osd</li>
</ul>
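<p>The traceback ends with <code>json.load()</code> failing on <code>Expecting value: line 1 column 1 (char 0)</code>. Python's <code>json</code> module reports exactly this for empty input, or for input that does not start with a JSON value. An empty or truncated cvdupdate config file is therefore a plausible cause. Below is a minimal check. It assumes the config lives at <code>~/.cvdupdate/config.json</code>, which is an assumption, so verify the path used on osd and o3. <code>check_config</code> is a hypothetical helper:</p>

```python
import json
import os
from pathlib import Path

# Assumed cvdupdate config location (not confirmed by the ticket); verify on the hosts.
DEFAULT_CONFIG = Path(os.path.expanduser("~/.cvdupdate/config.json"))


def check_config(path: Path = DEFAULT_CONFIG) -> str:
    """Classify a JSON config file as 'missing', 'empty', 'invalid: ...' or 'ok'."""
    if not path.exists():
        return "missing"
    text = path.read_text(encoding="utf-8")
    if not text.strip():
        # json.loads("") raises "Expecting value: line 1 column 1 (char 0)",
        # the same error as in the cron mail above.
        return "empty"
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"invalid: {err}"
    return "ok"
```

<p>For example, run <code>check_config()</code> as the cvdupdate user. An <code>"empty"</code> result would match the traceback.</p>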
openQA Infrastructure - action #152741 (Resolved): [tools] gitlab CI - openqa_review failed with ... (https://progress.opensuse.org/issues/152741, 2023-12-18T15:46:57Z, osukup)
<p>It looks like osd did not answer the API request within the 30 s read timeout. Was this a random network problem, or a query that is too expensive?</p>
<p><a href="https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520" class="external">https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520</a></p>
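<p>One way to tell a slow <code>/api/v1/parent_groups</code> response from a network problem is to time a single request with the same 30 s timeout. Running it from the CI runner and from another host gives a comparison. Below is a stdlib-only sketch. <code>probe</code> is a hypothetical helper, not part of openqa-review:</p>

```python
from __future__ import annotations

import socket
import time
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 30.0) -> tuple[str, float]:
    """Fetch url once; return (status, elapsed seconds).

    status is "ok", "timeout" or "error: <reason>".
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        status = "ok"
    except (socket.timeout, TimeoutError):
        # Raised directly when the connection is up but the server is too
        # slow to send the response (a read timeout, as in the CI log).
        status = "timeout"
    except urllib.error.URLError as err:
        # Timeouts while connecting (including the TLS handshake) are wrapped
        # in URLError by urllib.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            status = "timeout"
        else:
            status = f"error: {err.reason}"
    return status, time.monotonic() - start
```

<p>For example, call <code>probe("https://openqa.suse.de/api/v1/parent_groups")</code>. If several hosts consistently return <code>"timeout"</code>, that would point more towards the server side than towards one network path.</p>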
<pre><code>/usr/bin/openqa-review --host https://openqa.suse.de -n -r -T --query-issue-status --no-empty-sections --include-softfails --running-threshold=2 --exclude-job-groups '^(Released|Development|old|EOL)' --reminder-comment-on-issues --save --save-dir /tmp/tmp.1LmmaKoNx7 --job-groups '^SLE.*15.*(Functional)'
..............................WARNING:urllib3.connectionpool:Retrying (Retry(total=6, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..................................WARNING:urllib3.connectionpool:Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
......................................WARNING:urllib3.connectionpool:Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
...................................................................................................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................WARNING:openqa_review.browser:Request to https://openqa.suse.de/api/v1/parent_groups was not successful after 7 retries: HTTPSConnectionPool(host='openqa.suse.de', port=443): Max retries exceeded with url: /api/v1/parent_groups (Caused by ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)"))
Traceback (most recent call last):
  File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
    conn.connect()
  File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 642, in connect
    sock_and_verified = _ssl_wrap_socket_and_match_hostname(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 782, in _ssl_wrap_socket_and_match_hostname
    ssl_sock = ssl_wrap_socket(
               ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 470, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 514, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
    return self.sslsocket_class._create(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
    self.do_handshake()
  File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
    self._sslobj.do_handshake()
TimeoutError: _ssl.c:989: The handshake operation timed out
The above exception was the direct cause of the following exception:
</code></pre>
openQA Infrastructure - action #135335 (Resolved): [tools] gitlabci salt-pillars-openqa deploy f... (https://progress.opensuse.org/issues/135335, 2023-09-07T08:04:46Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907</a></p>
<p>From the log:</p>
<pre><code>          ID: wicked ifup br1
    Function: cmd.run
      Result: False
     Comment: Command "wicked ifup br1" run
     Started: 07:07:10.803006
    Duration: 30119.464 ms
     Changes:
              ----------
              pid:
                  16955
              retcode:
                  157
              stderr:
              stdout:
                  br1 no-device
Name: /etc/sysconfig/network/ifcfg-tap0 - Function: file.managed - Result: Clean Started: - 07:07:40.934647 Duration: 7.021 ms
Name: /etc/sysconfig/network/ifcfg-tap64 - Function: file.managed - Result: Clean Started: - 07:07:40.945434 Duration: 5.152 ms
Name: /etc/sysconfig/network/ifcfg-tap128 - Function: file.managed - Result: Clean Started: - 07:07:40.954239 Duration: 5.0 ms
Name: /etc/sysconfig/network/ifcfg-tap1 - Function: file.managed - Result: Clean Started: - 07:07:40.962885 Duration: 5.013 ms
Name: /etc/sysconfig/network/ifcfg-tap65 - Function: file.managed - Result: Clean Started: - 07:07:40.971581 Duration: 5.002 ms
Name: /etc/sysconfig/network/ifcfg-tap129 - Function: file.managed - Result: Clean Started: - 07:07:40.980234 Duration: 4.984 ms
Name: /etc/sysconfig/network/ifcfg-tap2 - Function: file.managed - Result: Clean Started: - 07:07:40.988915 Duration: 5.037 ms
Name: /etc/sysconfig/network/ifcfg-tap66 - Function: file.managed - Result: Clean Started: - 07:07:40.997624 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap130 - Function: file.managed - Result: Clean Started: - 07:07:41.006362 Duration: 4.946 ms
Name: /etc/sysconfig/network/ifcfg-tap3 - Function: file.managed - Result: Clean Started: - 07:07:41.015198 Duration: 5.266 ms
Name: /etc/sysconfig/network/ifcfg-tap67 - Function: file.managed - Result: Clean Started: - 07:07:41.024299 Duration: 5.178 ms
Name: /etc/sysconfig/network/ifcfg-tap131 - Function: file.managed - Result: Clean Started: - 07:07:41.033154 Duration: 4.992 ms
Name: /etc/sysconfig/network/ifcfg-tap4 - Function: file.managed - Result: Clean Started: - 07:07:41.041806 Duration: 4.955 ms
Name: /etc/sysconfig/network/ifcfg-tap68 - Function: file.managed - Result: Clean Started: - 07:07:41.050532 Duration: 5.287 ms
Name: /etc/sysconfig/network/ifcfg-tap132 - Function: file.managed - Result: Clean Started: - 07:07:41.059443 Duration: 4.926 ms
Name: /etc/sysconfig/network/ifcfg-tap5 - Function: file.managed - Result: Clean Started: - 07:07:41.068081 Duration: 4.993 ms
Name: /etc/sysconfig/network/ifcfg-tap69 - Function: file.managed - Result: Clean Started: - 07:07:41.076758 Duration: 4.93 ms
Name: /etc/sysconfig/network/ifcfg-tap133 - Function: file.managed - Result: Clean Started: - 07:07:41.085353 Duration: 4.942 ms
Name: /etc/sysconfig/network/ifcfg-tap6 - Function: file.managed - Result: Clean Started: - 07:07:41.093943 Duration: 5.056 ms
Name: /etc/sysconfig/network/ifcfg-tap70 - Function: file.managed - Result: Clean Started: - 07:07:41.102645 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap134 - Function: file.managed - Result: Clean Started: - 07:07:41.111287 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap7 - Function: file.managed - Result: Clean Started: - 07:07:41.119942 Duration: 4.913 ms
Name: /etc/sysconfig/network/ifcfg-tap71 - Function: file.managed - Result: Clean Started: - 07:07:41.128614 Duration: 4.959 ms
Name: /etc/sysconfig/network/ifcfg-tap135 - Function: file.managed - Result: Clean Started: - 07:07:41.137410 Duration: 4.953 ms
Name: /etc/sysconfig/network/ifcfg-tap8 - Function: file.managed - Result: Clean Started: - 07:07:41.146176 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap72 - Function: file.managed - Result: Clean Started: - 07:07:41.154807 Duration: 5.035 ms
Name: /etc/sysconfig/network/ifcfg-tap136 - Function: file.managed - Result: Clean Started: - 07:07:41.163660 Duration: 4.937 ms
Name: /etc/sysconfig/network/ifcfg-tap9 - Function: file.managed - Result: Clean Started: - 07:07:41.172266 Duration: 4.954 ms
Name: /etc/sysconfig/network/ifcfg-tap73 - Function: file.managed - Result: Clean Started: - 07:07:41.181001 Duration: 4.95 ms
Name: /etc/sysconfig/network/ifcfg-tap137 - Function: file.managed - Result: Clean Started: - 07:07:41.189605 Duration: 5.503 ms
</code></pre>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Salt states apply successfully on imageworker</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate whether one of the service definitions is missing a "requires" or similar</li>
<li>The commands were re-run; consider persistent mitigations if this causes other pipelines to fail</li>
<li>So far this seems to affect imagetester, openqaworker17.qa.suse.cz, openqaworker16.qa.suse.cz and openqaworker18.qa.suse.cz</li>
</ul>
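<p>A salt <code>require</code> makes a state run only after the states it requires. A missing one could let <code>wicked ifup br1</code> run before the bridge exists, which would fit the <code>br1 no-device</code> output. The ordering idea can be sketched with Python's <code>graphlib.TopologicalSorter</code> (Python 3.9+). This is only an illustration, not how salt resolves requisites internally. Apart from <code>wicked ifup br1</code>, the state IDs are hypothetical:</p>

```python
from graphlib import TopologicalSorter

# Hypothetical state IDs: each state maps to the set of states it requires.
# Only "wicked ifup br1" appears in the log; "create bridge br1" stands in
# for whatever state actually creates the bridge.
STATES = {
    "wicked ifup br1": {"create bridge br1"},
    "create bridge br1": set(),
}


def execution_order(requires: dict) -> list:
    """Order states so that each one comes after everything it requires.

    Raises graphlib.CycleError if the requirements form a cycle.
    """
    return list(TopologicalSorter(requires).static_order())


print(execution_order(STATES))
# ['create bridge br1', 'wicked ifup br1']
```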
openQA Infrastructure - action #135206 (Rejected): [tools] GitlabCI telegraf step on salt-states-... (https://progress.opensuse.org/issues/135206, 2023-09-05T20:20:38Z, osukup)
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107</a></p>
<p>From the log:</p>
<pre><code>openqaworker16.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
worker30.oqa.prg2.suse.org:
telegraf is fine
openqaworker17.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
openqaworker18.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
</code></pre>
<p>It looks like some hosts have DNS problems:</p>
<p>openqaworker16.qa.suse.cz<br>
openqaworker17.qa.suse.cz<br>
openqaworker18.qa.suse.cz<br>
openqaworker14.qa.suse.cz<br>
qesapworker-prg4.qa.suse.cz<br>
qesapworker-prg5.qa.suse.cz<br>
qesapworker-prg7.qa.suse.cz<br>
qesapworker-prg6.qa.suse.cz<br>
openqa-monitor.qa.suse.de </p>
<p>AC1: The pipeline passes</p>
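<p>The telegraf errors say <code>Name or service not known</code>, which points to name-resolution failures rather than unanswered pings. One way to confirm this on each affected host is to query the system resolver directly. Below is a minimal sketch. <code>resolves</code> and <code>dns_report</code> are hypothetical helpers:</p>

```python
import socket

# Hostnames taken from the telegraf errors above.
HOSTS = ["walter1.qe.nue2.suse.org", "qa-jump.qe.nue2.suse.org"]


def resolves(host: str) -> bool:
    """Return True if the system resolver returns at least one address for host."""
    try:
        return bool(socket.getaddrinfo(host, None))
    except socket.gaierror:
        # The same resolver failure that ping reports as "Name or service not known".
        return False


def dns_report(hosts) -> dict:
    """Map each hostname to whether it currently resolves on this machine."""
    return {host: resolves(host) for host in hosts}
```

<p>Compare <code>dns_report(HOSTS)</code> on an affected worker such as openqaworker16.qa.suse.cz with the result on worker30.oqa.prg2.suse.org, where telegraf is fine. That should show whether the difference lies in the resolver configuration.</p>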
openQA Infrastructure - action #134816 (Resolved): [tools] grafana dashboard for `OpenQA Jobs tes... (https://progress.opensuse.org/issues/134816, 2023-08-30T08:46:38Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Dashboard <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1</a></p>
<p>The graphs showing running tests have been missing data since yesterday's migration.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No missing data for osd on Grafana</li>
<li><strong>AC2:</strong> Alerts related to affected panels are functioning</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>In the salt states, in <code>monitoring/telegraf/telegraf-webui.conf</code>, use something like <code>grains.get('primary_webui_domain', grains.get('fqdn'))</code> instead of <code>grains['fqdn']</code>. Alternatively, we could use the "id" grain in place of the FQDN</li>
<li>If the above does not work, use an OR expression in the queries, since the database already contains data under different domains (or implement that to cover the data from 2023-08-29 until today)</li>
<li>Also check whether alerts need to be covered</li>
<li>As an alternative, can we change the FQDN of osd to point to openqa.suse.de again?
<ul>
<li>Apparently a bad idea according to mcaj (not sure why)</li>
</ul></li>
<li>See existing MR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953</a></li>
</ul>
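<p>The two suggestions above can be sketched in Python. A plain dict stands in for salt's <code>grains</code>; only <code>grains.get</code> with a default is used, which behaves like <code>dict.get</code>. The OR expression is built as an InfluxQL-style <code>WHERE</code> fragment. The <code>host</code> tag name and the domain <code>osd-new.example.org</code> are hypothetical and not taken from the actual dashboard:</p>

```python
def webui_host(grains: dict) -> str:
    """Prefer the primary web UI domain; fall back to the FQDN grain."""
    return grains.get("primary_webui_domain", grains.get("fqdn"))


def host_filter(hosts, tag: str = "host") -> str:
    """Build an InfluxQL-style condition matching any of the given hosts.

    Backslashes and single quotes are escaped because InfluxQL string
    literals are single-quoted.
    """
    escaped = [h.replace("\\", "\\\\").replace("'", "\\'") for h in hosts]
    return "(" + " OR ".join(f"\"{tag}\" = '{h}'" for h in escaped) + ")"


print(host_filter(["openqa.suse.de", "osd-new.example.org"]))
# ("host" = 'openqa.suse.de' OR "host" = 'osd-new.example.org')
```

<p>With a fallback like this, hosts without a <code>primary_webui_domain</code> grain keep reporting under their FQDN. The OR filter covers the period in which both names exist in the database.</p>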
openQA Infrastructure - action #133127 (Resolved): Frankencampus network broken + GitlabCi failed... (https://progress.opensuse.org/issues/133127, 2023-07-20T17:34:02Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Job <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816</a></p>
<p>The job itself actually passed, but the artifact upload failed.</p>
<p>From the logs:</p>
<pre><code>WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
</code></pre>
openQA Infrastructure - action #133097 (Resolved): cron on OSD (date; fetch_openqa_bugs /etc/open... (https://progress.opensuse.org/issues/133097, 2023-07-20T07:45:15Z, osukup)
<pre><code>Exception occured while fetching boo#1115169
Traceback (most recent call last):
  File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
    raise e
  File "/usr/bin/fetch_openqa_bugs", line 55, in <module>
    client.openqa_request("PUT", "bugs/%s" % bug_dbid, data=issue.get_dict())
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 298, in openqa_request
    return self.do_request(req, retries=retries, wait=wait, parse=True)
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 238, in do_request
    raise err
  File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 213, in do_request
    request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('PUT', 'https://openqa.opensuse.org/api/v1/bugs/1021', 403)
</code></pre>
<p>Could this be caused by the broken IDP login service? See <a href="https://suse.slack.com/archives/C029APBKLGK/p1689838423782549" class="external">https://suse.slack.com/archives/C029APBKLGK/p1689838423782549</a></p>
openQA Infrastructure - action #132860 (Resolved): openqa-piworker is unstable and needs regular ... (https://progress.opensuse.org/issues/132860, 2023-07-17T08:39:49Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1694765" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1694765</a></p>
<p>The only thing found in the logs, in salt_ping.log:</p>
<pre><code>Currently the following minions are down:
8d7
< "openqa-piworker.qa.suse.de"
===================
</code></pre>
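<p>The <code>8d7</code> and <code>&lt;</code> lines use the normal <code>diff</code> format. Line 8 of the first list is missing from the second list, at the position after its line 7. Below is a minimal sketch of the same comparison using <code>difflib.SequenceMatcher</code>. It assumes the check diffs the expected minions against those that answered, which is an assumption about how the salt_ping check works. Apart from <code>openqa-piworker.qa.suse.de</code>, the names are hypothetical. The sketch reports only deletions:</p>

```python
import difflib


def normal_diff_deletions(old, new):
    """Yield normal-diff style lines ('8d7', '< item') for items missing from new."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, _j2 in matcher.get_opcodes():
        if tag != "delete":
            continue  # this sketch ignores added or replaced entries
        # 1-based line range in old, followed by the line in new it follows.
        rng = f"{i1 + 1}" if i2 - i1 == 1 else f"{i1 + 1},{i2}"
        yield f"{rng}d{j1}"
        for item in old[i1:i2]:
            yield f"< {item}"


# Hypothetical minion lists: seven workers answer, the eighth does not.
expected = [f"worker{i}.example.org" for i in range(1, 8)] + ["openqa-piworker.qa.suse.de"]
responding = expected[:7]
print("\n".join(normal_diff_deletions(expected, responding)))
# 8d7
# < openqa-piworker.qa.suse.de
```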
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> We are able to process openQA Raspberry Pi bare-metal jobs consistently over several days</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><p>Identify the cause of the regression</p>
<ul>
<li>likely something related to the hardware RTC</li>
<li>check whether it simply works with Leap 15.5, since we wanted to upgrade anyway</li>
<li>it could be a recent kernel update, so try downgrading</li>
</ul></li>
<li><p>If it is really necessary and all other remote-controllable options are exhausted, go to the office, unplug the RTC and reinstall the system, assuming it was simply broken or corrupted</p></li>
<li><p>As Plan Y (if options A to X failed), buy a Wi-Fi and Bluetooth adapter for an IPMI-controllable server and use that server instead to connect to the Raspberry Pi bare-metal test instances</p></li>
</ul>
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add back salt key with <code>ssh osd "sudo salt-key -y -a openqa-piworker.qa.suse.de"</code></li>
</ul>
openQA Infrastructure - action #130132 (Resolved): jenkins.qa.suse.de seems down (https://progress.opensuse.org/issues/130132, 2023-05-31T11:17:23Z, osukup)
<p>Jenkins got stuck in emergency mode again; @nsinger booted the system by pressing Ctrl-D.</p>
openQA Infrastructure - action #125132 (Resolved): [alert] logrotate failed on OSD (https://progress.opensuse.org/issues/125132, 2023-02-28T09:54:59Z, osukup)
<p>From journalctl:</p>
<pre><code>Feb 15 00:00:07 openqa logrotate[12569]: logrotate does not support parallel execution on the same set of logfiles.
Feb 15 00:00:07 openqa logrotate[12569]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 15 00:00:00 openqa systemd[1]: Starting Rotate log files...
</code></pre>
openQA Infrastructure - action #114908 (Resolved): [tools] https://stats.openqa-monitor.qa.suse.d... (https://progress.opensuse.org/issues/114908, 2022-08-02T12:17:54Z, osukup)
<p>The Grafana overview page isn't responding.</p>
openQA Infrastructure - action #106594 (Resolved): [tools] openqaworker-arm-3 periodically fails ... (https://progress.opensuse.org/issues/106594, 2022-02-10T11:36:16Z, osukup)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <code>journalctl -xe -u os-autoinst-openvswitch</code>:</p>
<pre><code>Feb 09 21:56:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 300s left ...
Feb 09 21:56:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 299s left ...
....
Feb 09 22:01:20 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 3s left ...
Feb 09 22:01:21 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 2s left ...
Feb 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: can't parse bridge local port IP at /usr/lib/os-autoinst/os-autoinst-openvswitch line 43.
Feb 09 22:01:22 openqaworker-arm-3 os-autoinst-openvswitch[2924]: Waiting for IP on bridge 'br1', 1s left ...
Feb 09 22:01:22 openqaworker-arm-3 systemd[1]: os-autoinst-openvswitch.service: Main process exited, code=exited, status=255/EXCEPTION
</code></pre>
<p>The default timeout is 60 seconds. On openqaworker-arm-3 it is now 5 minutes, but that is still not enough after a system reboot.</p>
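<p>The journal shows the service polling once per second for an IP on <code>br1</code> and giving up when its countdown reaches zero. Below is a minimal sketch of such a deadline-based wait in Python. It is not the actual os-autoinst-openvswitch code. <code>wait_for</code> is a hypothetical helper; the clock and sleep functions are injectable so the loop can be tested without really waiting:</p>

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout: float, interval: float = 1.0,
             clock: Callable[[], float] = time.monotonic,
             sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll check() every interval seconds until it succeeds or timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        left = deadline - clock()
        if left <= 0:
            return False
        # Mirrors the "Waiting for IP on bridge 'br1', Ns left ..." messages.
        print(f"Waiting, {left:.0f}s left ...")
        sleep(min(interval, left))
```

<p>For example: <code>wait_for(lambda: has_ip("br1"), timeout=300)</code>, where <code>has_ip</code> is a hypothetical check for an address on the bridge. A longer timeout only helps if the address eventually appears. Here it did not appear within 5 minutes, which suggests investigating why the bridge gets no IP after a reboot rather than only raising the timeout.</p>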
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li>Unpause alert "Failed systemd services alert (except openqa.suse.de)"</li>
</ul>
openQA Infrastructure - action #106365 (Resolved): Improve security for OSD worker credentials br... (https://progress.opensuse.org/issues/106365, 2022-02-09T10:25:15Z, osukup)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://progress.opensuse.org/issues/105405" class="external">https://progress.opensuse.org/issues/105405</a>: the changed visibility of salt-pillars-openqa broke the <code>deploy</code> stage of the CI pipeline.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Working salt-states and salt-pillars pipelines in GitLab</li>
<li><strong>AC2:</strong> salt-pillars repo stays non-public</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Try out deploy tokens on OSD to fetch the git repo</li>
</ul>
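<p>GitLab documents cloning with a deploy token over HTTPS by putting the token's username and value into the user-info part of the clone URL. Below is a minimal sketch that builds such a URL with percent-encoded credentials. The username and token values are hypothetical. In practice the token would come from a secret store on OSD, not from source code:</p>

```python
from urllib.parse import quote, urlsplit, urlunsplit


def deploy_token_url(repo_url: str, username: str, token: str) -> str:
    """Return repo_url with deploy-token credentials embedded for git over HTTPS."""
    parts = urlsplit(repo_url)
    if parts.scheme != "https":
        # Never send the token over an unencrypted connection.
        raise ValueError("deploy tokens should only be sent over HTTPS")
    # Percent-encode so characters such as '+', ':' or '@' cannot break the URL.
    userinfo = f"{quote(username, safe='')}:{quote(token, safe='')}"
    return urlunsplit((parts.scheme, f"{userinfo}@{parts.netloc}",
                       parts.path, parts.query, parts.fragment))


print(deploy_token_url("https://gitlab.suse.de/openqa/salt-pillars-openqa.git",
                       "gitlab+deploy-token-1", "s3cr3t"))
# https://gitlab%2Bdeploy-token-1:s3cr3t@gitlab.suse.de/openqa/salt-pillars-openqa.git
```

<p>The resulting URL could be used with <code>git clone</code> on OSD. This keeps the repository private (AC2) while still letting the deploy stage fetch it.</p>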
openQA Infrastructure - action #106035 (Rejected): [qe-tools] dehydrated service fails on osd (https://progress.opensuse.org/issues/106035, 2022-02-07T08:09:49Z, osukup)
<p>systemd on OSD is in a degraded state because the dehydrated system service ends up in a failed state.</p>
<pre><code>dehydrated.service - Certificate Update Runner for Dehydrated
     Loaded: loaded (/usr/lib/systemd/system/dehydrated.service; static)
     Active: failed (Result: exit-code) since Mon 2022-02-07 09:03:35 CET; 4min 58s ago
TriggeredBy: ● dehydrated.timer
    Process: 26947 ExecStart=/usr/bin/dehydrated --cron (code=exited, status=1/FAILURE)
   Main PID: 26947 (code=exited, status=1/FAILURE)

Feb 07 09:03:34 openqa systemd[1]: Starting Certificate Update Runner for Dehydrated...
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Using main config file /etc/dehydrated/config
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Using additional config file /etc/dehydrated/config.d/suse-ca.sh
Feb 07 09:03:34 openqa dehydrated[26947]: # INFO: Running /usr/bin/dehydrated as dehydrated/dehydrated
Feb 07 09:03:34 openqa sudo[26947]: root : PWD=/ ; USER=dehydrated ; GROUP=dehydrated ; COMMAND=/usr/bin/dehydrated --cron
Feb 07 09:03:35 openqa dehydrated[27267]: {}
Feb 07 09:03:35 openqa systemd[1]: dehydrated.service: Main process exited, code=exited, status=1/FAILURE
Feb 07 09:03:35 openqa systemd[1]: dehydrated.service: Failed with result 'exit-code'.
Feb 07 09:03:35 openqa systemd[1]: Failed to start Certificate Update Runner for Dehydrated.
</code></pre>