openSUSE Project Management Tool: Issues (https://progress.opensuse.org/, 2023-12-21T16:45:31Z)
openQA Infrastructure - action #152857 (Resolved): [tools] alert ping between hosts timeout prox... (https://progress.opensuse.org/issues/152857, osukup, 2023-12-21T16:45:31Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1</a></p>
<p>It looks like proxy.scc.suse.de is down.</p>
<p><a href="https://suse.slack.com/archives/C029APBKLGK/p1703170652751919" class="external">https://suse.slack.com/archives/C029APBKLGK/p1703170652751919</a></p>
<p>Q: Who is responsible for proxy.scc.suse.de, and where is it running?</p>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Remove silence "alertname=Packet loss between worker hosts and other hosts alert" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
</ul>
openQA Infrastructure - action #152827 (Resolved): [tools] cron service updating clamav database ... (https://progress.opensuse.org/issues/152827, osukup, 2023-12-21T09:17:42Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From /var/spool/mail/cvdupdate on both the osd and the o3 web UI hosts:</p>
<pre><code>From cvdupdate@localhost Thu Dec 21 10:00:01 2023
Return-Path: <cvdupdate@localhost>
X-Original-To: cvdupdate
Delivered-To: cvdupdate@localhost
Received: by localhost (Postfix, from userid 17307)
id BC86134590; Thu, 21 Dec 2023 10:00:01 +0100 (CET)
From: "(Cron Daemon)" <cvdupdate@localhost>
To: cvdupdate@localhost
Subject: Cron <cvdupdate@openqa> /home/cvdupdate/.local/bin/cvdupdate update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=c21238>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/17307>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/17307/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/cvdupdate>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=cvdupdate>
X-Cron-Env: <USER=cvdupdate>
Message-Id: <20231221090001.BC86134590@localhost>
Date: Thu, 21 Dec 2023 10:00:01 +0100 (CET)
Traceback (most recent call last):
File "/home/cvdupdate/.local/bin/cvdupdate", line 11, in <module>
sys.exit(cli())
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 259, in update_alias
ctx.forward(db_update)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 628, in forward
return self.invoke(cmd, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 100, in db_update
m = CVDUpdate(config=config, verbose=verbose)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 119, in __init__
nameserver)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 184, in _read_config
self.config = json.load(config_file)
File "/usr/lib64/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
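<p>The last line of the traceback is what <code>json.load()</code> raises when the file it reads is empty or does not start with a JSON value. The cvdupdate config file on these hosts was therefore most likely empty or truncated. Below is a minimal sketch of a defensive loader that detects this case before calling cvdupdate. <code>load_json_config</code> and <code>reproduce_error</code> are hypothetical helpers, not part of cvdupdate, and the config path is left as a parameter because it is not stated here:</p>
<pre><code class="python">import json
import os


def load_json_config(path):
    """Return the parsed JSON config, or None if it is missing, empty or invalid.

    Hypothetical helper: cvdupdate itself calls json.load() directly,
    which is what crashes in the traceback above.
    """
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None  # an empty file is the classic cause of "(char 0)" errors
    with open(path, encoding="utf-8") as config_file:
        try:
            return json.load(config_file)
        except json.JSONDecodeError as err:
            print("invalid JSON in {}: {}".format(path, err))
            return None


def reproduce_error():
    """Parse an empty document and return the resulting error message."""
    try:
        json.loads("")
    except json.JSONDecodeError as err:
        return str(err)
    return None
</code></pre>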
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Clamav is no longer raising exceptions on o3</li>
<li><strong>AC2:</strong> Clamav is no longer raising exceptions on osd</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove clamav service - install on o3, remove from salt on osd</li>
</ul>
openQA Infrastructure - action #152741 (Resolved): [tools] gitlab CI - openqa_review failed with ... (https://progress.opensuse.org/issues/152741, osukup, 2023-12-18T15:46:57Z)
<p>It looks like osd did not answer the API request within the 30 second read timeout. Could this be a random network problem or a too complicated query?</p>
<p><a href="https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520" class="external">https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520</a></p>
<pre><code>usr/bin/openqa-review --host https://openqa.suse.de -n -r -T --query-issue-status --no-empty-sections --include-softfails --running-threshold=2 --exclude-job-groups '^(Released|Development|old|EOL)' --reminder-comment-on-issues --save --save-dir /tmp/tmp.1LmmaKoNx7 --job-groups '^SLE.*15.*(Functional)'
..............................WARNING:urllib3.connectionpool:Retrying (Retry(total=6, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..................................WARNING:urllib3.connectionpool:Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
......................................WARNING:urllib3.connectionpool:Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
...................................................................................................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................WARNING:openqa_review.browser:Request to https://openqa.suse.de/api/v1/parent_groups was not successful after 7 retries: HTTPSConnectionPool(host='openqa.suse.de', port=443): Max retries exceeded with url: /api/v1/parent_groups (Caused by ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)"))
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
conn.connect()
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 642, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 782, in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 470, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 514, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
self.do_handshake()
File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
self._sslobj.do_handshake()
TimeoutError: _ssl.c:989: The handshake operation timed out
The above exception was the direct cause of the following exception:
</code></pre>
openQA Project - action #135407 (Resolved): [tools] Measure to mitigate websockets overload by wo... (https://progress.opensuse.org/issues/135407, osukup, 2023-09-08T11:39:06Z)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Consolidate all steps we took to mitigate <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert (Resolved)" href="https://progress.opensuse.org/issues/135122">#135122</a> and how to revert it.</p>
<p>1) Stopped workers:</p>
<p>used:<br>
<code>sudo salt 'worker3[1,2,3,4,5,6]*' cmd.run 'sudo systemctl disable --now telegraf $(systemctl list-units | grep openqa-worker-auto-restart | cut -d "." -f 1 | xargs)'\<br>
&& for i in {1..6}; do sudo salt-key -y -d "worker3$i*"; done</code></p>
<p>revert:<br>
<code>for i in {1..6}; do sudo salt-key -y -a "worker3$i*";done && sudo salt 'worker3[1,2,3,4,5,6]*' state.apply</code></p>
<p>2) Lowered the number of workers:</p>
<p>used:<br>
<a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/606" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/606</a></p>
<p>revert:<br>
revert the mentioned MR in GitLab</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Ensure step #1 has been reverted</li>
<li><strong>AC2:</strong> DONE Ensure step #2 has been reverted</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Maybe don't bring them all back at once (and be prepared to remove them again in case of new performance issues)</li>
<li>In case of new performance issues make sure to strace the openqa-scheduler and openqa-websockets processes</li>
</ul>
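<p>To follow the first suggestion, the revert one-liner from step 1 can be split per worker. Each host is then re-accepted and has its states applied on its own, leaving time to watch the scheduler and websockets load in between. Below is a minimal sketch that only prints the commands. <code>staged_revert_commands</code> is a hypothetical helper, and the commands are the same <code>salt-key</code> and <code>state.apply</code> calls used above:</p>
<pre><code class="python">def staged_revert_commands(worker_ids):
    """Build the step 1 revert commands, one worker at a time.

    Each worker's salt key is accepted and its states applied before the
    next worker is touched, instead of bringing all of them back at once.
    """
    commands = []
    for worker_id in worker_ids:
        host_glob = "worker3{}*".format(worker_id)
        commands.append('sudo salt-key -y -a "{}"'.format(host_glob))
        commands.append("sudo salt '{}' state.apply".format(host_glob))
    return commands


# Print the commands for worker31 to worker36; run them one pair at a time.
for command in staged_revert_commands(range(1, 7)):
    print(command)
</code></pre>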
openQA Infrastructure - action #135335 (Resolved): [tools] gitlabci salt-pillars-openqa deploy f... (https://progress.opensuse.org/issues/135335, osukup, 2023-09-07T08:04:46Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907</a></p>
<p>from log:</p>
<pre><code> ID: wicked ifup br1
Function: cmd.run
Result: False
Comment: Command "wicked ifup br1" run
Started: 07:07:10.803006
Duration: 30119.464 ms
Changes:
----------
pid:
16955
retcode:
157
stderr:
stdout:
br1 no-device
Name: /etc/sysconfig/network/ifcfg-tap0 - Function: file.managed - Result: Clean Started: - 07:07:40.934647 Duration: 7.021 ms
Name: /etc/sysconfig/network/ifcfg-tap64 - Function: file.managed - Result: Clean Started: - 07:07:40.945434 Duration: 5.152 ms
Name: /etc/sysconfig/network/ifcfg-tap128 - Function: file.managed - Result: Clean Started: - 07:07:40.954239 Duration: 5.0 ms
Name: /etc/sysconfig/network/ifcfg-tap1 - Function: file.managed - Result: Clean Started: - 07:07:40.962885 Duration: 5.013 ms
Name: /etc/sysconfig/network/ifcfg-tap65 - Function: file.managed - Result: Clean Started: - 07:07:40.971581 Duration: 5.002 ms
Name: /etc/sysconfig/network/ifcfg-tap129 - Function: file.managed - Result: Clean Started: - 07:07:40.980234 Duration: 4.984 ms
Name: /etc/sysconfig/network/ifcfg-tap2 - Function: file.managed - Result: Clean Started: - 07:07:40.988915 Duration: 5.037 ms
Name: /etc/sysconfig/network/ifcfg-tap66 - Function: file.managed - Result: Clean Started: - 07:07:40.997624 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap130 - Function: file.managed - Result: Clean Started: - 07:07:41.006362 Duration: 4.946 ms
Name: /etc/sysconfig/network/ifcfg-tap3 - Function: file.managed - Result: Clean Started: - 07:07:41.015198 Duration: 5.266 ms
Name: /etc/sysconfig/network/ifcfg-tap67 - Function: file.managed - Result: Clean Started: - 07:07:41.024299 Duration: 5.178 ms
Name: /etc/sysconfig/network/ifcfg-tap131 - Function: file.managed - Result: Clean Started: - 07:07:41.033154 Duration: 4.992 ms
Name: /etc/sysconfig/network/ifcfg-tap4 - Function: file.managed - Result: Clean Started: - 07:07:41.041806 Duration: 4.955 ms
Name: /etc/sysconfig/network/ifcfg-tap68 - Function: file.managed - Result: Clean Started: - 07:07:41.050532 Duration: 5.287 ms
Name: /etc/sysconfig/network/ifcfg-tap132 - Function: file.managed - Result: Clean Started: - 07:07:41.059443 Duration: 4.926 ms
Name: /etc/sysconfig/network/ifcfg-tap5 - Function: file.managed - Result: Clean Started: - 07:07:41.068081 Duration: 4.993 ms
Name: /etc/sysconfig/network/ifcfg-tap69 - Function: file.managed - Result: Clean Started: - 07:07:41.076758 Duration: 4.93 ms
Name: /etc/sysconfig/network/ifcfg-tap133 - Function: file.managed - Result: Clean Started: - 07:07:41.085353 Duration: 4.942 ms
Name: /etc/sysconfig/network/ifcfg-tap6 - Function: file.managed - Result: Clean Started: - 07:07:41.093943 Duration: 5.056 ms
Name: /etc/sysconfig/network/ifcfg-tap70 - Function: file.managed - Result: Clean Started: - 07:07:41.102645 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap134 - Function: file.managed - Result: Clean Started: - 07:07:41.111287 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap7 - Function: file.managed - Result: Clean Started: - 07:07:41.119942 Duration: 4.913 ms
Name: /etc/sysconfig/network/ifcfg-tap71 - Function: file.managed - Result: Clean Started: - 07:07:41.128614 Duration: 4.959 ms
Name: /etc/sysconfig/network/ifcfg-tap135 - Function: file.managed - Result: Clean Started: - 07:07:41.137410 Duration: 4.953 ms
Name: /etc/sysconfig/network/ifcfg-tap8 - Function: file.managed - Result: Clean Started: - 07:07:41.146176 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap72 - Function: file.managed - Result: Clean Started: - 07:07:41.154807 Duration: 5.035 ms
Name: /etc/sysconfig/network/ifcfg-tap136 - Function: file.managed - Result: Clean Started: - 07:07:41.163660 Duration: 4.937 ms
Name: /etc/sysconfig/network/ifcfg-tap9 - Function: file.managed - Result: Clean Started: - 07:07:41.172266 Duration: 4.954 ms
Name: /etc/sysconfig/network/ifcfg-tap73 - Function: file.managed - Result: Clean Started: - 07:07:41.181001 Duration: 4.95 ms
Name: /etc/sysconfig/network/ifcfg-tap137 - Function: file.managed - Result: Clean Started: - 07:07:41.189605 Duration: 5.503 ms
</code></pre>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Salt states apply successfully on imageworker</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate whether one of the service definitions is missing a "require" requisite or similar</li>
<li>Commands were re-run - consider persistent mitigations if this is causing other pipelines to fail</li>
<li>This seems to affect imagetester, openqaworker17.qa.suse.cz, openqaworker16.qa.suse.cz and openqaworker18.qa.suse.cz so far</li>
</ul>
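<p>The failed state reports <code>br1 no-device</code>. This suggests that <code>wicked ifup br1</code> ran before the bridge device existed, which fits the missing-requisite theory. As a possible persistent mitigation, below is a sketch that polls <code>/sys/class/net</code> (where Linux lists network devices) until the device appears. <code>wait_for_interface</code> and its defaults are hypothetical; a proper Salt requisite would be the cleaner fix:</p>
<pre><code class="python">import os
import time


def wait_for_interface(name, attempts=30, interval=1.0, sys_net="/sys/class/net"):
    """Return True once network device `name` exists, False after `attempts` checks.

    Hypothetical mitigation: only run "wicked ifup br1" once this returns True.
    """
    path = os.path.join(sys_net, name)
    for attempt in range(attempts):
        if os.path.exists(path):
            return True
        if attempt != attempts - 1:
            time.sleep(interval)  # no need to sleep after the last check
    return False
</code></pre>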
openQA Infrastructure - action #135206 (Rejected): [tools] GitlabCI telegraf step on salt-states-... (https://progress.opensuse.org/issues/135206, osukup, 2023-09-05T20:20:38Z)
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/pipelines/791107</a></p>
<p>From log:</p>
<pre><code>openqaworker16.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
worker30.oqa.prg2.suse.org:
telegraf is fine
openqaworker17.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
openqaworker18.qa.suse.cz:
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "walter1.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: walter1.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [inputs.ping] Error in plugin: host "qa-jump.qe.nue2.suse.org": exit status 2 - /usr/bin/ping: qa-jump.qe.nue2.suse.org: Name or service not known
2023-09-05T20:00:54Z E! [telegraf] Error running agent: input plugins recorded 2 errors
</code></pre>
<p>It looks like some hosts have problems with DNS:</p>
<p>openqaworker16.qa.suse.cz<br>
openqaworker17.qa.suse.cz<br>
openqaworker18.qa.suse.cz<br>
openqaworker14.qa.suse.cz<br>
qesapworker-prg4.qa.suse.cz<br>
qesapworker-prg5.qa.suse.cz<br>
qesapworker-prg7.qa.suse.cz<br>
qesapworker-prg6.qa.suse.cz<br>
openqa-monitor.qa.suse.de </p>
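<p>All the telegraf errors above are name-resolution failures ("Name or service not known"). The affected hosts can therefore be checked without telegraf by resolving the ping targets directly. Below is a minimal sketch using <code>socket.getaddrinfo</code>. <code>find_unresolvable</code> is a hypothetical helper, and the resolver is a parameter only so it can be tested offline:</p>
<pre><code class="python">import socket


def find_unresolvable(hosts, resolver=socket.getaddrinfo):
    """Return the hosts whose names cannot be resolved.

    socket.gaierror is what getaddrinfo raises for errors such as
    "Name or service not known", the message ping printed in the log.
    """
    failed = []
    for host in hosts:
        try:
            resolver(host, None)
        except socket.gaierror:
            failed.append(host)
    return failed


# Example, run on an affected worker such as openqaworker16.qa.suse.cz:
# find_unresolvable(["walter1.qe.nue2.suse.org", "qa-jump.qe.nue2.suse.org"])
</code></pre>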
<p>AC1: pipeline passes</p>
openQA Tests - action #135143 (Resolved): [tools] test fails in openqa_from_git -> dashboard size... (https://progress.opensuse.org/issues/135143, osukup, 2023-09-04T14:31:53Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario openqa-Tumbleweed-dev-x86_64-openqa_from_git@64bit-2G fails in<br>
<a href="https://openqa.opensuse.org/tests/3547791#step/dashboard/3" class="external">dashboard</a></p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>The test fails in <code>ensure_unlocked_desktop</code>: it sends the first click on the login screen to show the password prompt, but does not type the password.<br>
I tried creating a needle and it was saved correctly, but it looks like <code>ensure_unlocked_desktop</code> has a problem in its main loop :(</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.opensuse.org/tests/3547791" class="external">:TW.22712</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.opensuse.org/tests/3547742" class="external">:TW.22711</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=openqa&flavor=dev&machine=64bit-2G&test=openqa_from_git&version=Tumbleweed" class="external">latest</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<p>AC1: test successfully unlocks the desktop</p>
openQA Project - action #135134 (Resolved): [tools] GitlabCI salt-pillars-openqa deploy failed o... (https://progress.opensuse.org/issues/135134, osukup, 2023-09-04T09:48:02Z)
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1803184" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1803184</a></p>
<pre><code>baremetal-support.qa.suse.de:
Data failed to compile:
----------
Rendering SLS 'base:debug_poo133469' failed: mapping values are not allowed here; line 13
---
[...]
attempts: 5
minion_cmd_file:
file.patch:
- name: warning: waiting for shared lock on /usr/lib/sysimage/rpm/Packages <======================
error: cannot get shared lock on /usr/lib/sysimage/rpm/Packages
error: cannot open Packages index using db4 - Operation not permitted (1)
error: cannot open Packages database in /usr/lib/sysimage/rpm
warning: waiting for shared lock on /usr/lib/sysimage/rpm/Packages
error: cannot get shared lock on /usr/lib/sysimage/rpm/Packages
[...]
---
section_end:1693819642:step_script
</code></pre>
openQA Infrastructure - action #134816 (Resolved): [tools] grafana dashboard for `OpenQA Jobs tes... (https://progress.opensuse.org/issues/134816, osukup, 2023-08-30T08:46:38Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Dashboard <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1</a></p>
<p>Graphs showing running tests are missing data since yesterday's migration.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No missing data for osd on Grafana</li>
<li><strong>AC2:</strong> Alerts related to affected panels are functioning</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>In salt states in monitoring/telegraf/telegraf-webui.conf, instead of <code>grains['fqdn']</code> use something like <code>grains.get('primary_webui_domain', grains.get('fqdn'))</code>. Alternatively we could use the "id" in place of the FQDN</li>
<li>If the above does not work then use an OR expression since we already have data with different domains in the db (or implement that to cover the data from 2023-08-29 to today)</li>
<li>Also check whether alerts need to be covered</li>
<li>As an alternative, can we change the FQDN of osd to point to openqa.suse.de again?
<ul>
<li>Apparently a bad idea according to mcaj (not sure why)</li>
</ul></li>
<li>See existing MR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953</a></li>
</ul>
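<p>The first two suggestions can be sketched in Python. The first helper mirrors the suggested Jinja fallback on a plain dict of grains. The second builds an InfluxQL-style OR condition so queries match data stored under both domains. Both helpers are hypothetical, the tag name <code>host</code> is an assumption about how telegraf tags the data, and <code>new-fqdn.example</code> is a placeholder:</p>
<pre><code class="python">def pick_webui_host(grains):
    """Plain-dict version of grains.get('primary_webui_domain', grains.get('fqdn'))."""
    return grains.get("primary_webui_domain", grains.get("fqdn"))


def host_filter(hosts, tag="host"):
    """Build an InfluxQL-style condition matching any of the given hosts.

    The tag name "host" is an assumption about how the data is tagged.
    """
    return " OR ".join("\"{}\" = '{}'".format(tag, host) for host in hosts)


# Placeholder second domain; substitute the FQDN osd reports after the migration.
print(host_filter(["openqa.suse.de", "new-fqdn.example"]))
# prints: "host" = 'openqa.suse.de' OR "host" = 'new-fqdn.example'
</code></pre>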
openQA Project - action #134810 (Rejected): [tools] GitlabCI deploy on salt-states-openqa took to... (https://progress.opensuse.org/issues/134810, osukup, 2023-08-30T07:50:27Z)
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1790871" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1790871</a></p>
<p>The GitLab CI limit for a single job is 2 hours, and the job took longer.</p>
openQA Infrastructure - action #133154 (Resolved): osd-deployment failed because unreachable workers (https://progress.opensuse.org/issues/133154, osukup, 2023-07-21T08:58:16Z)
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/pipelines/736743</a></p>
<p>from logs:</p>
<pre><code>sapworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
openqaworker1.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
+++ kill %1
</code></pre>
<p>I tried to ping/ssh the hosts and none of them is reachable.<br>
IPMI is not responding either, and these hosts have a corresponding host up alert in Grafana.</p>
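<p>When many minions are affected, the unreachable ones can be pulled out of the salt output shown above instead of reading it by eye. Below is a minimal sketch. <code>unreachable_minions</code> is a hypothetical helper that assumes minion IDs appear on their own unindented lines ending in ":":</p>
<pre><code class="python">def unreachable_minions(salt_output):
    """Return the minion IDs that salt reported as "Minion did not return"."""
    minions = []
    current = None
    for line in salt_output.splitlines():
        stripped = line.rstrip()
        if stripped and not stripped[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]  # top-level "minion-id:" header
        elif "Minion did not return" in stripped and current:
            minions.append(current)
            current = None  # count each minion only once
    return minions
</code></pre>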
openQA Infrastructure - action #133127 (Resolved): Frankencampus network broken + GitlabCi failed... (https://progress.opensuse.org/issues/133127, osukup, 2023-07-20T17:34:02Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Job <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipelines/735816</a></p>
<p>The job itself passed, but the upload of artifacts failed.</p>
<p>from logs:</p>
<pre><code>WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
WARNING: Retrying... context=artifacts-uploader error=invalid argument
WARNING: Uploading artifacts as "archive" to coordinator... 502 Bad Gateway id=1702329 responseStatus=502 Bad Gateway status=502 token=64_L_XM4
FATAL: invalid argument
Cleaning up project directory and file based variables
ERROR: Job failed: exit code 1
</code></pre>
openQA Infrastructure - action #133097 (Resolved): cron on OSD (date; fetch_openqa_bugs /etc/open... (https://progress.opensuse.org/issues/133097, osukup, 2023-07-20T07:45:15Z)
<pre><code>Exception occured while fetching boo#1115169
Traceback (most recent call last):
File "/usr/bin/fetch_openqa_bugs", line 62, in <module>
raise e
File "/usr/bin/fetch_openqa_bugs", line 55, in <module>
client.openqa_request("PUT", "bugs/%s" % bug_dbid, data=issue.get_dict())
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 298, in openqa_request
return self.do_request(req, retries=retries, wait=wait, parse=True)
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 238, in do_request
raise err
File "/usr/lib/python3.6/site-packages/openqa_client/client.py", line 213, in do_request
request.method, resp.url, resp.status_code
openqa_client.exceptions.RequestError: ('PUT', 'https://openqa.opensuse.org/api/v1/bugs/1021', 403)
</code></pre>
<p>Could it be caused by the broken IDP login service? See <a href="https://suse.slack.com/archives/C029APBKLGK/p1689838423782549" class="external">https://suse.slack.com/archives/C029APBKLGK/p1689838423782549</a></p>
openQA Infrastructure - action #132926 (Workable): OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_op... (https://progress.opensuse.org/issues/132926, osukup, 2023-07-18T07:56:34Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>OSD cron -> (fetch_openqa_bugs)> /tmp/fetch_openqa_bugs_osd.log failed:</p>
<p>from traceback:</p>
<pre><code>requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /repos/SUSE/ha-sap-terraform-deployments/issues/857 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f7439e43b38>, 'Connection to api.github.com timed out. (connect timeout=10)'))
</code></pre>
<p>fetch_openqa_bugs failed when fetching issues from GitHub.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> It is understood why the error occurred</li>
<li><strong>AC2:</strong> The error does not persist</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Make sure you can log in, see <a href="https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11" class="external">https://gitlab.suse.de/OPS-Service/salt/-/blob/production/pillar/id/openqa-service_qe_suse_de.sls#L11</a> or ask dheidler/mkittler to do that for you</li>
<li>Assuming "host unavailable', check how long the scripts retried
<ul>
<li>Re-try more often?</li>
<li>Wait longer between attempts?</li>
</ul></li>
<li><a href="https://github.com/os-autoinst/openqa_bugfetcher" class="external">https://github.com/os-autoinst/openqa_bugfetcher</a></li>
</ul>
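<p>To check how long the script retried, it helps to compute the worst case of a retry policy: every attempt runs into the full timeout (10 s connect timeout in the traceback), plus the waits between attempts. Below is a minimal sketch, assuming geometric backoff. <code>worst_case_seconds</code> is hypothetical, and the attempt count and delays are parameters because the actual policy of fetch_openqa_bugs is not stated here:</p>
<pre><code class="python">def worst_case_seconds(attempts, timeout, initial_delay, factor=2.0):
    """Upper bound on how long a retry loop can take before giving up.

    Assumes every attempt hits the full timeout and the wait between
    attempts grows geometrically: initial_delay, initial_delay * factor, ...
    This is a hypothetical policy, not necessarily what fetch_openqa_bugs uses.
    """
    total = attempts * timeout
    delay = initial_delay
    for _ in range(attempts - 1):  # no wait after the final attempt
        total += delay
        delay *= factor
    return total
</code></pre>
<p>For example, 3 attempts with a 10 s timeout and a 1 s initial delay add up to 30 s of timeouts plus 1 s and 2 s of waiting, 33 s in total.</p>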
openQA Infrastructure - action #132860 (Resolved): openqa-piworker is unstable and needs regular ... (https://progress.opensuse.org/issues/132860, osukup, 2023-07-17T08:39:49Z)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1694765" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1694765</a></p>
<p>The only thing found in the logs was in salt_ping.log:</p>
<pre><code>Currently the following minions are down:
8d7
< "openqa-piworker.qa.suse.de"
===================
</code></pre>
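<p>The salt_ping.log output looks like a plain diff of two minion lists. <code>8d7</code> presumably means that line 8 of the expected list has no counterpart in the list of minions that answered. The same answer can be computed with a set difference. <code>down_minions</code> is a hypothetical helper, and the example list is made up:</p>
<pre><code class="python">def down_minions(expected, responding):
    """Return the expected minions that did not answer, keeping their order."""
    answered = set(responding)
    return [minion for minion in expected if minion not in answered]


expected = ["openqa-worker-a", "openqa-piworker.qa.suse.de"]  # hypothetical list
print(down_minions(expected, ["openqa-worker-a"]))
# prints: ['openqa-piworker.qa.suse.de']
</code></pre>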
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> we are able to process openQA Raspberry Pi bare-metal jobs consistently over some days</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><p>Identify the cause of the regression</p>
<ul>
<li>likely something related to the hardware RTC</li>
<li>try whether it just works with Leap 15.5, since we wanted to upgrade anyway</li>
<li>could be a recent kernel update so try to downgrade</li>
</ul></li>
<li><p>If it is really necessary and you have exhausted all other remote-controllable options, go to the office, unplug the RTC, and reinstall the system in case it was borked by corruption, or whatever</p></li>
<li><p>As Plan Y (if options A to X failed), buy a Wi-Fi and Bluetooth adapter for an IPMI-controllable server and use that instead to connect to the RPi bare-metal test instances</p></li>
</ul>
<a name="Rollback-steps"></a>
<h2 >Rollback steps<a href="#Rollback-steps" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add back salt key with <code>ssh osd "sudo salt-key -y -a openqa-piworker.qa.suse.de"</code></li>
</ul>