openSUSE Project Management Tool: Issues https://progress.opensuse.org/ 2023-12-21T16:45:31Z
openQA Infrastructure - action #152857 (Resolved): [tools] alert ping between hosts timeout prox... https://progress.opensuse.org/issues/152857 2023-12-21T16:45:31Z osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/EML0bpuGk/monitoring?viewPanel=4&orgId=1</a></p>
<p>It looks like proxy.scc.suse.de is down.</p>
<p><a href="https://suse.slack.com/archives/C029APBKLGK/p1703170652751919" class="external">https://suse.slack.com/archives/C029APBKLGK/p1703170652751919</a></p>
<p>Q: Who is responsible for proxy.scc.suse.de, and where is it running?</p>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Remove silence "alertname=Packet loss between worker hosts and other hosts alert" from <a href="https://monitor.qa.suse.de/alerting/silences" class="external">https://monitor.qa.suse.de/alerting/silences</a></li>
</ul>
openQA Infrastructure - action #152827 (Resolved): [tools] cron service updating clamav database ... https://progress.opensuse.org/issues/152827 2023-12-21T09:17:42Z osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From /var/spool/mail/cvdupdate on both instances of the osd and o3 web UI:</p>
<pre><code>From cvdupdate@localhost Thu Dec 21 10:00:01 2023
Return-Path: <cvdupdate@localhost>
X-Original-To: cvdupdate
Delivered-To: cvdupdate@localhost
Received: by localhost (Postfix, from userid 17307)
id BC86134590; Thu, 21 Dec 2023 10:00:01 +0100 (CET)
From: "(Cron Daemon)" <cvdupdate@localhost>
To: cvdupdate@localhost
Subject: Cron <cvdupdate@openqa> /home/cvdupdate/.local/bin/cvdupdate update
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Auto-Submitted: auto-generated
Precedence: bulk
X-Cron-Env: <XDG_SESSION_ID=c21238>
X-Cron-Env: <XDG_RUNTIME_DIR=/run/user/17307>
X-Cron-Env: <DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/17307/bus>
X-Cron-Env: <XDG_SESSION_TYPE=unspecified>
X-Cron-Env: <XDG_SESSION_CLASS=background>
X-Cron-Env: <LANG=en_US.UTF-8>
X-Cron-Env: <SHELL=/bin/sh>
X-Cron-Env: <HOME=/home/cvdupdate>
X-Cron-Env: <PATH=/usr/bin:/bin>
X-Cron-Env: <LOGNAME=cvdupdate>
X-Cron-Env: <USER=cvdupdate>
Message-Id: <20231221090001.BC86134590@localhost>
Date: Thu, 21 Dec 2023 10:00:01 +0100 (CET)
Traceback (most recent call last):
File "/home/cvdupdate/.local/bin/cvdupdate", line 11, in <module>
sys.exit(cli())
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/decorators.py", line 21, in new_func
return f(get_current_context(), *args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 259, in update_alias
ctx.forward(db_update)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 628, in forward
return self.invoke(cmd, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/__main__.py", line 100, in db_update
m = CVDUpdate(config=config, verbose=verbose)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 119, in __init__
nameserver)
File "/home/cvdupdate/.local/lib/python3.6/site-packages/cvdupdate/cvdupdate.py", line 184, in _read_config
self.config = json.load(config_file)
File "/usr/lib64/python3.6/json/__init__.py", line 299, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Clamav is no longer raising exceptions on o3</li>
<li><strong>AC2:</strong> Clamav is no longer raising exceptions on osd</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove clamav service - install on o3, remove from salt on osd</li>
</ul>
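<p>The traceback above ends in <code>json.load</code> failing at "line 1 column 1 (char 0)", which is the message Python produces for, among other inputs, an empty file. The following is a minimal sketch of a defensive loader, not the cvdupdate implementation: the function name <code>load_json_config</code> and the decision to return <code>None</code> are assumptions made for illustration.</p>

```python
import json
from pathlib import Path


def load_json_config(path):
    """Return the parsed JSON config, or None if it is missing, empty or invalid.

    Hypothetical helper for illustration only; it is not part of cvdupdate.
    """
    config_path = Path(path)
    if not config_path.is_file():
        return None
    text = config_path.read_text(encoding="utf-8")
    if not text.strip():
        # An empty file is one input that makes json.load raise
        # "Expecting value: line 1 column 1 (char 0)".
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

<p>A caller could treat <code>None</code> as a signal to regenerate the config file or to fall back to defaults instead of crashing inside the cron job.</p>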
openQA Infrastructure - action #152741 (Resolved): [tools] gitlab CI - openqa_review failed with ... https://progress.opensuse.org/issues/152741 2023-12-18T15:46:57Z osukup
<p>It looks like osd was unable to answer the API request within the 30-second read timeout. This could be a random network problem, or is the query too complicated?</p>
<p><a href="https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520" class="external">https://gitlab.suse.de/openqa/openqa-review/-/jobs/2077520</a></p>
<pre><code>usr/bin/openqa-review --host https://openqa.suse.de -n -r -T --query-issue-status --no-empty-sections --include-softfails --running-threshold=2 --exclude-job-groups '^(Released|Development|old|EOL)' --reminder-comment-on-issues --save --save-dir /tmp/tmp.1LmmaKoNx7 --job-groups '^SLE.*15.*(Functional)'
..............................WARNING:urllib3.connectionpool:Retrying (Retry(total=6, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..................................WARNING:urllib3.connectionpool:Retrying (Retry(total=5, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
......................................WARNING:urllib3.connectionpool:Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
...................................................................................................................................................................WARNING:urllib3.connectionpool:Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)")': /api/v1/parent_groups
..............................WARNING:openqa_review.browser:Request to https://openqa.suse.de/api/v1/parent_groups was not successful after 7 retries: HTTPSConnectionPool(host='openqa.suse.de', port=443): Max retries exceeded with url: /api/v1/parent_groups (Caused by ReadTimeoutError("HTTPSConnectionPool(host='openqa.suse.de', port=443): Read timed out. (read timeout=30)"))
Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 467, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3.11/site-packages/urllib3/connectionpool.py", line 1096, in _validate_conn
conn.connect()
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 642, in connect
sock_and_verified = _ssl_wrap_socket_and_match_hostname(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/connection.py", line 782, in _ssl_wrap_socket_and_match_hostname
ssl_sock = ssl_wrap_socket(
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 470, in ssl_wrap_socket
ssl_sock = _ssl_wrap_socket_impl(sock, context, tls_in_tls, server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/urllib3/util/ssl_.py", line 514, in _ssl_wrap_socket_impl
return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 517, in wrap_socket
return self.sslsocket_class._create(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/ssl.py", line 1108, in _create
self.do_handshake()
File "/usr/lib64/python3.11/ssl.py", line 1379, in do_handshake
self._sslobj.do_handshake()
TimeoutError: _ssl.c:989: The handshake operation timed out
The above exception was the direct cause of the following exception:
</code></pre>
openQA Project - action #135407 (Resolved): [tools] Measure to mitigate websockets overload by wo... https://progress.opensuse.org/issues/135407 2023-09-08T11:39:06Z osukup
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Consolidate all steps we took to mitigate <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] OSD openQA refuses to assign jobs, >3k scheduled not being picked up, no alert (Resolved)" href="https://progress.opensuse.org/issues/135122">#135122</a> and how to revert it.</p>
<p>1) stopped workers:</p>
<p>used:<br>
<code>sudo salt 'worker3[1,2,3,4,5,6]*' cmd.run 'sudo systemctl disable --now telegraf $(systemctl list-units | grep openqa-worker-auto-restart | cut -d "." -f 1 | xargs)'\<br>
&& for i in {1..6}; do sudo salt-key -y -d "worker3$i*"; done</code></p>
<p>revert:<br>
<code>for i in {1..6}; do sudo salt-key -y -a "worker3$i*";done && sudo salt 'worker3[1,2,3,4,5,6]*' state.apply</code></p>
<p>2) Lowered the number of workers</p>
<p>used:<br>
<a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/606" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/606</a></p>
<p>revert: <br>
revert mentioned MR in GitLab</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Ensure step #1 has been reverted</li>
<li><strong>AC2</strong>: DONE Ensure step #2 has been reverted</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Maybe don't bring them all back at once (and be prepared to remove them again in case of new performance issues)</li>
<li>In case of new performance issues make sure to strace the openqa-scheduler and openqa-websockets processes</li>
</ul>
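<p>The first suggestion above (not bringing all workers back at once) can be sketched as a simple batching helper. This is only an illustration: the host names are derived from the <code>worker3[1,2,3,4,5,6]*</code> glob used in step 1, and the batch size of 2 is an arbitrary assumption.</p>

```python
def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical host names matching the 'worker3[1,2,3,4,5,6]*' glob from step 1.
workers = [f"worker3{i}" for i in range(1, 7)]

for group in batches(workers, 2):
    # In practice each group would be re-added and observed before the next one.
    print("re-enable:", ", ".join(group))
```

<p>Re-enabling one group at a time leaves room to check the scheduler and websockets load before continuing, and to stop early if performance degrades again.</p>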
openQA Infrastructure - action #135335 (Resolved): [tools] gitlabci salt-pillars-openqa deploy f... https://progress.opensuse.org/issues/135335 2023-09-07T08:04:46Z osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1810907</a></p>
<p>from log:</p>
<pre><code> ID: wicked ifup br1
Function: cmd.run
Result: False
Comment: Command "wicked ifup br1" run
Started: 07:07:10.803006
Duration: 30119.464 ms
Changes:
----------
pid:
16955
retcode:
157
stderr:
stdout:
br1 no-device
Name: /etc/sysconfig/network/ifcfg-tap0 - Function: file.managed - Result: Clean Started: - 07:07:40.934647 Duration: 7.021 ms
Name: /etc/sysconfig/network/ifcfg-tap64 - Function: file.managed - Result: Clean Started: - 07:07:40.945434 Duration: 5.152 ms
Name: /etc/sysconfig/network/ifcfg-tap128 - Function: file.managed - Result: Clean Started: - 07:07:40.954239 Duration: 5.0 ms
Name: /etc/sysconfig/network/ifcfg-tap1 - Function: file.managed - Result: Clean Started: - 07:07:40.962885 Duration: 5.013 ms
Name: /etc/sysconfig/network/ifcfg-tap65 - Function: file.managed - Result: Clean Started: - 07:07:40.971581 Duration: 5.002 ms
Name: /etc/sysconfig/network/ifcfg-tap129 - Function: file.managed - Result: Clean Started: - 07:07:40.980234 Duration: 4.984 ms
Name: /etc/sysconfig/network/ifcfg-tap2 - Function: file.managed - Result: Clean Started: - 07:07:40.988915 Duration: 5.037 ms
Name: /etc/sysconfig/network/ifcfg-tap66 - Function: file.managed - Result: Clean Started: - 07:07:40.997624 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap130 - Function: file.managed - Result: Clean Started: - 07:07:41.006362 Duration: 4.946 ms
Name: /etc/sysconfig/network/ifcfg-tap3 - Function: file.managed - Result: Clean Started: - 07:07:41.015198 Duration: 5.266 ms
Name: /etc/sysconfig/network/ifcfg-tap67 - Function: file.managed - Result: Clean Started: - 07:07:41.024299 Duration: 5.178 ms
Name: /etc/sysconfig/network/ifcfg-tap131 - Function: file.managed - Result: Clean Started: - 07:07:41.033154 Duration: 4.992 ms
Name: /etc/sysconfig/network/ifcfg-tap4 - Function: file.managed - Result: Clean Started: - 07:07:41.041806 Duration: 4.955 ms
Name: /etc/sysconfig/network/ifcfg-tap68 - Function: file.managed - Result: Clean Started: - 07:07:41.050532 Duration: 5.287 ms
Name: /etc/sysconfig/network/ifcfg-tap132 - Function: file.managed - Result: Clean Started: - 07:07:41.059443 Duration: 4.926 ms
Name: /etc/sysconfig/network/ifcfg-tap5 - Function: file.managed - Result: Clean Started: - 07:07:41.068081 Duration: 4.993 ms
Name: /etc/sysconfig/network/ifcfg-tap69 - Function: file.managed - Result: Clean Started: - 07:07:41.076758 Duration: 4.93 ms
Name: /etc/sysconfig/network/ifcfg-tap133 - Function: file.managed - Result: Clean Started: - 07:07:41.085353 Duration: 4.942 ms
Name: /etc/sysconfig/network/ifcfg-tap6 - Function: file.managed - Result: Clean Started: - 07:07:41.093943 Duration: 5.056 ms
Name: /etc/sysconfig/network/ifcfg-tap70 - Function: file.managed - Result: Clean Started: - 07:07:41.102645 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap134 - Function: file.managed - Result: Clean Started: - 07:07:41.111287 Duration: 4.987 ms
Name: /etc/sysconfig/network/ifcfg-tap7 - Function: file.managed - Result: Clean Started: - 07:07:41.119942 Duration: 4.913 ms
Name: /etc/sysconfig/network/ifcfg-tap71 - Function: file.managed - Result: Clean Started: - 07:07:41.128614 Duration: 4.959 ms
Name: /etc/sysconfig/network/ifcfg-tap135 - Function: file.managed - Result: Clean Started: - 07:07:41.137410 Duration: 4.953 ms
Name: /etc/sysconfig/network/ifcfg-tap8 - Function: file.managed - Result: Clean Started: - 07:07:41.146176 Duration: 4.935 ms
Name: /etc/sysconfig/network/ifcfg-tap72 - Function: file.managed - Result: Clean Started: - 07:07:41.154807 Duration: 5.035 ms
Name: /etc/sysconfig/network/ifcfg-tap136 - Function: file.managed - Result: Clean Started: - 07:07:41.163660 Duration: 4.937 ms
Name: /etc/sysconfig/network/ifcfg-tap9 - Function: file.managed - Result: Clean Started: - 07:07:41.172266 Duration: 4.954 ms
Name: /etc/sysconfig/network/ifcfg-tap73 - Function: file.managed - Result: Clean Started: - 07:07:41.181001 Duration: 4.95 ms
Name: /etc/sysconfig/network/ifcfg-tap137 - Function: file.managed - Result: Clean Started: - 07:07:41.189605 Duration: 5.503 ms
</code></pre>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: Salt states apply successfully on imageworker</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Investigate if one of the service defs is missing a "requires" or similar</li>
<li>Commands were re-run - consider persistent mitigations if this is causing other pipelines to fail</li>
<li>This seems to affect imagetester, openqaworker17.qa.suse.cz, openqaworker16.qa.suse.cz and openqaworker18.qa.suse.cz so far</li>
</ul>
openQA Project - action #135134 (Resolved): [tools] GitlabCI salt-pillars-openqa deploy failed o... https://progress.opensuse.org/issues/135134 2023-09-04T09:48:02Z osukup
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1803184" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1803184</a></p>
<pre><code>baremetal-support.qa.suse.de:
Data failed to compile:
----------
Rendering SLS 'base:debug_poo133469' failed: mapping values are not allowed here; line 13
---
[...]
attempts: 5
minion_cmd_file:
file.patch:
- name: warning: waiting for shared lock on /usr/lib/sysimage/rpm/Packages <======================
error: cannot get shared lock on /usr/lib/sysimage/rpm/Packages
error: cannot open Packages index using db4 - Operation not permitted (1)
error: cannot open Packages database in /usr/lib/sysimage/rpm
warning: waiting for shared lock on /usr/lib/sysimage/rpm/Packages
error: cannot get shared lock on /usr/lib/sysimage/rpm/Packages
[...]
---
section_end:1693819642:step_script
~~~
</code></pre>
openQA Infrastructure - action #134816 (Resolved): [tools] grafana dashboard for `OpenQA Jobs tes... https://progress.opensuse.org/issues/134816 2023-08-30T08:46:38Z osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Dashboard <a href="https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1" class="external">https://stats.openqa-monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1</a></p>
<p>Data is missing in the graphs showing running tests since yesterday's migration.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No missing data for osd on Grafana</li>
<li><strong>AC2:</strong> Alerts related to affected panels are functioning</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>In the salt states, in monitoring/telegraf/telegraf-webui.conf, use something like <code>grains.get('primary_webui_domain', grains.get('fqdn'))</code> instead of <code>grains['fqdn']</code>. Alternatively, we could use the "id" in place of the FQDN</li>
<li>If the above does not work then use an OR expression since we already have data with different domains in the db (or implement that to cover the data from 2023-08-29 to today)</li>
<li>Also check whether alerts need to be covered</li>
<li>As an alternative, can we change the FQDN of osd to point to openqa.suse.de again?
<ul>
<li>Apparently a bad idea according to mcaj (not sure why)</li>
</ul></li>
<li>See existing MR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/953</a></li>
</ul>
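<p>The fallback lookup suggested above behaves like a nested dictionary <code>get</code>. The sketch below illustrates that behaviour with plain Python dicts standing in for Salt grains; <code>primary_webui_domain</code> is the grain name proposed in the suggestion, while the function name and the host name <code>osd.example.org</code> are hypothetical.</p>

```python
def webui_domain(grains):
    """Prefer the explicit 'primary_webui_domain' grain, fall back to the FQDN.

    `grains` is a plain dict standing in for Salt grains in this sketch.
    """
    return grains.get("primary_webui_domain", grains.get("fqdn"))


# Host after the migration: FQDN changed, explicit grain keeps the old domain.
migrated = {"fqdn": "osd.example.org", "primary_webui_domain": "openqa.suse.de"}
# Host without the new grain: the FQDN is used as before.
plain = {"fqdn": "openqa.suse.de"}

print(webui_domain(migrated))
print(webui_domain(plain))
```

<p>With this pattern, hosts that never set the extra grain keep their current behaviour, so the change only affects the migrated web UI host.</p>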
openQA Infrastructure - action #125132 (Resolved): [alert] logrotate failed on OSD https://progress.opensuse.org/issues/125132 2023-02-28T09:54:59Z osukup
<p>from journalctl:</p>
<pre><code>Feb 15 00:00:07 openqa logrotate[12569]: logrotate does not support parallel execution on the same set of logfiles.
Feb 15 00:00:07 openqa logrotate[12569]: error: state file /var/lib/misc/logrotate.status is already locked
Feb 15 00:00:00 openqa systemd[1]: Starting Rotate log files...
</code></pre>
QA - action #123367 (Resolved): [tools] drop old testplatform code from teregen, repose and produ... https://progress.opensuse.org/issues/123367 2023-01-19T10:08:27Z osukup
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We finally have all refhosts in metadata described in the new format, so we can safely drop the old testplatform.</p>
<p>Current situation: Teregen generates the testplatform in two different formats based on two different product definitions. This leads to data duplication and confuses users about which data file needs to be updated for template generation to work correctly. Repose can generate two different YAML formats. Its default was changed to the new format, but users can still use the old one, so removing this possibility will reduce user errors when maintaining refhosts files.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No old testplatform generated by teregen</li>
<li><strong>AC2:</strong> productdefs.pm file dropped from metadata</li>
<li><strong>AC3:</strong> old yaml format removed from repose</li>
</ul>
<a name="Links"></a>
<h2 >Links<a href="#Links" class="wiki-anchor">¶</a></h2>
<ul>
<li><a href="https://gitlab.suse.de/qa-maintenance/teregen" class="external">https://gitlab.suse.de/qa-maintenance/teregen</a></li>
<li><a href="https://gitlab.suse.de/qa-maintenance/metadata/-/blob/master/productdefs.pm" class="external">https://gitlab.suse.de/qa-maintenance/metadata/-/blob/master/productdefs.pm</a></li>
<li><a href="https://github.com/openSUSE/repose" class="external">https://github.com/openSUSE/repose</a></li>
<li><a href="https://confluence.suse.com/display/maintenanceqa/Repose+Usage" class="external">https://confluence.suse.com/display/maintenanceqa/Repose+Usage</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Disable relevant module in teregen template generator</li>
<li>Drop <code>--yaml-old</code> option in repose (keep <code>--yaml-ng</code>)</li>
</ul>
openQA Infrastructure - action #114908 (Resolved): [tools] https://stats.openqa-monitor.qa.suse.d... https://progress.opensuse.org/issues/114908 2022-08-02T12:17:54Z osukup
<p>The Grafana overview page isn't responding.</p>
QA - action #113087 (Resolved): [qa-tools][qem-bot] malformed data in smelt incident causes smelt... https://progress.opensuse.org/issues/113087 2022-06-27T16:51:08Z osukup
<p><a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1032830" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1032830</a></p>
<pre><code>ERROR: Expecting value: line 1 column 1 (char 0)
Traceback (most recent call last):
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/smelt.py", line 64, in get_incident
inc_result = requests.get(SMELT, params={"query": query}, verify=False).json()
File "/usr/lib/python3.6/site-packages/requests/models.py", line 898, in json
return complexjson.loads(self.text, **kwargs)
File "/usr/lib64/python3.6/site-packages/simplejson/__init__.py", line 525, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 370, in decode
obj, end = self.raw_decode(s)
File "/usr/lib64/python3.6/site-packages/simplejson/decoder.py", line 400, in raw_decode
return self.scan_once(s, idx=_w(s, idx).end())
simplejson.errors.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
</code></pre>
QA - action #112430 (Resolved): [qa-tools] [qem-bot] Incident schedule fails in preparation Incid... https://progress.opensuse.org/issues/112430 2022-06-14T17:10:56Z osukup
<p>from log:</p>
<pre><code class="python syntaxhl" data-language="python">INFO: 2022-06-14 17:03:40.405480: http://download.suse.de/ibs/SUSE:/Maintenance:/24419/SUSE_Updates_SLE-WE_12-SP5_x86_64/repodata/repomd.xml not found -- skip incident
INFO: Project SUSE:Maintenance:24419 can't calculate repohash .. skipping
INFO: Project SUSE:Maintenance:24462 has empty channels - check incident in SMELT
Traceback (most recent call last):
File "./qem-bot/bot-ng.py", line 7, in <module>
main()
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/main.py", line 41, in main
sys.exit(cfg.func(cfg))
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/args.py", line 24, in do_incident_schedule
bot = OpenQABot(args)
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/openqabot.py", line 23, in __init__
self.incidents = get_incidents(self.token)
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/loader/qem.py", line 41, in get_incidents
xs.append(Incident(i))
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/types/incident.py", line 27, in __init__
for r in incident["channels"]
File "/builds/qa-maintenance/bot-ng/qem-bot/openqabot/types/incident.py", line 25, in <listcomp>
for p, v, a in (
ValueError: not enough values to unpack (expected 3, got 2)
</code></pre>
QA - action #111710 (Resolved): [qa-tools] [tools] remove usage of *_TEST_TEMPLATE vars in qem-b... https://progress.opensuse.org/issues/111710 2022-05-27T14:23:36Z osukup
<a name="Current-state"></a>
<h2 >Current state:<a href="#Current-state" class="wiki-anchor">¶</a></h2>
<p>QEM currently uses a fairly complicated system in aggregate jobs to add the repositories under test to the jobs themselves:</p>
<p><em>qem-bot</em> posts jobs with the <code>*_TEST_ISSUES</code> variables.<br>
In <em>osd</em>, media are defined with templates in the corresponding <code>*_TEST_TEMPLATE</code> variables.</p>
<p>When a job is started, the variable <code>MAINT_TEST_REPO</code> is calculated from these variables using <code>SCC_ADDONS</code>.</p>
<p>Code used for this transformation:</p>
<p><code>main_common.pm</code> -> <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L773-L789" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/lib/main_common.pm#L773-L789</a><br>
SLE <code>main.pm</code> -> <a href="https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/products/sle/main.pm#L323-L373" class="external">https://github.com/os-autoinst/os-autoinst-distri-opensuse/blob/master/products/sle/main.pm#L323-L373</a></p>
<a name="Cons"></a>
<h3 >Cons:<a href="#Cons" class="wiki-anchor">¶</a></h3>
<ul>
<li>pretty complex settings for medium with high probability of error</li>
<li>code used for this in os-autoinst-distri-opensuse is pretty complex</li>
</ul>
<a name="Proposal"></a>
<h2 >Proposal:<a href="#Proposal" class="wiki-anchor">¶</a></h2>
<p>Modify <em>qem-bot</em> to post new variables (<code>*_TEST_REPO</code>) with the corresponding repositories, and then use simpler code to construct <code>MAINT_TEST_REPO</code> by simply joining these variables, still using SCC_ADDONS to select the correct products/modules.</p>
<p>qem-bot has all the information needed to generate these variables, which removes one layer of possible human error (there can still be problems in qa-metadata, but they are simpler to spot and debug).</p>
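<p>The joining step proposed above could look roughly like the following sketch. It is written in Python for illustration only (the test distribution code referenced earlier is Perl), and it makes several assumptions that the proposal does not specify: the base product uses an <code>OS</code> prefix, each entry in <code>SCC_ADDONS</code> maps to its upper-cased name as a prefix, values are comma-separated, and the repository URLs are placeholders.</p>

```python
def maint_test_repo(settings):
    """Join the *_TEST_REPO values of the base product and the selected addons.

    Assumptions (not taken from the proposal): the base product uses the
    "OS" prefix, and every SCC_ADDONS entry maps to its upper-cased name.
    """
    addons = [a.strip() for a in settings.get("SCC_ADDONS", "").split(",") if a.strip()]
    prefixes = ["OS"] + [addon.upper() for addon in addons]
    repos = []
    for prefix in prefixes:
        value = settings.get(f"{prefix}_TEST_REPO", "")
        # Skip prefixes that have no repository under test.
        repos.extend(repo for repo in value.split(",") if repo)
    return ",".join(repos)


# Placeholder settings; the URLs are not real repositories.
example = {
    "SCC_ADDONS": "sdk,we",
    "OS_TEST_REPO": "http://example.org/os",
    "SDK_TEST_REPO": "http://example.org/sdk",
    "LTSS_TEST_REPO": "http://example.org/ltss",
}
print(maint_test_repo(example))
```

<p>In this sketch, repositories for addons that are not listed in <code>SCC_ADDONS</code> (here <code>LTSS</code>) are ignored, and listed addons without a <code>*_TEST_REPO</code> value (here <code>WE</code>) contribute nothing.</p>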
<a name="Pros"></a>
<h3 >Pros:<a href="#Pros" class="wiki-anchor">¶</a></h3>
<ul>
<li>remove mess from media settings</li>
<li>cleaner/simpler code in os-autoinst-distri-opensuse</li>
</ul>
<a name="Cons-2"></a>
<h3 >Cons:<a href="#Cons-2" class="wiki-anchor">¶</a></h3>
<ul>
<li>slightly more complex code in qem-bot (Aggregate class)</li>
<li>longer command line presented by qem-bot for aggregates</li>
</ul>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<p><strong>AC1</strong>: implement needed changes in qem-bot<br>
<strong>AC2</strong>: implement needed changes in os-autoinst-distri-opensuse<br>
<strong>AC3</strong>: (optional) cleanup media definitions in OSD</p>
<a name="O3"></a>
<h2 >O3<a href="#O3" class="wiki-anchor">¶</a></h2>
<p>Note: this changes the handling for OSD only. O3 uses a different <code>bot</code>, so this ticket does not apply to it (although the same approach could also be implemented on the O3 side).</p>
openQA Project - action #107497 (Resolved): [qe-tools] openqaworker14 (and openqa15) - developer... https://progress.opensuse.org/issues/107497 2022-02-24T08:24:28Z osukup
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>In the live view, developer mode fails and reports a failure every 2 seconds.</p>
<p>From the browser JavaScript console:</p>
<pre><code>Establishing ws connection to wss://openqa.suse.de/liveviewhandler/tests/8210859/developer/ws-proxy/status
Received message via ws proxy: {"data":null,"type":"info","what":"connecting to os-autoinst command server at ws:\/\/10.100.96.68:20043\/OX0bG6w17OkxmWt_\/ws"}
Received message via ws proxy: {"data":null,"type":"error","what":"unable to upgrade ws to command server"}
Error from ws proxy: unable to upgrade ws to command server
Connection to livehandler lost
</code></pre>
<p>The connection between osd and the workers works without problems, and the IP address of the worker is also correct.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Developer mode on openqaworker14.qa.suse.cz works reliably as part of the productive OSD infrastructure</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://open.qa/docs/#debugdevelmode" class="external">https://open.qa/docs/#debugdevelmode</a></li>
<li><em>not</em> likely related to known issues with network performance</li>
</ul>
QA - action #96752 (Resolved): 'openSUSE-SLE' product schedule jobs to OSD https://progress.opensuse.org/issues/96752 2021-08-11T12:35:39Z osukup
<p>openSUSE-SLE has a different structure than other SUSE products, so it needs some changes in the scheduling bot.</p>
<ul>
<li>AC1: bot schedules openSUSE-SLE:15.3 jobs in osd</li>
</ul>