openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | 2024-03-26T08:23:01Z
openQA Infrastructure - action #158026 (Resolved): osd-deployment exceeds 2h maximum runtime duri... | https://progress.opensuse.org/issues/158026 | 2024-03-26T08:23:01Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2426666" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/2426666</a></p>
<pre><code>+ retry -r 3 -- zypper --no-refresh -n dup --replacefiles
Loading repository data...
..Reading installed packages...
.Warning: You are about to do a distribution upgrade with all enabled repositories. Make sure these repositories are compatible before you continue. See 'man zypper' for more information about this command.
Computing distribution upgrade...
.
The following 7 packages are going to be upgraded:
openQA openQA-client openQA-common openQA-doc openQA-local-db system-user-velociraptor velociraptor
7 packages to upgrade.
Overall download size: 0 B. Already cached: 21.2 MiB. After the operation, additional 4.7 KiB will be used.
[...]
Checking for file conflicts: [..done]
(1/4) Installing: openQA-common-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [...done]
(2/4) Installing: os-autoinst-distri-opensuse-deps-1.1711423505.d81d6831-lp155.14058.1.noarch [...done]
(3/4) Installing: openQA-client-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [...done]
(4/4) Installing: openQA-worker-4.6.1711372491.18a87328-lp155.6447.1.ppc64le [....done]....................................................................................................................................................
[...]
............................................................................
ERROR: Job failed: execution took longer than 2h0m0s seconds
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
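<p>The Suggestions list above is empty; one hedged option is to bound each zypper attempt with coreutils <code>timeout</code> so the job fails fast with a clear exit code instead of being killed by the runner after 2h. The 30m budget and the exact <code>retry</code> helper usage are assumptions, shown only as a comment; the runnable part below just demonstrates the <code>timeout</code> exit-code convention.</p>

```shell
# A bounded variant of the failing step (assumption, not the current pipeline):
#   retry -r 3 -- timeout 30m zypper --no-refresh -n dup --replacefiles
# coreutils "timeout" kills the command and returns 124 when the limit is hit,
# which the pipeline could then report explicitly:
rc=0
timeout 1 sleep 3 || rc=$?
echo "exit code: $rc"
```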
openQA Infrastructure - action #158023 (Resolved): salt-states-openqa pipeline invalid arguments ... | https://progress.opensuse.org/issues/158023 | 2024-03-26T08:18:33Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425817" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425817</a> and also <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2422794" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2422794</a></p>
<pre><code>monitor.qe.nue2.suse.org:
Passed invalid arguments to state.highstate: expected str, bytes or os.PathLike object, not list
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Infrastructure - action #158020 (Resolved): salt-states-openqa pipeline times out | https://progress.opensuse.org/issues/158020 | 2024-03-26T08:13:58Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2425611</a></p>
<pre><code> ID: SUSE:SLE-15-SP6:Update:BCI
Function: cmd.run
Name: su geekotest -c 'mkdir -p SUSE:SLE-15-SP6:Update:BCI && python3 script/sctimeout: sending signal TERM to command 'ssh'
</code></pre>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2425891</a></p>
<pre><code> ID: stop_and_disable_all_not_configured_workers
Function: cmd.run
Name: services=$(systemctl list-units --all 'openqa-worker-auto-restart@*.service' | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' | awk '{ if($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' | tr '\n' ' '); [ -z "$services" ] || systemctl disable --ntimeout: sending signal TERM to command 'ssh'
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
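<p>The <code>Name:</code> field in the second log excerpt is truncated mid-flag, which makes the one-liner hard to review. Below is a readable sketch of what appears to be the same pipeline: list the worker units, extract the instance number, and select instances above the configured worker count (16 here). The <code>systemctl</code> output is simulated so the sketch runs anywhere; the sed/awk stages are copied from the log.</p>

```shell
# Stand-in for: systemctl list-units --all 'openqa-worker-auto-restart@*.service'
list_units() {
  printf '%s\n' \
    'openqa-worker-auto-restart@3.service  loaded active running' \
    'openqa-worker-auto-restart@17.service loaded active running'
}
# Keep only worker-unit lines, extract the instance number, and for every
# instance above 16 emit the service and path units that should be disabled.
services=$(list_units \
  | sed -e '/.*openqa-worker-auto-restart@.*\.service.*/!d' \
        -e 's|.*openqa-worker-auto-restart@\(.*\)\.service.*|\1|' \
  | awk '{ if ($0 > 16) print "openqa-worker-auto-restart@" $0 ".service openqa-reload-worker-auto-restart@" $0 ".path" }' \
  | tr '\n' ' ')
echo "$services"
```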
QA - action #157858 (Resolved): Repeated reminder comments about SLO's for openqatests size:S | https://progress.opensuse.org/issues/157858 | 2024-03-25T08:37:52Z | livdywan (liv.dywan@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: No ticket reminder comments about SLO's for openqatests size:M (Resolved)" href="https://progress.opensuse.org/issues/157522">#157522</a> addressed a bug that prevented reminder comments from being sent. Unfortunately, reminder comments are now added even if one is already present. This is especially visible in <em>immediate</em> tickets, for example #153115, which receive daily reminders - per <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a> only one reminder comment is supposed to be added. Maybe this is a regression, or the check is not comprehensive enough.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Reminders are only added once</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>We already have the code that should handle that: Review the implementation from <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a> for gaps in the current logic in <a href="https://github.com/openSUSE/backlogger/blob/main/backlogger.py" class="external">https://github.com/openSUSE/backlogger/blob/main/backlogger.py</a></li>
<li>Investigate if something changed with current comments, maybe the Redmine upgrade made a difference here (complete guess)?</li>
<li>Maybe the regex needs to be adapted and/or better covered with unit testing</li>
</ul>
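<p>The dedup guard the Suggestions describe can be sketched in a few lines: fetch the ticket's existing comments (Redmine exposes them via <code>/issues/&lt;id&gt;.json?include=journals</code>) and only post a reminder if no comment contains the marker text. The marker string and the JSON shape below are assumptions for illustration, not the actual backlogger.py logic; the data is inlined so the check runs anywhere.</p>

```shell
# Simulated journals payload; in reality this would come from
#   curl "$REDMINE_URL/issues/$id.json?include=journals"
journals='{"journals": [{"notes": "This ticket was set to High priority"}]}'
marker='set to High priority'   # hypothetical reminder marker text
if printf '%s' "$journals" | grep -q "$marker"; then
  action=skip    # a reminder comment already exists, do not add another
else
  action=post    # no reminder yet, add one
fi
echo "$action"
```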
openQA Infrastructure - action #156460 (Resolved): Potential FS corruption on osd due to 2 VMs ac... | https://progress.opensuse.org/issues/156460 | 2024-03-01T13:51:21Z | jbaier_cz (jbaier@suse.cz)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Users noticed slowness of osd in <a href="https://suse.slack.com/archives/C02CANHLANP/p1709297645213609" class="external">https://suse.slack.com/archives/C02CANHLANP/p1709297645213609</a>; openqa-monitor.qa.suse.de also shows availability problems.</p>
<p>Logs on osd show potential FS problems:</p>
<pre><code>Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/26/4669e8a06e5502583ba67b138a9c30b97efbfff1f8af0b92f937ad8b70035d: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #467326: comm salt-master: deleted inode referenced: 467329
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #428053: comm salt-master: deleted inode referenced: 428056
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/08/96cf9ed4cc58d8c044fe257e5e977516e49383070eea5680e3f8d53fc31712: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa kernel: EXT4-fs error (device vda1): ext4_lookup:1855: inode #358221: comm salt-master: deleted inode referenced: 358225
Mar 01 14:29:14 openqa salt-master[25856]: [ERROR ] Unable to remove /var/cache/salt/master/jobs/eb/8843afe01ce61b501612957cc3df3a3d8371a9c2694ebd800b47d514066853: [Errno 117] Structure needs cleaning: '.min>
Mar 01 14:29:14 openqa openqa-websockets-daemon[15372]: [debug] [pid:15372] Updating seen of worker 1951 from worker_status (free)
</code></pre>
<p>There might be a situation where two VMs were running with the same backing device according to <a href="https://suse.slack.com/archives/C02CANHLANP/p1709299401351479?thread_ts=1709297645.213609&cid=C02CANHLANP" class="external">https://suse.slack.com/archives/C02CANHLANP/p1709299401351479?thread_ts=1709297645.213609&cid=C02CANHLANP</a></p>
<p>The server was rebooted to get it into a consistent state, but unfortunately, due to the FS corruption, osd is currently in maintenance mode and needs recovery.</p>
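<p>The suspected cause (two VMs attached to the same backing device) can be checked for proactively. With libvirt one would combine <code>virsh list --name</code> and <code>virsh domblklist</code> per domain; here the domain/path pairs are simulated so the duplicate-detection pipeline runs anywhere. The host, domain and device names are made up for illustration.</p>

```shell
# Stand-in for: for d in $(virsh list --name); do
#                 virsh domblklist "$d" | awk -v d="$d" 'NR>2 && $2 {print d, $2}'
#               done
domblk() {
  printf '%s\n' \
    'openqa  /dev/vg0/openqa-disk' \
    'openqa2 /dev/vg0/openqa-disk' \
    'monitor /dev/vg0/monitor-disk'
}
# Any disk path attached to more than one running domain is a red flag:
dups=$(domblk | awk '{ print $2 }' | sort | uniq -d)
echo "shared backing devices: ${dups:-none}"
```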
QA - action #156175 (Resolved): Support development of https://github.com/openSUSE/qem-bot/pull/1... | https://progress.opensuse.org/issues/156175 | 2024-02-27T19:07:49Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>QE Core and others would like to see improvements to qem-bot. To enable them to develop new features themselves, we should support this effort. In this case <a href="https://github.com/openSUSE/qem-bot/pull/154" class="external">https://github.com/openSUSE/qem-bot/pull/154</a> shows a new concept. While the PR is not mergeable yet, we can still try out the functionality. For this, <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/65" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/65</a> already exists; we should merge it, monitor the impact, and most likely revert it after we have gathered enough feedback, which we should write back in <a href="https://github.com/openSUSE/qem-bot/pull/154" class="external">https://github.com/openSUSE/qem-bot/pull/154</a>.<br>
For more context see <a href="https://suse.slack.com/archives/C02CANHLANP/p1709051006649099" class="external">https://suse.slack.com/archives/C02CANHLANP/p1709051006649099</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> It is known if <a href="https://github.com/openSUSE/qem-bot/pull/154" class="external">https://github.com/openSUSE/qem-bot/pull/154</a> can generally work</li>
<li><strong>AC2:</strong> CI jobs in <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/</a> are green (again)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Merge <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/65" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/merge_requests/65</a> when you are able to also monitor the impact</li>
<li>Monitor and revert as necessary</li>
<li>Feed feedback back to <a href="https://github.com/openSUSE/qem-bot/pull/154" class="external">https://github.com/openSUSE/qem-bot/pull/154</a></li>
<li>Rinse and repeat</li>
</ul>
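<p>The "monitor and revert as necessary" step means reverting the merge commit of the MR, which with git requires picking a mainline parent (<code>-m 1</code> keeps the first parent, i.e. the target branch). A minimal demonstration in a throwaway repository; all names below are made up:</p>

```shell
# Build a tiny repo with a merged "feature" branch, then revert the merge.
repo=$(mktemp -d) && cd "$repo"
git init -q
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty -m base
git branch -M main
git checkout -q -b feature
echo experiment > feature.txt
git add feature.txt
git -c user.name=t -c user.email=t@example.com commit -q -m 'add feature'
git checkout -q main
git -c user.name=t -c user.email=t@example.com merge --no-ff -q -m 'merge MR' feature
# Revert the merge commit, keeping the first (main) parent as mainline:
git -c user.name=t -c user.email=t@example.com revert --no-edit -m 1 HEAD
test ! -e feature.txt && echo "merge reverted"
```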
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Have <a href="https://github.com/openSUSE/qem-bot/pull/154" class="external">https://github.com/openSUSE/qem-bot/pull/154</a> merged</li>
</ul>
openQA Infrastructure - action #155929 (Resolved): Try out rstp_enable=True in openqa/openvswitch... | https://progress.opensuse.org/issues/155929 | 2024-02-23T12:56:48Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We have the theory that our multi-machine setup with GRE tunnels and STP causes problems like the one in <a class="issue tracker-4 status-3 priority-6 priority-high2 closed behind-schedule" title="action: [alert] openqa-worker-cacheservice fails to start on worker29.oqa.prg2.suse.org with "Database ha... (Resolved)" href="https://progress.opensuse.org/issues/155716#note-8">#155716-8</a>, possibly because STP adapts too slowly, causing openQA tests to fail.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Temporary multi-machine test issues are prevented when worker hosts are temporarily unavailable</li>
<li><strong>AC2:</strong> RSTP does not break more than what we had before</li>
<li><strong>AC3:</strong> Our documentation and salt states are up-to-date regarding STP+RSTP</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read <a href="https://pve.proxmox.com/wiki/Open_vSwitch#Rapid_Spanning_Tree_.28RSTP.29" class="external">https://pve.proxmox.com/wiki/Open_vSwitch#Rapid_Spanning_Tree_.28RSTP.29</a> and enable the setting via Salt</li>
<li>Read <a href="https://www.accuenergy.com/support/reference-directory/rapid-spanning-tree-protocol-rstp/#:~:text=Rapid%20Spanning%20Tree%20Protocol%20(RSTP%3A%20IEEE%20802.1w)%20is,free%E2%80%9D%20topology%20within%20Ethernet%20networks" class="external">https://www.accuenergy.com/support/reference-directory/rapid-spanning-tree-protocol-rstp/#:~:text=Rapid%20Spanning%20Tree%20Protocol%20(RSTP%3A%20IEEE%20802.1w)%20is,free%E2%80%9D%20topology%20within%20Ethernet%20networks</a>.</li>
<li>Do a simple ping test between VMs (using a cluster of at least 3 machines connected via GRE) when one of the GRE nodes disconnects and connects (see <a href="http://open.qa/docs/#_start_test_vms_manually" class="external">http://open.qa/docs/#_start_test_vms_manually</a>)</li>
<li>Try via the MM openQA-in-openQA test by simply changing <a href="https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50" class="external">https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50</a> and adapting the openQA-in-openQA test to use that os-autoinst version instead of the stable package</li>
<li>Try to reproduce the test e.g. using <a href="https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-HA-Incidents&machine=64bit&test=qam_ha_hawk_haproxy_node02&version=15-SP2" class="external">https://openqa.suse.de/tests/latest?arch=x86_64&distri=sle&flavor=Server-DVD-HA-Incidents&machine=64bit&test=qam_ha_hawk_haproxy_node02&version=15-SP2</a> by running this test near-continuously and, while the test is running, triggering a reboot of a machine which "ovs-appctl stp/show" shows to be crucial for the connection</li>
<li>Then enable rstp in the wicked hook scripts and possibly disable stp instead</li>
<li>Reconduct the experiment and check if the above significantly prevents related problems</li>
<li>If successful, ensure that <a href="https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50" class="external">https://github.com/os-autoinst/os-autoinst/blob/master/script/os-autoinst-setup-multi-machine#L50</a> and the salt states are in sync and that our configuration is documented in <a href="http://open.qa/docs/" class="external">http://open.qa/docs/</a></li>
</ul>
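<p>The first suggestion, enabling the setting via Salt, could look roughly like the following state fragment. This is a minimal sketch under assumptions: the state id and the bridge name <code>br1</code> are made up, and the actual salt-states-openqa layout may differ; <code>rstp_enable</code> itself is the documented Open vSwitch bridge option referenced above.</p>

```yaml
# Hypothetical salt state fragment enabling RSTP on the multi-machine bridge
ovs_bridge_rstp:
  cmd.run:
    - name: ovs-vsctl set bridge br1 rstp_enable=true
    - unless: ovs-vsctl get bridge br1 rstp_enable | grep -q '^true$'
```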
openQA Infrastructure - action #155740 (Resolved): Scripts CI pipelines fail due to timeout after... | https://progress.opensuse.org/issues/155740 | 2024-02-21T11:58:03Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958" class="external">https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958</a></p>
<pre><code>{"count":2,"failed":[],"ids":[13560656,13560657],"scheduled_product_id":2058111}
2 jobs have been created:
- http://openqa.suse.de/tests/13560656
- http://openqa.suse.de/tests/13560657
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
[...]
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Jo
ERROR: Job failed: execution took longer than 1h0m0s seconds
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
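<p>The Suggestions list is empty; since the log shows the script polling a scheduled job until the runner's hard 1h timeout kills it, one option is an explicit deadline inside the polling loop so the script can fail with a clear message first. A minimal sketch: <code>poll_state</code> is a stand-in for the real openqa-cli API query and always returns <code>scheduled</code> here to exercise the timeout path quickly, and the 2s deadline is only for demonstration.</p>

```shell
poll_state() { echo scheduled; }   # stand-in for the real job-state query

# Poll until the job is done or the given deadline (in seconds) passes.
wait_for_job() {
  deadline=$(( $(date +%s) + $1 ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    state=$(poll_state)
    [ "$state" = done ] && return 0
    sleep 1
  done
  echo "ERROR: job still '$state' after ${1}s" >&2
  return 1
}

rc=0
wait_for_job 2 || rc=$?
echo "rc=$rc"
```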
QA - action #155458 (Resolved): Seemingly reproducible build failures in devel:languages:perl per... | https://progress.opensuse.org/issues/155458 | 2024-02-14T08:28:43Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://build.opensuse.org/package/live_build_log/devel:languages:perl/perl-Mojo-IOLoop-ReadWriteProcess/openSUSE_Tumbleweed/x86_64" class="external">https://build.opensuse.org/package/live_build_log/devel:languages:perl/perl-Mojo-IOLoop-ReadWriteProcess/openSUSE_Tumbleweed/x86_64</a><br>
fails and I don't even see the error:</p>
<pre><code>…
[ 108s] t/12_mocked_container.t .. ok
[ 108s] t/13_shared.t ............ skipped: Skipped unless TEST_SHARED is set
[ 108s] All tests successful.
[ 108s] Files=15, Tests=69, 100 wallclock secs ( 0.08 usr 0.03 sys + 4.21 cusr 0.74 csys = 5.06 CPU)
[ 108s] Result: PASS
…
[ 108s] + RPM_EC=0
[ 108s] ++ jobs -p
[ 108s] + exit 0
[ 108s] Executing(%license): /usr/bin/bash -e /var/tmp/rpm-tmp.3qVKzU
[ 108s] + umask 022
[ 108s] + cd /home/abuild/rpmbuild/BUILD
[ 108s] + cd Mojo-IOLoop-ReadWriteProcess-0.34
[ 108s] + LICENSEDIR=/home/abuild/rpmbuild/BUILDROOT/perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-38.11.x86_64/usr/share/licenses/perl-Mojo-IOLoop-ReadWriteProcess
[ 108s] + export LC_ALL=
[ 108s] + LC_ALL=
[ 108s] + export LICENSEDIR
[ 108s] + /usr/bin/mkdir -p /home/abuild/rpmbuild/BUILDROOT/perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-38.11.x86_64/usr/share/licenses/perl-Mojo-IOLoop-ReadWriteProcess
[ 108s] + cp -pr /home/abuild/rpmbuild/BUILD/Mojo-IOLoop-ReadWriteProcess-0.34/LICENSE /home/abuild/rpmbuild/BUILDROOT/perl-Mojo-IOLoop-ReadWriteProcess-0.340.0-38.11.x86_64/usr/share/licenses/perl-Mojo-IOLoop-ReadWriteProcess
[ 108s] + RPM_EC=0
[ 108s] ++ jobs -p
[ 108s] + exit 0
[ 108s] Broken pipe
[ 108s] ### VM INTERACTION START ###
[ 108s] [ 105.200672][ T1] sysrq: Power Off
[ 108s] [ 105.201394][ T84] reboot: Power down
[ 108s] ### VM INTERACTION END ###
[ 108s]
[ 108s] i03-ch1c failed "build perl-Mojo-IOLoop-ReadWriteProcess.spec" at Wed Feb 14 08:02:04 UTC 2024.
</code></pre>
<p>EDIT: By now <a href="https://build.opensuse.org/package/live_build_log/devel:languages:perl/perl-Mojo-IOLoop-ReadWriteProcess/openSUSE_Tumbleweed/x86_64" class="external">https://build.opensuse.org/package/live_build_log/devel:languages:perl/perl-Mojo-IOLoop-ReadWriteProcess/openSUSE_Tumbleweed/x86_64</a> is back to "succeeded" and no longer shows "Broken pipe". I suggest simply reporting the errors, ignorable warnings, etc. to OBS upstream.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1</strong>: The problem has been reported to upstream OBS (either already reported or new report)</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look up whether there are any existing issues about such confusing error messages, or report a new one</li>
<li>Learn from OBS experts how to better handle such situations and share the findings with the team</li>
</ul>
openQA Infrastructure - action #155074 (Resolved): salt-states-pipeline fails trying to install i... | https://progress.opensuse.org/issues/155074 | 2024-02-07T12:32:45Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2248721#L9182" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2248721#L9182</a></p>
<pre><code> ID: databases.repo
Function: pkg.latest
Name: influxdb
Result: False
Comment: An error was encountered while installing package(s): Zypper command failure: Loading repository data...
Reading installed packages...
Resolving package dependencies...
Problem: the to be installed influxdb-1.11.2-1.5.x86_64 requires 'group(influxdb)', but this requirement cannot be provided
not installable providers: influxdb2-2.7.1-1.34.x86_64[databases.repo]
Solution 1: do not install influxdb-1.11.2-1.5.x86_64
Solution 2: break influxdb-1.11.2-1.5.x86_64 by ignoring some of its dependencies
Choose from above solutions by number or cancel [1/2/c/d/?] (c): c
Started: 11:57:47.688754
Duration: 1638.476 ms
Changes:
</code></pre>
openQA Infrastructure - action #154624 (Resolved): Periodically running simple ping-check multi-m... | https://progress.opensuse.org/issues/154624 | 2024-01-31T12:23:31Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In cases like <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [ppc64le] test fails in iscsi_client - zypper reports Error Message: Could not resolve host: open... (Resolved)" href="https://progress.opensuse.org/issues/154552">#154552</a> multi-machine issues (still) happen, and while we monitor multi-machine test results, there are cases where users notify us about problems that we do not see in our monitoring. Because we now (<a class="issue tracker-4 status-3 priority-4 priority-default closed behind-schedule" title="action: Ensure automated openQA tests verify that os-autoinst-setup-multi-machine sets up valid networkin... (Resolved)" href="https://progress.opensuse.org/issues/138302">#138302</a>) have a good, simple ping-check multi-machine test scenario created by dheidler, we can run that scenario periodically and very often, similar to the openQA-in-openQA tests. Whenever that scenario fails - because it is so simple, the cause is likely a multi-machine infrastructure problem we want to know about - the tools team should be alerted directly, e.g. via email or Slack #team-qa-tools, using openqa-label-known-issues.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> simple ping-check multi-machine tests executed on x86_64 on OSD periodically covering multiple physical hosts</li>
<li><strong>AC2:</strong> The tools team is alerted directly if those tests fail</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read <a class="issue tracker-4 status-3 priority-4 priority-default closed behind-schedule" title="action: Ensure automated openQA tests verify that os-autoinst-setup-multi-machine sets up valid networkin... (Resolved)" href="https://progress.opensuse.org/issues/138302">#138302</a> where dheidler added the simple ping-check for openQA-in-openQA tests</li>
<li>The scenario can be in a new job group or just groupless</li>
<li>Think about how to trigger periodically, possibly gitlab CI pipeline?</li>
<li>Similar to wicked tests</li>
<li>Ensure the alerting, possibly use <a href="https://github.com/os-autoinst/scripts/?tab=readme-ov-file#unreviewed-issues" class="external">https://github.com/os-autoinst/scripts/?tab=readme-ov-file#unreviewed-issues</a></li>
</ul>
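<p>The "trigger periodically, possibly gitlab CI pipeline" suggestion could look roughly like the schedule-gated job below. This is a hedged sketch: the job name, the variable values and the exact openqa-cli parameters are assumptions for illustration, not the actual scenario definition.</p>

```yaml
# Hypothetical .gitlab-ci.yml fragment, run from a pipeline schedule
trigger-mm-ping-check:
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
  script:
    - openqa-cli api --host https://openqa.suse.de -X POST isos
      DISTRI=opensuse VERSION=Tumbleweed FLAVOR=DVD ARCH=x86_64
      TEST=ping-check
```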
QA - action #153799 (Resolved): Prepare DHCP/DNS for machines coming to qe.prg2.suse.org based on... | https://progress.opensuse.org/issues/153799 | 2024-01-17T20:04:43Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Many former QAM machines that are now in PRG2/PRG2e but not yet in operation were already managed in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a> for both DHCP and DNS. Now DHCP/DNS needs to be adapted so that the machines which should live in qe.prg2.suse.org can operate properly again.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Valid DHCP+DNS entries exist for all former QAM machines now residing in PRG2/PRG2e</li>
<li><strong>AC2:</strong> No more references left in the OPS-Service repo for decommissioned QAM machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Block on #153664</li>
<li>See <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Prepare DHCP/DNS for qe.prg2.suse.org based on former qa.suse.de entries size:M (Resolved)" href="https://progress.opensuse.org/issues/153796">#153796</a> for similar work for QE non-openQA machines</li>
<li>See how currently DHCP/DNS records are managed in examples like <a href="https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs" class="external">https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs</a></li>
<li>Go through all entries in <a href="https://gitlab.suse.de/OPS-Service/salt/-/tree/production/pillar/domain/qam_suse_de/hosts.yaml" class="external">https://gitlab.suse.de/OPS-Service/salt/-/tree/production/pillar/domain/qam_suse_de/hosts.yaml</a> and for each entry</li>
<li>Cross-check with racktables e.g. <a href="https://racktables.nue.suse.com/index.php?page=search&last_page=object&last_tab=default&q=whale" class="external">https://racktables.nue.suse.com/index.php?page=search&last_page=object&last_tab=default&q=whale</a> (put the name in the search)
<ul>
<li>Example of a valid machine <a href="https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9594" class="external">https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=9594</a></li>
<li>Example of a machine that's gone <a href="https://racktables.nue.suse.com/index.php?page=ipaddress&ip=10.161.224.55" class="external">https://racktables.nue.suse.com/index.php?page=ipaddress&ip=10.161.224.55</a></li>
<li><em>IF</em> the machine still exists and should be in qe.prg2.suse.org (should be about 25 machines)</li>
<li>move and adapt the according entries to
<ul>
<li>pillar/domain/qe_prg2_suse_org/hosts.yaml</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org-rev-10.145.0</li>
</ul></li>
<li><em>ELSE</em> if the machine does not exist anymore</li>
<li>remove all references in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a></li>
</ul></li>
<li>After that remove pillar/domain/qam_suse_de/hosts.yaml as well as all not anymore used A-records in qam.suse.de as qam.suse.de should from then on only be used for CNAME entries</li>
</ul>
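<p>The per-entry cross-check described above can be partly automated: compare the hosts.yaml-style entries against the zone-file A records and report names whose IPs disagree, so only those need manual racktables review. The file contents below are simulated (the real files live in the OPS-Service/salt repository), and the parsing is deliberately naive for illustration.</p>

```shell
# Stand-ins for pillar/domain/.../hosts.yaml and the dns zone file:
hosts_yaml() { printf '%s\n' 'whale: 10.145.10.1' 'orca: 10.145.10.2'; }
zone_file()  { printf '%s\n' 'whale A 10.145.10.1' 'orca A 10.145.10.9'; }

# Normalize both sources to "name ip" pairs; pairs appearing only once
# (uniq -u) exist in one source with a different IP, i.e. a mismatch.
mismatches=$(
  { hosts_yaml | sed 's|:||'; zone_file | awk '{ print $1, $3 }'; } \
    | sort | uniq -u | awk '{ print $1 }' | sort -u | tr '\n' ' '
)
echo "mismatching entries: ${mismatches:-none}"
```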
QA - action #153796 (Resolved): Prepare DHCP/DNS for qe.prg2.suse.org based on former qa.suse.de ... | https://progress.opensuse.org/issues/153796 | 2024-01-17T19:57:12Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Many machines that are now in PRG2/PRG2e but not yet in operation were formerly managed in <a href="https://gitlab.suse.de/qa-sle/qanet-configs/" class="external">https://gitlab.suse.de/qa-sle/qanet-configs/</a>. The DNS config from qanet is meanwhile also provided by the Eng-Infra maintained DNS servers managed in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a>, but for many machines DHCP/DNS still needs to be prepared so that those which should live in qe.prg2.suse.org can operate properly again.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Valid DHCP+DNS entries exist for all former QA non-openQA machines now residing in PRG2/PRG2e</li>
<li><strong>AC2</strong>: No more references left in the OPS-Service repo for decommissioned QA non-openQA machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>See how currently DHCP/DNS records are managed in examples like <a href="https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs" class="external">https://gitlab.suse.de/OPS-Service/salt/-/merge_requests/3839/diffs</a></li>
<li>See how <a href="https://gitlab.suse.de/qa-sle/qanet-configs/" class="external">https://gitlab.suse.de/qa-sle/qanet-configs/</a> is structured to be able to find DHCP/DNS entries for machines.</li>
<li>Go through all A-record entries of <a href="https://gitlab.suse.de/OPS-Service/salt/-/tree/production/salt/profile/dns/files/prg2_suse_org/dns-qa.suse.de.zone" class="external">https://gitlab.suse.de/OPS-Service/salt/-/tree/production/salt/profile/dns/files/prg2_suse_org/dns-qa.suse.de.zone</a> and for each entry
<ul>
<li><em>IF</em> the machine still exists (there should be about 10 physical machines, some with multiple entries, e.g. all "grenache" lpars)</li>
<li>create an according entry in
<ul>
<li>pillar/domain/qe_prg2_suse_org/hosts.yaml</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org</li>
<li>salt/profile/dns/files/prg2_suse_org/dns-qe.prg2.suse.org-rev-10.145.0</li>
</ul></li>
<li><em>ELSE</em> if the machine does not exist anymore</li>
<li>remove all references in <a href="https://gitlab.suse.de/OPS-Service/salt" class="external">https://gitlab.suse.de/OPS-Service/salt</a></li>
</ul></li>
</ul>
QA - action #103464 (Resolved): qa-tools-backlog-assistant: Extract code into a GitHub Action for... | https://progress.opensuse.org/issues/103464 | 2021-12-03T10:11:11Z | tinita (tina.mueller+trick-redmine@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>Currently there are two projects originating from the same one:</p>
<ul>
<li><a href="https://github.com/os-autoinst/qa-tools-backlog-assistant" class="external">https://github.com/os-autoinst/qa-tools-backlog-assistant</a></li>
<li><a href="https://github.com/BillAnastasiadis/qe-c-backlog-assistant" class="external">https://github.com/BillAnastasiadis/qe-c-backlog-assistant</a></li>
</ul>
<p>Originally the script contained the code and configuration.<br>
Now both projects have diverged because they are tracking different backlogs.<br>
Also both projects have been refactored so that the configuration is not directly in the code anymore.</p>
<p>Other projects wanting to use this assistant have to fork it, but because of local changes they can't easily</p>
<ul>
<li>contribute back code improvements</li>
<li>pull in upstream improvements</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Extract code into a GitHub Action, so that projects using it only have to configure it locally via a YAML file.</li>
<li><a href="https://docs.github.com/en/actions/creating-actions" class="external">https://docs.github.com/en/actions/creating-actions</a></li>
<li>Configuration <em>could</em> be done via the workflow file itself (via env vars), but that may include a lot of repetition</li>
<li>A YAML file read by the code directly might be better</li>
<li>The workflow configuration currently needs to define every queue separately, see <a href="https://github.com/BillAnastasiadis/qe-c-backlog-assistant/blob/master/.github/workflows/backlog_checker.yml#L30" class="external">https://github.com/BillAnastasiadis/qe-c-backlog-assistant/blob/master/.github/workflows/backlog_checker.yml#L30</a> ff. Better might be just one workflow step.</li>
</ul>
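<p>As a sketch of the suggested extraction (all names, inputs, and paths here are hypothetical, not taken from either repository), a composite GitHub Action could expose the queue configuration as an input pointing to a YAML file kept in the consuming repository:</p>

```yaml
# action.yml — hypothetical composite action for the backlog assistant
name: 'Backlog Assistant'
description: 'Check backlog queues defined in a local YAML configuration file'
inputs:
  config:
    description: 'Path to the queue configuration file in the consuming repository'
    required: false
    default: '.github/backlog-queues.yaml'
runs:
  using: 'composite'
  steps:
    # The checker script ships with the action; only the config stays local
    - run: python "${{ github.action_path }}/check_queues.py" --config "${{ inputs.config }}"
      shell: bash
```

<p>A consuming project would then need only a single workflow step (<code>uses: os-autoinst/qa-tools-backlog-assistant@v1</code> with a <code>config</code> input), so both repositories could share the code while keeping their diverged queue definitions in their own YAML files.</p>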
QA - action #95822 (Resolved): qa-maintenance/openQABot failed to trigger aggregate tests with "u...https://progress.opensuse.org/issues/958222021-07-22T07:03:59Zokurzokurz@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <a href="https://chat.suse.de/channel/qem-openqa-review?msg=ifWGbs7QXfJdGJqTf" class="external">https://chat.suse.de/channel/qem-openqa-review?msg=ifWGbs7QXfJdGJqTf</a></p>
<p>From <code>ssh qam2 'journalctl -M openqabot -u openqabot-full --since=2021-07-22'</code>:</p>
<pre><code>Jul 22 01:09:50 openqabot oqaqambot[21718]: INFO: Updates shedule enabled for this run on PUBCLOUD12SP5AZUREStandardgen2:x86_64
Jul 22 01:09:50 openqabot oqaqambot[21718]: INFO: sle-12-SP5-x86_64 repohash: 4a870c348452ec6fb6c9ca52b30d9aea
Jul 22 01:09:50 openqabot oqaqambot[21718]: INFO: Incidents in sle-12-SP5-x86_64: {'ARCH': 'x86_64',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'BUILD': '20210722-1',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'DISTRI': 'sle',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'FLAVOR': 'AZURE-Standard-gen2-Updates',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'OS_TEST_ISSUES': '20175,20204,20222,20248,20258,20283,20344,20353,20354,20431,20434,20450,20475,20477,20485,4705',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'PUBLICCLOUD_TOOLS_IMAGE_QUERY': 'https://openqa.suse.de/group_overview/276.json',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_AZURE_OFFER': 'sles-12-sp5',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_AZURE_SKU': 'gen2',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_IMAGE_ID': '',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'REPOHASH': '4a870c348452ec6fb6c9ca52b30d9aea',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'SDK_TEST_ISSUES': '20204,20222,20225,20248,20274,20283,20326,20344,20354,20434,20450,20475,20477',
Jul 22 01:09:50 openqabot oqaqambot[21718]: 'VERSION': '12-SP5',
Jul 22 01:09:50 openqabot oqaqambot[21718]: '_OBSOLETE': 1}
Jul 22 01:09:57 openqabot oqaqambot[21718]: WARNING: PUBCLOUD12SP5AZUREBasic is outdated: 20210721-1
Jul 22 01:09:57 openqabot oqaqambot[21718]: INFO: Updates shedule enabled for this run on PUBCLOUD12SP5AZUREBasic:x86_64
Jul 22 01:09:57 openqabot oqaqambot[21718]: INFO: sle-12-SP5-x86_64 repohash: 4a870c348452ec6fb6c9ca52b30d9aea
Jul 22 01:09:57 openqabot oqaqambot[21718]: INFO: Incidents in sle-12-SP5-x86_64: {'ARCH': 'x86_64',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'BUILD': '20210722-1',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'DISTRI': 'sle',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'FLAVOR': 'AZURE-Basic-Updates',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'OS_TEST_ISSUES': '20175,20204,20222,20248,20258,20283,20344,20353,20354,20431,20434,20450,20475,20477,20485,4705',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'PUBLICCLOUD_TOOLS_IMAGE_QUERY': 'https://openqa.suse.de/group_overview/276.json',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_AZURE_OFFER': 'sles-12-sp5-basic',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_AZURE_SKU': 'gen1',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'PUBLIC_CLOUD_IMAGE_ID': '',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'REPOHASH': '4a870c348452ec6fb6c9ca52b30d9aea',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'SDK_TEST_ISSUES': '20204,20222,20225,20248,20274,20283,20326,20344,20354,20434,20450,20475,20477',
Jul 22 01:09:57 openqabot oqaqambot[21718]: 'VERSION': '12-SP5',
Jul 22 01:09:57 openqabot oqaqambot[21718]: '_OBSOLETE': 1}
Jul 22 01:10:02 openqabot oqaqambot[21718]: Traceback (most recent call last):
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/bin/oqaqambot", line 11, in <module>
Jul 22 01:10:02 openqabot oqaqambot[21718]: load_entry_point('openQABot==0.3.0', 'console_scripts', 'oqaqambot')()
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/main.py", line 18, in main
Jul 22 01:10:02 openqabot oqaqambot[21718]: sys.exit(run_bot(logger, args, sys))
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/main.py", line 41, in run_bot
Jul 22 01:10:02 openqabot oqaqambot[21718]: return OpenQABot(metadata, args)()
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/openqabot.py", line 110, in __call__
Jul 22 01:10:02 openqabot oqaqambot[21718]: self.calculate_updates()
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/openqabot.py", line 142, in calculate_updates
Jul 22 01:10:02 openqabot oqaqambot[21718]: incidents = updates.gather_incidents(self.apiurl, arch)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/update/updates.py", line 109, in gather_incidents
Jul 22 01:10:02 openqabot oqaqambot[21718]: req = self.is_incident_in_testing(apiurl, incident)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/openqabot/update/updates.py", line 59, in is_incident_in_testing
Jul 22 01:10:02 openqabot oqaqambot[21718]: res = osc.core.search(apiurl, request=xpath)["request"]
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/osc/core.py", line 6819, in search
Jul 22 01:10:02 openqabot oqaqambot[21718]: f = http_GET(u)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/osc/core.py", line 3421, in http_GET
Jul 22 01:10:02 openqabot oqaqambot[21718]: def http_GET(*args, **kwargs): return http_request('GET', *args, **kwargs)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib/python3.6/site-packages/osc/core.py", line 3410, in http_request
Jul 22 01:10:02 openqabot oqaqambot[21718]: fd = urlopen(req, data=data)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 223, in urlopen
Jul 22 01:10:02 openqabot oqaqambot[21718]: return opener.open(url, data, timeout)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 532, in open
Jul 22 01:10:02 openqabot oqaqambot[21718]: response = meth(req, response)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 642, in http_response
Jul 22 01:10:02 openqabot oqaqambot[21718]: 'http', request, response, code, msg, hdrs)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 570, in error
Jul 22 01:10:02 openqabot oqaqambot[21718]: return self._call_chain(*args)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 504, in _call_chain
Jul 22 01:10:02 openqabot oqaqambot[21718]: result = func(*args)
Jul 22 01:10:02 openqabot oqaqambot[21718]: File "/usr/lib64/python3.6/urllib/request.py", line 650, in http_error_default
Jul 22 01:10:02 openqabot oqaqambot[21718]: raise HTTPError(req.full_url, code, msg, hdrs, fp)
Jul 22 01:10:02 openqabot oqaqambot[21718]: urllib.error.HTTPError: HTTP Error 500: Internal Server Error
Jul 22 01:10:03 openqabot systemd[1]: openqabot-full.service: Main process exited, code=exited, status=1/FAILURE
Jul 22 01:10:03 openqabot systemd[1]: Failed to start Schedule and review Maintenance incidents in openQA full run.
</code></pre>
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<p>This could be a temporary performance problem on openqa.suse.de. In any case a retry should be conducted.</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Check the logs on openqa.suse.de for the same timestamp (beware of timezone differences)</li>
<li>Implement a retry. This might even be possible on the systemd service level, but that could trigger many redundant jobs if a minor failure occurs <em>after</em> many jobs have already been triggered</li>
</ul>
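<p>To avoid the redundant-job problem of a service-level restart, the retry could instead live in the bot itself, wrapping only the failing <code>osc.core.search()</code> call. The helper below is a generic sketch (<code>retry_http</code> is a hypothetical name, not part of openQABot); it retries only HTTP 5xx responses with exponential backoff, so permanent client errors still fail fast:</p>

```python
import time
from urllib.error import HTTPError

def retry_http(func, attempts=3, initial_delay=1, backoff=2):
    """Call func(), retrying on HTTP 5xx errors with exponential backoff."""
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except HTTPError as error:
            # Only server-side errors (like the 500 in the traceback above)
            # are worth retrying; re-raise 4xx and the final failed attempt.
            if error.code < 500 or attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff
```

<p>The call in <code>is_incident_in_testing</code> would then become <code>retry_http(lambda: osc.core.search(apiurl, request=xpath))["request"]</code>, retrying the single HTTP request without re-triggering any already scheduled jobs.</p>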
<a name="Workaround"></a>
<h2 >Workaround<a href="#Workaround" class="wiki-anchor">¶</a></h2>
<p>On qam2, trigger <code>systemctl -M openqabot start openqabot-full</code> manually. Caution: this takes multiple minutes, so better do it in a screen session and monitor <code>journalctl -M openqabot -u openqabot-full -f</code></p>