openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842024-03-28T19:30:27ZopenSUSE Project Management Tool
Redmine openQA Infrastructure - action #158242 (New): Prevent ssh access to test VMs on svirt hypervisor ...https://progress.opensuse.org/issues/1582422024-03-28T19:30:27Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-150437" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-150437</a> we are asked to handle "compromised root passwords in QA segments" including s390zl11…16</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> firewall on OSD svirt hosts prevents direct ssh+vnc access from outside, i.e. normal office networks</li>
<li><strong>AC2:</strong> openQA svirt jobs are still able to access ssh+vnc as necessary, e.g. from openQA workers in the same network OR openQA workers on the hypervisor hosts themselves</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Take openQA svirt worker instances related to one hypervisor host, e.g. s390zl12, out of production for testing</li>
<li>Configure a/the firewall on that host to block ssh+vnc to VMs running on that host</li>
<li>Allow traffic from other hosts in oqa.prg2.suse.org</li>
<li>Ensure that openQA tests still work</li>
<li>Ensure that the according firewall config is made boot-persistent and in salt</li>
<li>Crosscheck with at least one reboot</li>
<li>Apply the same solution to all other OSD svirt hosts</li>
</ul>
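<p>The intended policy from AC1 and AC2 can be written down as a small model before it is translated into real firewall rules. The following is only a sketch: the network <code>192.0.2.0/24</code> is a documentation placeholder and not the real range of oqa.prg2.suse.org, the VNC port range is an assumption, and the hypervisor host's own address is assumed to lie inside the allowed network.</p>

```python
import ipaddress

# Placeholder values for illustration only. The real address range of
# oqa.prg2.suse.org and the VNC ports used by the test VMs are assumptions.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
RESTRICTED_PORTS = {22} | set(range(5900, 6000))  # ssh plus an assumed VNC range


def is_allowed(source_ip: str, dest_port: int) -> bool:
    """Return True if a connection to a test VM should pass the firewall."""
    if dest_port not in RESTRICTED_PORTS:
        return True  # this sketch only restricts ssh+vnc
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ALLOWED_NETWORKS)
```

<p>With these placeholder values an openQA worker at <code>192.0.2.10</code> may still reach ssh, while a host outside the allowed network is refused on ssh and VNC.</p>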
openQA Project - action #158236 (New): Backlog Limits Checker github workflow fails on pull reque...https://progress.opensuse.org/issues/1582362024-03-28T17:10:31Ztinitatina.mueller+trick-redmine@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://github.com/openSUSE/backlogger/actions/runs/8468254805/job/23200822772" class="external">https://github.com/openSUSE/backlogger/actions/runs/8468254805/job/23200822772</a><br>
The workflow is creating a preview of the HTML page in the origin gh-pages branch.<br>
For that, it needs the right permissions. A PR with a branch from origin works, but it fails for forks.</p>
<p>Maybe there are other options to make it work.</p>
openQA Infrastructure - action #158125 (New): typing issue on ppc64 worker - only pick up (or sta...https://progress.opensuse.org/issues/1581252024-03-27T08:52:37Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> we observed typing issues due to mania being overloaded. mania was configured to run 30 openQA worker instances and that was mostly fine, as proven in <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Repurpose PowerPC hardware in FC Basement - mania Power8 PowerPC size:M (Resolved)" href="https://progress.opensuse.org/issues/139271#note-24">#139271-24</a>. The recent overload was likely triggered by enabling video again as part of <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a>. I already reduced the number of worker instances, but this has the drawback that the long test backlog again takes longer to finish. We should be more flexible in using available resources. Here I suggest implementing a check in the worker so that it only picks up new jobs if the CPU load is below a configured threshold.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> An openQA worker does not start an openQA job if the CPU load is higher than the configured threshold</li>
<li><strong>AC2:</strong> By default workers still pick up jobs if the load is not too high</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Possibly the worker code somewhere in <a href="https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker.pm#L472" class="external">https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker.pm#L472</a> can be extended to check the cpu load and if it exceeds a (configurable) threshold then skip picking up any next job</li>
<li>Add a sensible disabled default value in <a href="https://github.com/os-autoinst/openQA/blob/master/etc/openqa/workers.ini" class="external">https://github.com/os-autoinst/openQA/blob/master/etc/openqa/workers.ini</a> with an explanation comment</li>
</ul>
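<p>The decision logic suggested above could look roughly like the following. This is a Python sketch of the idea only, not the openQA worker code (which is Perl); the function name and the convention that <code>None</code> disables the check are assumptions made for illustration.</p>

```python
import os
from typing import Optional


def may_pick_up_job(threshold: Optional[float], load: Optional[float] = None) -> bool:
    """Decide whether a worker should accept a new job.

    threshold: highest acceptable 1-minute load average; None disables the check.
    load: current load average; read from the operating system when omitted.
    """
    if threshold is None:
        return True  # check disabled, behave as before
    if load is None:
        load = os.getloadavg()[0]  # 1-minute load average (Unix only)
    return load <= threshold
```

<p>A disabled default keeps the current behaviour, and an explicit threshold makes the worker skip job pickup only while the load is above it.</p>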
openQA Infrastructure - action #158116 (New): typing issue on ppc64 worker - crosscheck performan...https://progress.opensuse.org/issues/1581162024-03-27T08:14:10Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> system overload on ppc64le machines was found which was likely triggered by <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a>. As a snapshot the current process list output from htop looks like this:</p>
<pre><code> PID USER PRI NI VIRT RES SHR S DISK R/W CPU% MEM% TIME+ ▽Command
1541 root 20 0 320M 194M 182M S 0.00 B/s 0.0 0.0 2h29:59 /usr/lib/systemd/systemd-j
96369 root 20 0 623M 98880 14336 S 0.00 B/s 0.0 0.0 54:05.86 /usr/bin/python3 /usr/bin/
1 root 20 0 178M 25024 11776 S 0.00 B/s 0.0 0.0 48:46.08 /usr/lib/systemd/systemd n
2000 root 20 0 9728 6208 2176 S 0.00 B/s 0.0 0.0 40:44.69 /usr/sbin/haveged -w 1024
157105 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 68.4 0.0 32:22.39 ffmpeg -y -hide_banner -no
157062 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 42.1 0.0 32:07.83 ffmpeg -y -hide_banner -no
157107 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 68.4 0.0 30:29.03 ffmpeg -y -hide_banner -no
157063 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 5.3 0.0 29:30.58 ffmpeg -y -hide_banner -no
6267 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 63.2 0.0 25:54.22 ffmpeg -y -hide_banner -no
157108 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 63.2 0.0 25:03.79 ffmpeg -y -hide_banner -no
157064 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 2.6 0.0 23:50.53 ffmpeg -y -hide_banner -no
156485 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 34.2 0.0 22:18.78 ffmpeg -y -hide_banner -no
6268 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 57.9 0.0 21:48.92 ffmpeg -y -hide_banner -no
156601 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 10.5 0.0 20:19.58 ffmpeg -y -hide_banner -no
6269 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 55.3 0.0 16:33.02 ffmpeg -y -hide_banner -no
5898 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 86.8 0.0 14:48.15 ffmpeg -y -hide_banner -no
31080 _openqa-wo 20 0 5720M 758M 28416 R 0.00 B/s 57.9 0.1 12:58.63 /usr/bin/qemu-system-ppc64
15778 _openqa-wo 20 0 6767M 1779M 28480 R 0.00 B/s 81.6 0.2 12:50.94 /usr/bin/qemu-system-ppc64
15781 _openqa-wo 20 0 6767M 1779M 28480 S 0.00 B/s 0.0 0.2 10:13.25 /usr/bin/qemu-system-ppc64
156709 _openqa-wo 20 0 6762M 1766M 28288 S 0.00 B/s 13.2 0.2 10:08.67 /usr/bin/qemu-system-ppc64
33559 _openqa-wo 20 0 6756M 1724M 28416 R 0.00 B/s 86.8 0.2 10:05.56 /usr/bin/qemu-system-ppc64
35017 _openqa-wo 20 0 3946M 753M 28416 R 0.00 B/s 84.2 0.1 9:30.77 /usr/bin/qemu-system-ppc64
24085 _openqa-wo 20 0 6901M 1781M 28480 S 0.00 B/s 0.0 0.2 9:13.94 /usr/bin/qemu-system-ppc64
24092 _openqa-wo 20 0 6901M 1781M 28480 R 0.00 B/s 78.9 0.2 8:40.60 /usr/bin/qemu-system-ppc64
28718 _openqa-wo 20 0 7135M 1787M 28480 S 0.00 B/s 50.0 0.2 8:17.91 /usr/bin/qemu-system-ppc64
28720 _openqa-wo 20 0 7135M 1787M 28480 R 0.00 B/s 13.2 0.2 6:51.75 /usr/bin/qemu-system-ppc64
39280 _openqa-wo 20 0 5712M 755M 28416 R 0.00 B/s 65.8 0.1 6:41.38 /usr/bin/qemu-system-ppc64
39683 _openqa-wo 20 0 6731M 1549M 28416 R 0.00 B/s 65.8 0.2 6:24.06 /usr/bin/qemu-system-ppc64
3699 root 20 0 3968 3200 2368 S 0.00 B/s 0.0 0.0 6:04.21 /sbin/agetty -o -p -- \u -
34903 _openqa-wo 20 0 6334M 1483M 28416 R 0.00 B/s 50.0 0.2 5:29.90 /usr/bin/qemu-system-ppc64
34902 _openqa-wo 20 0 6334M 1483M 28416 S 0.00 B/s 0.0 0.2 4:40.00 /usr/bin/qemu-system-ppc64
38988 _openqa-wo 20 0 6790M 1376M 28480 R 0.00 B/s 107.9 0.2 3:52.33 /usr/bin/qemu-system-ppc64
38599 _openqa-wo 20 0 8040M 4187M 28480 R 0.00 B/s 47.4 0.5 3:41.13 /usr/bin/qemu-system-ppc64
45395 _openqa-wo 20 0 3732M 757M 28416 R 0.00 B/s 71.1 0.1 3:38.90 /usr/bin/qemu-system-ppc64
38600 _openqa-wo 20 0 8040M 4187M 28480 S 0.00 B/s 0.0 0.5 3:18.94 /usr/bin/qemu-system-ppc64
43853 _openqa-wo 20 0 5641M 1696M 28480 R 0.00 B/s 63.2 0.2 3:12.66 /usr/bin/qemu-system-ppc64
38456 _openqa-wo 20 0 9087M 4195M 28480 R 0.00 B/s 78.9 0.5 3:08.68 /usr/bin/qemu-system-ppc64
38986 _openqa-wo 20 0 6790M 1376M 28480 R 0.00 B/s 86.8 0.2 3:06.34 /usr/bin/qemu-system-ppc64
</code></pre>
<p>so ffmpeg shows significantly higher accumulated CPU time usage compared to the corresponding qemu processes. We should investigate if ffmpeg is having a "too high" impact on machine performance, if it should be running with a nice level to prevent typing issues, if ffmpeg parameters can be tweaked or if ffmpeg should be avoided altogether on ppc64le.</p>
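<p>To compare accumulated CPU time per command from such a snapshot, the TIME+ column can be converted to seconds and summed. The following sketch assumes only the two TIME+ formats visible in the output above (<code>MM:SS.hh</code> and <code>NhMM:SS</code>); the function names are hypothetical.</p>

```python
import re

# Matches "32:22.39" (minutes:seconds) and "2h29:59" (hours, minutes:seconds).
_TIME_RE = re.compile(r"^(?:(\d+)h)?(\d+):(\d+(?:\.\d+)?)$")


def htop_time_to_seconds(value: str) -> float:
    """Convert one htop TIME+ value to seconds."""
    match = _TIME_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised TIME+ value: {value!r}")
    hours, minutes, seconds = match.groups()
    return int(hours or 0) * 3600 + int(minutes) * 60 + float(seconds)


def total_cpu_seconds(rows):
    """Sum CPU seconds per command for (command, time_plus) pairs."""
    totals = {}
    for command, time_plus in rows:
        totals[command] = totals.get(command, 0.0) + htop_time_to_seconds(time_plus)
    return totals
```

<p>Feeding pairs such as <code>("ffmpeg", "32:22.39")</code> and <code>("qemu-system-ppc64", "12:58.63")</code> taken from the snapshot gives per-command totals that can be compared directly.</p>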
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openQA test video compression does not significantly impact system performance to the point of causing typing issues</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Check if ffmpeg CPU usage as visible in the above htop output is considered expected or something unusual</li>
<li>Consider introducing a nice-level for calling ffmpeg in os-autoinst</li>
<li>Crosscheck if ffmpeg can be tweaked, in particular for ppc64le qemu workers</li>
<li>Decide if ffmpeg, or even video recording altogether, should be forbidden on ppc64le, see <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a> </li>
</ul>
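<p>Introducing a nice level amounts to prefixing the ffmpeg command line with <code>nice -n</code>. The following Python sketch only illustrates that idea; it is not the os-autoinst code, the helper name is hypothetical, and the ffmpeg arguments beyond those visible in the htop output are deliberately left out.</p>

```python
def with_nice(command, niceness=19):
    """Return the command list prefixed so that it runs at the given niceness."""
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return ["nice", "-n", str(niceness), *command]


# Example: lowest scheduling priority for the video encoder.
encoder_command = with_nice(["ffmpeg", "-y", "-hide_banner"], niceness=19)
```

<p>The resulting list can be passed unchanged to whatever spawns the encoder process, so the qemu processes keep their default priority.</p>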
QA - action #157741 (Workable): Approve/reject SLE maintenance release requests on IBS synchronou...https://progress.opensuse.org/issues/1577412024-03-22T10:23:10Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>One of the most important responsibilities within SLE maintenance testing is to approve/reject SLE maintenance release requests based on openQA test results. So far <a href="https://github.com/openSUSE/qem-bot" class="external">qem-bot</a> is sufficient to schedule openQA tests but merely does a mediocre job of reporting back results as test results are asynchronously polled based on a periodic schedule <a href="https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules" class="external">https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules</a> causing unnecessary delays, inefficient polling, using outdated results <a class="issue tracker-4 status-4 priority-4 priority-default child" title="action: Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver (Feedback)" href="https://progress.opensuse.org/issues/122311">#122311</a> and not even reporting back on blocking test failures <a class="issue tracker-6 status-1 priority-4 priority-default child parent" title="coordination: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again) (New)" href="https://progress.opensuse.org/issues/97121">#97121</a>. Let's use a proper architecture with efficient event based triggers providing relevant information back to release requests on IBS using core openQA features rather than too much custom lacking downstream tooling: After the PoC in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchr... (Resolved)" href="https://progress.opensuse.org/issues/154498#note-14">#154498-14</a> we should fully implement that to approve/reject the according release request synchronously after AMQP event listening.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> something synchronously approves based on AMQP events</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow-on with the PoC of <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [spike][timeboxed:20h][integration] Approve/reject SLE maintenance release requests on IBS synchr... (Resolved)" href="https://progress.opensuse.org/issues/154498#note-14">#154498-14</a></li>
<li>Set up qem-bot or an alternative on an existing or new server, but ensure the logs are accessible</li>
<li>Add it as part of qem-dashboard which already has AMQP support</li>
<li>Ensure that qem-bot runs near-continuously to be able to listen to all AMQP events, maybe as back-to-back gitlab CI jobs with limits to prevent parallel execution, which we already have?</li>
</ul>
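<p>The event-driven approach can be sketched independently of any AMQP library: each "job done" event updates the state of the affected release request, and a decision is made as soon as the last pending job has reported. The class and method names below are hypothetical, and treating <code>softfailed</code> as acceptable is an assumption.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ReleaseRequest:
    """Tracks the openQA jobs a release request is still waiting for."""
    request_id: int
    pending_jobs: set = field(default_factory=set)
    failed_jobs: set = field(default_factory=set)

    def on_job_done(self, job_id, result):
        """Handle one job-done event; return "approve", "reject" or None."""
        if job_id not in self.pending_jobs:
            return None  # event belongs to an unrelated or already handled job
        self.pending_jobs.discard(job_id)
        if result not in ("passed", "softfailed"):
            self.failed_jobs.add(job_id)
        if self.pending_jobs:
            return None  # still waiting for other jobs
        return "reject" if self.failed_jobs else "approve"
```

<p>An AMQP consumer would call <code>on_job_done</code> from its message callback and forward a non-empty decision to IBS, which avoids periodic polling entirely.</p>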
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Also related to <a class="issue tracker-4 status-4 priority-4 priority-default child" title="action: Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver (Feedback)" href="https://progress.opensuse.org/issues/122311">#122311</a>, <a class="issue tracker-6 status-1 priority-3 priority-lowest" title="coordination: [saga][epic] Re-combined Maintenance QA tooling covering both SLE+openSUSE (New)" href="https://progress.opensuse.org/issues/123088">#123088</a>, <a class="issue tracker-6 status-1 priority-4 priority-default child parent" title="coordination: [epic] enable qem-bot comments on IBS (was: enable qa-maintenance/openQABot comments on smelt again) (New)" href="https://progress.opensuse.org/issues/97121">#97121</a>, <a class="issue tracker-6 status-1 priority-4 priority-default overdue parent behind-schedule" title="coordination: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, ... (New)" href="https://progress.opensuse.org/issues/99303">#99303</a>, <a class="issue tracker-4 status-3 priority-3 priority-lowest closed child" title="action: Find "last build" of a product over API size:M (Resolved)" href="https://progress.opensuse.org/issues/152939">#152939</a>, <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: [timeboxed:6h][spike solution] a single command line or openQA webUI search view to show all test... (Resolved)" href="https://progress.opensuse.org/issues/131279">#131279</a>, <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does ... (Resolved)" href="https://progress.opensuse.org/issues/117655">#117655</a></p>
openQA Project - action #157273 (Workable): Run os-autoinst-distri-openQA directly from git witho...https://progress.opensuse.org/issues/1572732024-03-14T16:38:04Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>With <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: [spike][timeboxed:10h] Run os-autoinst-distri-example directly from git and ensure candidate need... (Resolved)" href="https://progress.opensuse.org/issues/154783">#154783</a> we have proper git caching, so we can now run git based tests efficiently on our workers. As the next step we should migrate one "production" test distribution to use only git and not hold anything provided by admins on o3 in o3:/var/lib/openqa/share/tests for this test distribution.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> /var/lib/openqa/share/tests/open{qa,QA} do not exist</li>
<li><strong>AC2:</strong> openqa-in-openqa tests still pass consistently</li>
<li><strong>AC3:</strong> openqa-in-openqa test details, needle candidates and source code views still show content as expected</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Change test definitions in <a href="https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/master/scenario-definitions.yaml" class="external">https://github.com/os-autoinst/os-autoinst-distri-openQA/blob/master/scenario-definitions.yaml</a> in your branch to use <a href="https://github.com/os-autoinst/os-autoinst-distri-openQA" class="external">https://github.com/os-autoinst/os-autoinst-distri-openQA</a> for test code (and needles)</li>
<li>Check that tests can be triggered this way on a test instance</li>
<li>Do not put anything in /var/lib/openqa/share/tests and ensure tests still work as well as source code view and needle candidates in test details pages</li>
<li>To provide needle candidates there are multiple possibilities when and where the needle candidate data can be provided, try out one or multiple of the following:
<ol>
<li><em>Given</em> a test distribution/needledir does not yet exist in a local cache (like asset downloads work or GIT_CACHE_DIR in os-autoinst and/or worker implementation), <em>When</em> tests are triggered on the side of web UI, <em>Then</em> the relevant data is git cloned, e.g. in the same steps as or similar to *_URL asset download</li>
<li><em>Given</em> a test distribution/needledir does not yet exist in a local cache, <em>When</em> the worker uploads the general test structure, e.g. which modules will be executed, <em>Then</em> the relevant data is git cloned</li>
<li><em>Given</em> a test distribution/needledir does not yet exist in a local cache, <em>When</em> the worker uploads individual needle check results, <em>Then</em> it also uploads as part of the JSON result files and image uploads all the necessary information to display needle candidates <em>And</em> the webUI in the receiving upload handler handles that somewhat … but does not overload when 1k workers upload in parallel or something :)</li>
<li><em>Given</em> a test distribution/needledir does not yet exist in a local cache, <em>When</em> the worker uploads final results (or "finalizes" the job), <em>Then</em> the webUI triggers a download of test files and/or needle files to a local git cache dir as necessary</li>
<li><em>Given</em> a test distribution/needledir does not yet exist in a local cache, <em>When</em> the first person reviews test results and selects needle candidates, <em>Then</em> the webUI triggers a download of test files and/or needle files to a local git cache dir as necessary</li>
</ol></li>
<li>If you identify any bigger feature implementation in openQA or os-autoinst itself being necessary then ensure those requirements are covered in other tickets and block on those tickets accordingly</li>
</ul>
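<p>All of the Given/When/Then variants above share the same core step: clone the test distribution into a local cache only if it is not there yet. A minimal sketch of that step follows, with the actual clone passed in as a callable so the trigger point (scheduling, upload, finalization or first review) stays open. The names and the hash-based directory layout are assumptions, not what openQA or os-autoinst implement.</p>

```python
import hashlib
from pathlib import Path


def cache_dir_for(repo_url: str, cache_root: Path) -> Path:
    """Map a repository URL to a stable directory below the cache root."""
    digest = hashlib.sha256(repo_url.encode("utf-8")).hexdigest()[:16]
    return cache_root / digest


def ensure_cached(repo_url: str, cache_root: Path, clone) -> Path:
    """Clone the repository via clone(url, target) unless it is already cached."""
    target = cache_dir_for(repo_url, cache_root)
    if not target.exists():
        clone(repo_url, target)
    return target
```

<p>In practice <code>clone</code> could wrap a <code>git clone</code> call; keeping it injectable makes the "already cached" path easy to test without network access.</p>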
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Any bigger feature implementation in openQA or os-autoinst itself.</li>
</ul>
QA - action #157204 (Workable): Sync openQA job removal events to qem-dashboard listening to AMQP...https://progress.opensuse.org/issues/1572042024-03-14T05:33:51Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a href="https://suse.slack.com/archives/C02CLB8TZP1/p1709892527534149?thread_ts=1709883106.021479&cid=C02CLB8TZP1" class="external">https://suse.slack.com/archives/C02CLB8TZP1/p1709892527534149?thread_ts=1709883106.021479&cid=C02CLB8TZP1</a><br>
When openQA jobs are deleted then the according reference in qem-dashboard should also be removed. Listen to AMQP events to sync the removal accordingly</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> SLE maintenance openQA jobs previously blocking SLE maintenance updates on <a href="http://dashboard.qam.suse.de/blocked" class="external">http://dashboard.qam.suse.de/blocked</a> do not block approval after such openQA jobs are deleted from the openQA database</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Use TDD: Extend <a href="https://github.com/openSUSE/qem-dashboard/blob/main/t/amqp.t" class="external">https://github.com/openSUSE/qem-dashboard/blob/main/t/amqp.t</a> and ensure there is a failing test first</li>
<li>Extend <a href="https://github.com/openSUSE/qem-dashboard/blob/08cea810f936faeb6af35b645270d85f6569c6b9/lib/Dashboard/Model/AMQP.pm#L33" class="external">https://github.com/openSUSE/qem-dashboard/blob/08cea810f936faeb6af35b645270d85f6569c6b9/lib/Dashboard/Model/AMQP.pm#L33</a> to update the database entry accordingly or delete, whatever is applicable</li>
<li>For all current openQA job result entries in the dashboard database crosscheck if there are entries for jobs that do not exist anymore in the openQA database. Remove accordingly.</li>
<li>Verify operation in production: E.g. create an artificial, failed openQA job in OSD for a non-critical SLE maintenance update, wait till it shows up as blocking on <a href="http://dashboard.qam.suse.de/blocked" class="external">http://dashboard.qam.suse.de/blocked</a> or in log files of the qem-bot "approve" cycle, remove the job over <code>openqa-cli -X delete jobs/$id</code> again and verify that <a href="http://dashboard.qam.suse.de/blocked" class="external">http://dashboard.qam.suse.de/blocked</a> does not show up as blocked on that job anymore</li>
</ul>
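<p>The one-time crosscheck of existing dashboard entries against the openQA database is essentially a set difference. The sketch below works on plain Python collections; how the job IDs are fetched from either database is left out, and the function names are hypothetical.</p>

```python
def stale_job_ids(dashboard_job_ids, existing_openqa_job_ids):
    """Return dashboard job IDs that no longer exist in openQA, sorted."""
    return sorted(set(dashboard_job_ids) - set(existing_openqa_job_ids))


def remove_stale(results, existing_openqa_job_ids):
    """Delete stale entries from a dict keyed by job ID; return removed IDs."""
    stale = stale_job_ids(results, existing_openqa_job_ids)
    for job_id in stale:
        del results[job_id]
    return stale
```

<p>The same removal function could be reused by the AMQP handler for a single deleted job by passing the set of remaining job IDs.</p>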
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Regular cleanup of results when we missed or have otherwise not received according AMQP events</li>
</ul>
openQA Project - action #155191 (New): Unify GitHub Actions for QA Projects - perlcritic in os-au...https://progress.opensuse.org/issues/1551912024-02-08T10:09:13Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>See <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Unify GitHub Actions for QA Projects size:M (Resolved)" href="https://progress.opensuse.org/issues/138416">#138416</a> and <a class="issue tracker-4 status-1 priority-3 priority-lowest child" title="action: Unify GitHub Actions for QA Projects - perltidy&perlcritic in openQA (New)" href="https://progress.opensuse.org/issues/155188">#155188</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> os-autoinst inherits perlcritic checks from os-autoinst-common</li>
<li><strong>AC2:</strong> Checks in os-autoinst still run successfully on the master branch</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a class="issue tracker-4 status-3 priority-3 priority-lowest closed child" title="action: Unify GitHub Actions for QA Projects - perltidy in os-autoinst size:M (Resolved)" href="https://progress.opensuse.org/issues/155062">#155062</a> but for perlcritic</li>
</ul>
openQA Project - action #155188 (New): Unify GitHub Actions for QA Projects - perltidy&perlcritic...https://progress.opensuse.org/issues/1551882024-02-08T10:07:57Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>See <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Unify GitHub Actions for QA Projects size:M (Resolved)" href="https://progress.opensuse.org/issues/138416">#138416</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openQA inherits perltidy&perlcritic checks from os-autoinst-common</li>
<li><strong>AC2:</strong> Checks in openQA still run successfully on the master branch</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a class="issue tracker-4 status-3 priority-3 priority-lowest closed child" title="action: Unify GitHub Actions for QA Projects - perltidy in os-autoinst size:M (Resolved)" href="https://progress.opensuse.org/issues/155062">#155062</a> but for openQA</li>
</ul>
openQA Infrastructure - action #152101 (Workable): Allow salt to properly configure non-productio...https://progress.opensuse.org/issues/1521012023-12-05T13:41:36Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>See lessons learned meeting <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Conduct "lessons learned" with Five Why analysis for "test fails in iscsi_client due to salt 'hos... (Resolved)" href="https://progress.opensuse.org/issues/139136">#139136</a>. Would diesel now work with the MTU related changes? We should ensure that diesel is treated as a tap worker even though it is not used as a production tap worker.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> non-production workers within the OSD salt config are properly configured as multi-machine openQA workers with suffixes e.g. <code>WORKER_CLASS=tap_poo1234</code> (no GRE tunnels required)</li>
<li><strong>AC2:</strong> non-production openQA workers with a special worker class can execute multi-machine jobs correctly if triggered against the special worker class</li>
<li><strong>AC3:</strong> non-production openQA workers with a special worker class do not pick up production multi-machine jobs</li>
<li><strong>AC4:</strong> The team knows how to configure non-production multi-machine workers for development/setup/debugging</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read how <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openvswitch.sls?ref_type=heads#L24" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openvswitch.sls?ref_type=heads#L24</a> matches on "tap" in WORKER_CLASS. We used to disable production tap worker classes by adding a suffix that includes a ticket reference, e.g. "tap_poo1234"</li>
<li><del>Extend that to be able to match on variants of "tap"</del> It will still work with e.g. "tap_poo1234". The last part of <code>multihostclass in pillar['workerconf'][host]['workers'][wnum]['WORKER_CLASS']</code> refers to a string (and not a list) so Python does in fact just check whether the string contains <code>tap</code> at all.</li>
<li>Take a look at OSD workers that currently have "tap_$something", see <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a>, e.g. qesapworker-prg4 or diesel, and verify that those workers can still execute multi-machine clusters if scheduled against those specific classes</li>
<li>Document that this is how one can ensure a worker is configured for multi-machine tests but not for production jobs</li>
</ul>
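<p>The difference between matching against a string and against a list, which is why "tap_poo1234" still matches, can be shown in a few lines of Python. The worker class values are illustrative only.</p>

```python
# WORKER_CLASS as stored in the pillar: one comma-separated string.
worker_class_string = "qemu_x86_64,tap_poo1234"
# The same classes split into a list, for comparison.
worker_class_list = ["qemu_x86_64", "tap_poo1234"]


def matches_tap(worker_class):
    """Mimic the `'tap' in WORKER_CLASS` check used in the salt state."""
    return "tap" in worker_class


print(matches_tap(worker_class_string))  # True: substring match on the string
print(matches_tap(worker_class_list))    # False: a list needs an exact element
```

<p>Because the pillar value is a string, any class containing <code>tap</code> enables the multi-machine setup, while the scheduler still only assigns jobs that ask for the exact suffixed class.</p>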
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>We don't strictly care about GRE tunnels here</li>
</ul>
QA - action #139115 (Workable): Ensure o3 openQA PowerPC machine qa-power8-3 is operational from ...https://progress.opensuse.org/issues/1391152023-11-04T12:51:06Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Most PowerPC machines are being set up in PRG2 within <a class="issue tracker-4 status-15 priority-3 priority-lowest child" title="action: Support move of PowerPC machines to PRG2 size:M (Blocked)" href="https://progress.opensuse.org/issues/132140">#132140</a> and most machines could be discovered from the HMC. qa-power8-3 is meant for o3 and likely needs more collaboration with SUSE-IT Eng-Infra to bring the machine back into operation for o3. As the machine is a bare-metal installation, we would rely on ASM+IPMI (HMC <strong>not</strong> needed) and system ethernet in the o3 network.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> qa-power8-3 openQA instances are able to pass o3 openQA jobs after the move to PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read <a class="issue tracker-4 status-15 priority-3 priority-lowest child" title="action: Support move of PowerPC machines to PRG2 size:M (Blocked)" href="https://progress.opensuse.org/issues/132140">#132140</a> about the generic setup and in particular the HMC and understand that we work with the machine bare-metal in so called "OPAL" mode here, similar to kerosene</li>
<li>See current configuration and inventory management entry <a href="https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=2352" class="external">https://racktables.nue.suse.com/index.php?page=object&tab=default&object_id=2352</a> for the machine</li>
<li>Check if one of the specified interfaces shows up in o3 dhcp logs (dnsmasq)</li>
<li>Crosscheck mac address entries on racktables against the entries in dnsmasq DHCP static lease configuration</li>
<li>Ensure we have access to qa-power8-3 manually again over ASM and IPMI as well as with verification openQA jobs on o3</li>
<li>Inform users about the result</li>
<li>Update the racktables entry accordingly as well as <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a></li>
</ul>
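<p>The crosscheck of MAC addresses between racktables and the dnsmasq static leases can be scripted once both lists are at hand. The sketch below assumes the leases are given as <code>dhcp-host=</code> lines and that the racktables MAC addresses have been exported to a plain list; the function names and example addresses are made up for illustration.</p>

```python
import re

_MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC address and use ':' as separator."""
    return mac.strip().lower().replace("-", ":")


def macs_in_dnsmasq(config_text: str) -> set:
    """Collect all MAC addresses found in dhcp-host= lines."""
    found = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line.startswith("dhcp-host="):
            continue
        for part in line[len("dhcp-host="):].split(","):
            candidate = normalize_mac(part)
            if _MAC_RE.match(candidate):
                found.add(candidate)
    return found


def missing_leases(inventory_macs, config_text: str) -> list:
    """Return inventory MAC addresses without a static lease, sorted."""
    wanted = {normalize_mac(mac) for mac in inventory_macs}
    return sorted(wanted - macs_in_dnsmasq(config_text))
```

<p>An empty result from <code>missing_leases</code> means every interface listed in the inventory has a static lease entry.</p>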
openQA Project - action #130940 (New): Trigger openQA tests mentioned in github comments as part ...https://progress.opensuse.org/issues/1309402023-06-15T10:17:05Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Follow-up to <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Trigger openQA tests mentioned in github description as part of CI size:M (Resolved)" href="https://progress.opensuse.org/issues/130934">#130934</a>. Once the openQA CI integration reads the github pull request description and triggers the corresponding openQA clone jobs, github comments that are created or updated should be treated the same way.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openQA CI integration in any existing test distribution github project automatically runs openQA tests based on links to existing openQA jobs in the github comments</li>
<li><strong>AC2:</strong> The openQA documentation explains that GitHub comments are parsed as well and how to use that</li>
<li><strong>AC3:</strong> An update to the PR code does not retrigger any openQA jobs mentioned in comments</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Wait for <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Trigger openQA tests mentioned in github description as part of CI size:M (Resolved)" href="https://progress.opensuse.org/issues/130934">#130934</a></li>
<li>Read what was done originally in <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Have git-clone-custom-refspec pickup tests from PR descriptions (Resolved)" href="https://progress.opensuse.org/issues/63712">#63712</a> and the according pull request to openQA <a href="https://github.com/os-autoinst/openQA/pull/2618" class="external">https://github.com/os-autoinst/openQA/pull/2618</a> and also <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Trigger openQA tests mentioned in github description as part of CI size:M (Resolved)" href="https://progress.opensuse.org/issues/130934">#130934</a></li>
<li>Check if openqa-clone-custom-git-refspec with special comments also covers github comments or only the initial description</li>
<li>Extend the existing approach where necessary</li>
<li>Ensure that the same process is triggered also on comment <em>updates</em></li>
</ul>
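<p>A minimal sketch of how comments could be scanned for job links while avoiding repeated triggers. It assumes that a job reference is any URL of the form <code>https://&lt;host&gt;/tests/&lt;id&gt;</code> and that comments arrive as <code>(comment_id, body)</code> pairs; both are assumptions for illustration and not the format used by openqa-clone-custom-git-refspec.</p>

```python
import re

# Assumption: a job reference is any URL of the form https://<host>/tests/<id>.
JOB_URL_RE = re.compile(r"https?://[\w.-]+/tests/(\d+)")


def job_urls(comment_body: str) -> list[str]:
    """Return the unique job URLs found in one comment, in order of appearance."""
    seen = []
    for match in JOB_URL_RE.finditer(comment_body):
        url = match.group(0)
        if url not in seen:
            seen.append(url)
    return seen


def jobs_to_trigger(comments, already_triggered):
    """Return (comment_id, url) pairs that have not been triggered before.

    comments: iterable of (comment_id, body) tuples (hypothetical shape).
    already_triggered: set of (comment_id, url) pairs, updated in place.
    """
    new = []
    for comment_id, body in comments:
        for url in job_urls(body):
            key = (comment_id, url)
            if key not in already_triggered:
                already_triggered.add(key)
                new.append(key)
    return new
```

<p>Keeping the set of already triggered pairs between runs means that a run caused by a code update finds nothing new, while an edited comment that mentions a different job still yields a new pair.</p>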
QA - action #119176 (Workable): Automated alerts and reminders about SLO's for openqatests size:Mhttps://progress.opensuse.org/issues/1191762022-10-21T10:12:45Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Some tickets don't see regular updates, even with automated reminders in some cases. See <a href="https://progress.opensuse.org/projects/openqatests/wiki#SLOs-service-level-objectives" class="external">https://progress.opensuse.org/projects/openqatests/wiki#SLOs-service-level-objectives</a> for the documented workflows/goals. We assume processes are followed better if there is automation that reminds people about missed targets and makes it easier to understand what updates are missing.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> <strong>DONE</strong> SLOs are reflected in automated reminders</li>
<li><strong>AC2:</strong> <strong>DONE</strong> The first reminder is implemented based on the documented workflow</li>
<li><strong>AC3:</strong> The second reminder is known to update the priority automatically</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE:</em> Wait for <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a></li>
<li><em>DONE:</em> As in the queries for QA tools, we should likely only look at the update time of tickets with no subtasks, e.g. see the definition of <a href="https://progress.opensuse.org/issues?query_id=542" class="external">https://progress.opensuse.org/issues?query_id=542</a>, to prevent cases like <a href="https://progress.opensuse.org/issues/113749#note-13" class="external">https://progress.opensuse.org/issues/113749#note-13</a> ff., which is Urgent but hasn't been updated for a long time.</li>
<li><em>DONE:</em> Only write the same comment once -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Automated alerts and reminders about SLO's for openqatests (only one reminder) size:M (Resolved)" href="https://progress.opensuse.org/issues/116545">#116545</a></li>
<li>Research what has been done in <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Automated alerts and reminders about SLO's for openqatests size:M (Resolved)" href="https://progress.opensuse.org/issues/113797">#113797</a></li>
<li>Review and understand <a href="https://github.com/openSUSE/openqa-tests-backlog" class="external">https://github.com/openSUSE/openqa-tests-backlog</a> as well as <a href="https://github.com/openSUSE/backlogger" class="external">https://github.com/openSUSE/backlogger</a> and <a href="https://opensuse.github.io/openqa-tests-backlog/" class="external">https://opensuse.github.io/openqa-tests-backlog/</a></li>
<li>As the queries in <a href="https://github.com/openSUSE/openqa-tests-backlog/blob/main/queries.yaml#L5" class="external">https://github.com/openSUSE/openqa-tests-backlog/blob/main/queries.yaml#L5</a> are not named queries but defined in place, we could define a "grace period" for each query and only act automatically if users have not already done so, e.g. remind on urgent tickets not after 7 days but only after 7+2 days</li>
<li>Follow the SLO about the suggestion of the "second reminder"</li>
</ul>
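<p>The "grace period" idea can be illustrated with a small sketch. The 7 and 2 day values come from the example above; the function name and its parameters are hypothetical and not part of backlogger.</p>

```python
from datetime import datetime, timedelta, timezone


def reminder_due(last_update, now, slo_days, grace_days=2):
    """True once a ticket has been idle for longer than SLO plus grace period."""
    return now - last_update > timedelta(days=slo_days + grace_days)


now = datetime(2022, 10, 21, tzinfo=timezone.utc)
# Urgent ticket with a 7 day SLO: idle for 8 days is still inside the
# grace period, idle for 10 days is past it.
print(reminder_due(now - timedelta(days=8), now, slo_days=7))
print(reminder_due(now - timedelta(days=10), now, slo_days=7))
```

<p>With the strict comparison, a ticket that has been idle for exactly 7+2 days is not yet reminded, which leaves users the full grace period to act on their own.</p>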
QA - action #112871 (Workable): obs_rsync_run Minion tasks fail with no error message size:Mhttps://progress.opensuse.org/issues/1128712022-06-22T11:01:40Zlivdywanliv.dywan@suse.com
<a name="Observation"></a>
<h3 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h3>
<p>Periodically <code>obs_rsync_run</code> Minion tasks fail like so:</p>
<pre><code>---
args:
- project: SUSE:SLE-15-SP3:Update:BCI
attempts: 1
children: []
created: 2022-06-20T13:16:56.947614Z
delayed: 2022-06-20T13:35:47.741627Z
expires: ~
finished: 2022-06-20T13:35:48.639216Z
id: 4720491
lax: 0
notes:
  gru_id: 31834209
  project_lock: 1
parents: []
priority: 100
queue: default
result:
  code: 256
  message: No message
retried: 2022-06-20T13:34:47.741627Z
retries: 18
started: 2022-06-20T13:35:47.983500Z
state: failed
task: obs_rsync_run
time: 2022-06-22T10:24:43.930467Z
worker: 767
</code></pre>
<p>There is no error message and I can't guess what might have caused this. There seems to be a code path that consumes errors without propagating them.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The jobs do not fail with unknown errors anymore</li>
</ul>
<a name="Suggestions"></a>
<h3 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h3>
<ul>
<li>Research the plugin and try to understand what the code is meant to achieve</li>
<li>Maybe just make the job not fail, if there is no real error and this is an expected condition (Minion jobs are only supposed to fail for real errors that need human intervention)</li>
<li><a href="https://github.com/os-autoinst/openqa-trigger-from-obs" class="external">https://github.com/os-autoinst/openqa-trigger-from-obs</a></li>
<li>Hypothesis: the project in question was not configured, so there are no files
<ul>
<li><a href="https://download.suse.de/ibs/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/V/images/iso/" class="external">https://download.suse.de/ibs/SUSE:/ALP:/Source:/Standard:/1.0:/Staging:/V/images/iso/</a> is empty</li>
<li>Add a list of files to the "notes" of the Minion job</li>
<li>Add an IBS/OBS/GitLab URL to the "notes"</li>
<li>Add stderr to "notes"</li>
</ul></li>
</ul>
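<p>The suggestion to keep stderr with the job can be sketched generically. The plugin itself is not written in Python, so the following only illustrates the idea of never dropping an error on the floor; <code>run_and_describe</code> and the keys of the returned dictionary are hypothetical names.</p>

```python
import subprocess
import sys


def run_and_describe(command):
    """Run a command and return a dict describing the outcome."""
    proc = subprocess.run(command, capture_output=True, text=True)
    result = {"code": proc.returncode, "stderr": proc.stderr.strip()}
    if proc.returncode == 0:
        result["state"] = "finished"
        result["message"] = "ok"
    else:
        result["state"] = "failed"
        # Fall back to the exit code so the message is never empty.
        result["message"] = (
            result["stderr"]
            or f"exit code {proc.returncode}, no output on stderr"
        )
    return result


# Example with a child process that fails and explains why on stderr.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('no files found'); sys.exit(1)"]
print(run_and_describe(failing)["message"])
```

<p>Even when the child process prints nothing, the message still names the exit code, which already gives more to go on than "No message".</p>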
openQA Infrastructure - action #76951 (Workable): Check if new firmware for kerosene (aka. power8...https://progress.opensuse.org/issues/769512020-11-04T06:06:17Zokurzokurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>New firmware might help to prevent qemu failing to run. If we find new firmware we could remove the parameters in os-autoinst again, see clone source ticket</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read about the context of the needed workaround <a class="issue tracker-4 status-3 priority-6 priority-high2 closed" title="action: 100% of powerpc tests incomplete auto_review:&quot;(?s)Running on power8.*qemu-system-ppc64: Requested... (Resolved)" href="https://progress.opensuse.org/issues/75259">#75259</a></li>
<li>Currently <a href="https://kerosene-sp.qe.nue2.suse.org" class="external">https://kerosene-sp.qe.nue2.suse.org</a> lists FW840.00. Compare with other machines like diesel+petrol to see if there is a newer ASM version.</li>
<li>Look for new firmware for the machine, e.g. by searching the IBM web pages</li>
<li>Check if the new firmware means we no longer need <a href="https://github.com/os-autoinst/os-autoinst/pull/1554" class="external">https://github.com/os-autoinst/os-autoinst/pull/1554</a>. If yes, remove the workaround again. If no, also remove it from os-autoinst but add the corresponding settings to the machine settings in openQA, which is also what "adamw" did:</li>
</ul>
<pre><code>[04/11/2020 17:41:52] &lt;adamw&gt; okurz: i don't really know what the consequences of it are, but i tend to the idea that qemu wouldn't be trying to make it the default without reason :) i can ask some virt guys if you like
[04/11/2020 17:42:09] &lt;adamw&gt; okurz: but on the whole, yes, it seems to be it'd be more appropriate to put it in your templates rather than hardwire it into os-autoinst.
[04/11/2020 17:42:29] &lt;adamw&gt; that's what i was doing when we had the problem (i was setting an older machine type in our ppc64le Machine vars)
</code></pre>