openSUSE Project Management Tool: Issues https://progress.opensuse.org/ 2024-03-28T09:11:29Z
openQA Tests - action #158215 (New): windows images cleaned up - but referenced in jobs https://progress.opensuse.org/issues/158215 2024-03-28T09:11:29Z dimstar dimstar@opensuse.org
<p>Recently, some old Windows 10 images were cleaned up from o3.<br>
A first batch was fixed in the job groups to reference the new images, but snapshot 0327 still has a few tests failing on missing assets:</p>
<p>Flavor: DVD <br>
gnome_dual_windows10@64bit_win <a href="https://openqa.opensuse.org/tests/4047942" class="external">https://openqa.opensuse.org/tests/4047942</a><br>
gnome_dual_windows10@uefi_win <a href="https://openqa.opensuse.org/tests/4047941" class="external">https://openqa.opensuse.org/tests/4047941</a><br>
kde_dual_windows10@64bit_win <a href="https://openqa.opensuse.org/tests/4047943" class="external">https://openqa.opensuse.org/tests/4047943</a><br>
kde_dual_windows10@uefi_win <a href="https://openqa.opensuse.org/tests/4047940" class="external">https://openqa.opensuse.org/tests/4047940</a></p>
<p>Flavor: NET<br>
kde_dual_windows10@uefi_win <a href="https://openqa.opensuse.org/tests/4047944" class="external">https://openqa.opensuse.org/tests/4047944</a></p>
<p>They are all missing the relevant HDD_1 asset, e.g.<br>
Reason: asset failure: Failed to download windows-10-x86_64-21H1@64bit_win.qcow2 to /var/lib/openqa/cache/openqa.opensuse.org/windows-10-x86_64-21H1@64bit_win.qcow2</p>
<p>The setting comes directly from the test suite:</p>
<p>gnome_dual_windows10 CDMODEL=ide-cd<br>
DESKTOP=gnome<br>
DUALBOOT=1<br>
EXCLUDE_MODULES=system_prepare<br>
HDDVERSION=Windows 10<br>
HDD_1=windows-10-x86_64-21H1@%MACHINE%.qcow2<br>
Maintainer: <a href="mailto:grace.wang@suse.com">grace.wang@suse.com</a> </p>
<p>kde_dual_windows10 CDMODEL=ide-cd<br>
DESKTOP=kde<br>
DUALBOOT=1<br>
EXCLUDE_MODULES=system_prepare<br>
HDDVERSION=Windows 10<br>
HDD_1=windows-10-x86_64-1903@%MACHINE%.qcow2<br>
Maintainer: <a href="mailto:grace.wang@suse.com">grace.wang@suse.com</a></p>
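<p>The missing asset name follows from substituting the machine name into the HDD_1 template. A minimal sketch of that substitution, assuming plain %NAME% placeholder replacement; the helper <code>expand_setting</code> is hypothetical and not openQA's own implementation:</p>

```python
import re


def expand_setting(template, settings):
    """Replace %NAME% placeholders with values from `settings` (hypothetical helper)."""

    def substitute(match):
        # Keep unknown placeholders untouched so a missing setting stays visible.
        return settings.get(match.group(1), match.group(0))

    return re.sub(r"%([A-Z0-9_]+)%", substitute, template)


# HDD_1 value as quoted in the gnome_dual_windows10 test suite above.
template = "windows-10-x86_64-21H1@%MACHINE%.qcow2"
for machine in ("64bit_win", "uefi_win"):
    print(expand_setting(template, {"MACHINE": machine}))
```

<p>Running the same expansion over every dual-boot test suite would list the asset names that need to exist on o3 before the next snapshot is tested.</p>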
qe-yam - action #158209 (New): [Research] Add service check test on migration path from 15SP3 to ... https://progress.opensuse.org/issues/158209 2024-03-28T07:56:02Z leli leli@suse.com
<a name="Motivation"></a>
<h4 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h4>
<p>This idea comes from a customer mail titled 'named.service won't start (permission denied) after upgrade 15SP3-->15SP5'; I will paste the content of the mail in the comments.<br>
To test named after migration we need to add a service check. To cover the migration path from 15SP3 to 15SP5, we need to add a continuous migration test with a service check.</p>
<p>Example: see the current service check for named in the regression test <a href="https://openqa.suse.de/tests/13887909#step/check_upgraded_service/16" class="external">online_sles15sp4_pscc_live-basesys-srv-desktop-dev-contm-lgm-tsm-wsm-pcm_all_full</a> for reference.</p>
<a name="Acceptance-criteria"></a>
<h4 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h4>
<p><strong>AC1</strong>: Add service check test on migration path from 15SP3 to 15SP5.</p>
qe-yam - action #158194 (New): If firewall is disabled, it should not match the tag 'nfs-firewall... https://progress.opensuse.org/issues/158194 2024-03-28T05:54:09Z tinawang123 yuwang@suse.com
<p><strong>Motivation</strong><br>
Failed job: <a href="https://openqa.suse.de/tests/13830874#step/install_service/123" class="external">https://openqa.suse.de/tests/13830874#step/install_service/123</a><br>
As the firewall is disabled, the 'open port in firewall' option cannot be selected.</p>
<p><strong>Acceptance criteria</strong><br>
AC1: Update the code to check whether the key 'alt-f' needs to be sent to open the port in the firewall.</p>
qe-yam - action #158191 (New): ppc64le_regression_test_offline_textmode.yaml should not include d... https://progress.opensuse.org/issues/158191 2024-03-28T03:18:09Z tinawang123 yuwang@suse.com
<p><strong>Motivation</strong><br>
Failed job: <a href="https://openqa.suse.de/tests/13892873" class="external">https://openqa.suse.de/tests/13892873</a><br>
This job is textmode, but includes desktop x11 test modules.</p>
<p><strong>Acceptance criteria</strong><br>
AC1: Update the YAML profile; a textmode job should not test desktop x11.</p>
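<p>A check like the following could flag desktop modules in a textmode schedule before a job is triggered. This is only a sketch: it assumes the schedule is already loaded as a list of module paths and that desktop modules share the <code>x11/</code> prefix; the module names in the sample are made up for illustration.</p>

```python
def desktop_modules(schedule, prefixes=("x11/",)):
    """Return the scheduled modules whose path starts with a desktop prefix."""
    # str.startswith accepts a tuple, so several prefixes can be checked at once.
    return [module for module in schedule if module.startswith(prefixes)]


# Illustrative schedule; these module names are placeholders, not taken from
# ppc64le_regression_test_offline_textmode.yaml.
textmode_schedule = [
    "boot/boot_to_desktop",
    "console/system_prepare",
    "x11/example_desktop_check",
]

offending = desktop_modules(textmode_schedule)
if offending:
    print("textmode schedule contains desktop modules:", ", ".join(offending))
```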
openQA Infrastructure - action #158185 (Feedback): parallel job failed to get the vars from its p... https://progress.opensuse.org/issues/158185 2024-03-28T00:56:46Z Julie_CAO jcao@suse.com
<a name="Observation"></a>
<h3 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h3>
<p>We have a parallel job which failed to get the vars from its pair. A rerun still failed. Is there something wrong with the worker service?</p>
<pre><code>sub get_var_from_parent {
    my ($self, $var) = @_;
    my $parents = get_parents();
    # Query every parent to find the var
    for my $job_id (@$parents) {
        my $ref = get_job_autoinst_vars($job_id);
        return $ref->{$var} if defined $ref->{$var};
    }
    return;
}
</code></pre>
<p><a href="https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt</a></p>
<pre><code>[2024-03-27T15:39:25.691962Z] [debug] [pid:4639] get_job_autoinst_vars: Connection error: Can't connect: Name or service not known; URL was http://worker35:20493/wS5wkxkWNNB9LK92/vars
</code></pre>
qe-yam - action #158158 (Workable): GTK glitch in yast2_lan_restart_vlan https://progress.opensuse.org/issues/158158 2024-03-27T12:10:09Z rainerkoenig
<a name="Observation"></a>
<h4 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h4>
<p>openQA test in scenario sle-15-SP6-Online-ppc64le-yast2_gui@ppc64le-4g fails in<br>
<a href="https://openqa.suse.de/tests/13886245/modules/yast2_lan_restart_vlan/steps/39" class="external">yast2_lan_restart_vlan</a></p>
<p>The problem is the well-known <a href="https://progress.opensuse.org/issues/124652" class="external">screen refresh glitch</a> showing up again.<br>
The workaround_poo124652 needs to be applied here.</p>
<a name="Additional-information"></a>
<h4 >Additional information<a href="#Additional-information" class="wiki-anchor">¶</a></h4>
<p>Screenshot from failed run<br>
<img src="https://progress.opensuse.org/attachments/download/17515/screenshot-glitch.png" alt="Screenshot from failure" loading="lazy" /></p>
<p>Screenshot from previous run<br>
<img src="https://progress.opensuse.org/attachments/download/17512/screenshot-ok.png" alt="Screenshot from previous run" loading="lazy" /><br>
<a href="https://openqa.suse.de/tests/13849100#step/yast2_lan_restart_vlan/38" class="external">Link to the passed test step</a></p>
<a name="Acceptance-criteria"></a>
<h4 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h4>
<ul>
<li><strong>AC1</strong>: <code>workaround_poo124652</code> from <code>lib/YaST/workarounds.pm</code> is applied for this situation.</li>
<li><strong>AC2</strong>: The problem no longer shows up.</li>
</ul>
openQA Project - action #158146 (New): Prevent scheduling across-host multimachine clusters to ho... https://progress.opensuse.org/issues/158146 2024-03-27T11:06:56Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Multi-machine jobs have been failing since 20230814 because of a misconfiguration of the MTU/GRE tunnels. A workaround was found: forcing the complete multi-machine tests to run on the same worker. In <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> we added a feature flag to limit jobs to a single physical host, which can be used for debugging, as a temporary workaround, or if the network design prevents multiple hosts from being interconnected by GRE tunnels. But by default, when multi-machine jobs are scheduled with worker classes fulfilled by multiple hosts which might not be properly interconnected, there is no measure preventing workers from picking up such clusters. This causes hard-to-investigate openQA job failures, which we should try to prevent. Can we propagate test variables like the "limit to one host only" feature flag in worker properties so that the openQA scheduler can see that flag before assigning jobs to workers?</p>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> the openQA scheduler does not schedule across-host multimachine clusters to any host that has the feature flag from <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> set</li>
<li><strong>AC2:</strong> By default jobs of a multi-machine parallel cluster can still be scheduled covering multiple different hosts</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look into what was done in <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> but for the central openQA scheduler</li>
<li>Investigate if any worker properties are already available to read by the openQA scheduler when scheduling. At least it knows about the worker class already, right? Should we translate the feature flag from <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> as a "special worker class" to act as an exclusive class that is only implemented by one host at a time?</li>
<li>Ensure that the scheduler does not schedule across-host multimachine clusters to any host that has such special worker class or worker property</li>
</ul>
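<p>The scheduling rule from AC1 and AC2 can be sketched as a predicate over the workers proposed for one cluster. This is an illustration under assumptions, not openQA scheduler code: the <code>Worker</code> type and its <code>one_host_only</code> property are hypothetical stand-ins for however the flag from #135035 would be exposed to the scheduler.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    host: str
    # Hypothetical worker property mirroring the "limit to one host only" flag.
    one_host_only: bool = False


def cluster_assignment_allowed(workers):
    """Return True if the proposed workers may run one multi-machine cluster."""
    hosts = {worker.host for worker in workers}
    if len(hosts) <= 1:
        # A cluster that stays on one host is always fine.
        return True
    # Across-host cluster: reject it if any involved host has the flag set (AC1).
    return not any(worker.one_host_only for worker in workers)


print(cluster_assignment_allowed([Worker("host-a"), Worker("host-b")]))
print(cluster_assignment_allowed([Worker("host-a", True), Worker("host-b")]))
```

<p>With no flag set anywhere the predicate accepts across-host clusters, which keeps the default behaviour asked for in AC2.</p>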
openQA Project - action #158143 (New): Make workers unassign/reject/incomplete jobs when across-h... https://progress.opensuse.org/issues/158143 2024-03-27T11:01:42Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Multi-machine jobs have been failing since 20230814 because of a misconfiguration of the MTU/GRE tunnels. A workaround was found: forcing the complete multi-machine tests to run on the same worker. In <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> we added a feature flag to limit jobs to a single physical host, which can be used for debugging, as a temporary workaround, or if the network design prevents multiple hosts from being interconnected by GRE tunnels. But by default, when multi-machine jobs are scheduled with worker classes fulfilled by multiple hosts which might not be properly interconnected, there is no measure preventing workers from picking up such clusters. This causes hard-to-investigate openQA job failures, which we should try to prevent. We should make workers unassign/reject/incomplete jobs when an across-host multimachine setup is requested but not available, and optionally inform about the possibility of using the "limit to one host only" feature flag.</p>
<a name="Acceptance-Criteria"></a>
<h2 >Acceptance Criteria<a href="#Acceptance-Criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openQA workers with "tap" class but not configured for across-host multimachine setup do not fail openQA jobs due to being spread over multiple hosts</li>
<li><strong>AC2:</strong> By default jobs of a multi-machine parallel cluster can still be scheduled covering multiple different hosts</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Look into what was done in <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a> but for the central openQA scheduler</li>
<li>Investigate if a worker knows about other workers that it would need to communicate with in a multi-machine cluster job, possibly during the "assignment" step</li>
<li>Implement a pre-run check, possibly during the "assignment" step, where the worker would check if pre-requisites for across-host multimachine testing are fulfilled <em>if</em> the test cluster would need that, and fail early</li>
<li>Ensure that such an early failure is fed back to the openQA scheduler, e.g. by unassigning the job, possibly with an explicit message visible to admins somewhere?</li>
<li>If it is not possible to unassign, then somehow "reject" jobs or, as a last resort, "incomplete" a job with an explicit "reason", which is still better than actually starting an openQA job and then causing failures</li>
<li>Optionally in the message/reason returned suggest to the admin/users to use the feature flag from <a class="issue tracker-4 status-3 priority-4 priority-default closed child behind-schedule" title="action: Optionally restrict multimachine jobs to a single worker (Resolved)" href="https://progress.opensuse.org/issues/135035">#135035</a></li>
</ul>
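<p>The pre-run check suggested above could take roughly this shape. It is a sketch under assumptions: the function name, its arguments and the way a worker would learn the hosts of the other cluster jobs are all hypothetical, and the returned reason text is only an example.</p>

```python
def precheck_multimachine(cluster_hosts, local_host, cross_host_ready):
    """Return (accepted, reason) for a multi-machine job offered to this worker.

    cluster_hosts: hosts of all jobs in the cluster (assumed to be known at
    assignment time, which the ticket still asks to investigate).
    cross_host_ready: whether this host is configured for across-host setups.
    """
    needs_cross_host = any(host != local_host for host in cluster_hosts)
    if needs_cross_host and not cross_host_ready:
        reason = (
            "across-host multimachine setup requested but not available on "
            f"{local_host}; consider the 'limit to one host only' feature flag "
            "from #135035"
        )
        return False, reason
    return True, None


accepted, reason = precheck_multimachine(["host-a", "host-b"], "host-a", False)
print(accepted, reason)
```

<p>Returning a reason alongside the decision lets the caller choose between unassigning, rejecting or marking the job incomplete while still passing an explicit message on.</p>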
openQA Tests - action #158128 (Feedback): [qe-core][leap15.5]test fails in libgit2, seems some is... https://progress.opensuse.org/issues/158128 2024-03-27T09:05:33Z rfan1 richard.fan@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario opensuse-15.5-DVD-Updates-x86_64-extra_tests_textmode@64bit fails in<br>
<a href="https://openqa.opensuse.org/tests/4043927/modules/libgit2/steps/4" class="external">libgit2</a></p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>Maintainer: <a href="mailto:slindomansilla@suse.de">slindomansilla@suse.de</a>.<br>
Mainly post-installation console extra tests.</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.opensuse.org/tests/4038361" class="external">20240325-4</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.opensuse.org/tests/4037445" class="external">20240325-3</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD-Updates&machine=64bit&test=extra_tests_textmode&version=15.5" class="external">latest</a></p>
openQA Infrastructure - action #158125 (New): typing issue on ppc64 worker - only pick up (or sta... https://progress.opensuse.org/issues/158125 2024-03-27T08:52:37Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> we observed typing issues due to mania being overloaded. mania was configured to run 30 openQA worker instances and that was mostly fine, as proven in <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Repurpose PowerPC hardware in FC Basement - mania Power8 PowerPC size:M (Resolved)" href="https://progress.opensuse.org/issues/139271#note-24">#139271-24</a>. The recent overload was likely triggered by enabling video again as part of <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a>. I already reduced the number of worker instances, but this has the drawback that the long test backlog again takes longer to finish. We should be more flexible in using the available resources. Here I suggest implementing a check in the worker to only pick up new jobs if the CPU load is below a configured threshold.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> An openQA worker does not start an openQA job if the CPU load is higher than the configured threshold</li>
<li><strong>AC2:</strong> By default workers still pick up jobs if the load is not too high</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Possibly the worker code somewhere in <a href="https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker.pm#L472" class="external">https://github.com/os-autoinst/openQA/blob/master/lib/OpenQA/Worker.pm#L472</a> can be extended to check the cpu load and if it exceeds a (configurable) threshold then skip picking up any next job</li>
<li>Add a sensible disabled default value in <a href="https://github.com/os-autoinst/openQA/blob/master/etc/openqa/workers.ini" class="external">https://github.com/os-autoinst/openQA/blob/master/etc/openqa/workers.ini</a> with an explanation comment</li>
</ul>
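<p>The load check described in the suggestions can be sketched as follows. This is a minimal illustration, assuming the 1-minute load average is the metric and that a threshold of <code>None</code> stands for the disabled default; the function name is made up and the worker itself is written in Perl, so this is not the actual implementation.</p>

```python
import os


def may_pick_up_job(threshold, load=None):
    """Decide whether a worker should accept a new job.

    threshold: maximum acceptable load, or None to disable the check.
    load: load value to test; defaults to the current 1-minute load average.
    """
    if threshold is None:
        # Disabled default: behave exactly as before.
        return True
    if load is None:
        # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
        load = os.getloadavg()[0]
    # AC1: refuse only when the load is higher than the threshold.
    return load <= threshold


print(may_pick_up_job(None))
print(may_pick_up_job(40, load=55.0))
```

<p>Passing the load explicitly keeps the decision testable without having to put real load on a machine.</p>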
openQA Infrastructure - action #158116 (New): typing issue on ppc64 worker - crosscheck performan... https://progress.opensuse.org/issues/158116 2024-03-27T08:14:10Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> system overload on ppc64le machines was found which was likely triggered by <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a>. As a snapshot the current process list output from htop looks like this:</p>
<pre><code> PID USER PRI NI VIRT RES SHR S DISK R/W CPU% MEM% TIME+ ▽Command
1541 root 20 0 320M 194M 182M S 0.00 B/s 0.0 0.0 2h29:59 /usr/lib/systemd/systemd-j
96369 root 20 0 623M 98880 14336 S 0.00 B/s 0.0 0.0 54:05.86 /usr/bin/python3 /usr/bin/
1 root 20 0 178M 25024 11776 S 0.00 B/s 0.0 0.0 48:46.08 /usr/lib/systemd/systemd n
2000 root 20 0 9728 6208 2176 S 0.00 B/s 0.0 0.0 40:44.69 /usr/sbin/haveged -w 1024
157105 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 68.4 0.0 32:22.39 ffmpeg -y -hide_banner -no
157062 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 42.1 0.0 32:07.83 ffmpeg -y -hide_banner -no
157107 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 68.4 0.0 30:29.03 ffmpeg -y -hide_banner -no
157063 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 5.3 0.0 29:30.58 ffmpeg -y -hide_banner -no
6267 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 63.2 0.0 25:54.22 ffmpeg -y -hide_banner -no
157108 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 63.2 0.0 25:03.79 ffmpeg -y -hide_banner -no
157064 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 2.6 0.0 23:50.53 ffmpeg -y -hide_banner -no
156485 _openqa-wo 20 0 427M 189M 23808 R 0.00 B/s 34.2 0.0 22:18.78 ffmpeg -y -hide_banner -no
6268 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 57.9 0.0 21:48.92 ffmpeg -y -hide_banner -no
156601 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 10.5 0.0 20:19.58 ffmpeg -y -hide_banner -no
6269 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 55.3 0.0 16:33.02 ffmpeg -y -hide_banner -no
5898 _openqa-wo 20 0 427M 193M 23808 R 0.00 B/s 86.8 0.0 14:48.15 ffmpeg -y -hide_banner -no
31080 _openqa-wo 20 0 5720M 758M 28416 R 0.00 B/s 57.9 0.1 12:58.63 /usr/bin/qemu-system-ppc64
15778 _openqa-wo 20 0 6767M 1779M 28480 R 0.00 B/s 81.6 0.2 12:50.94 /usr/bin/qemu-system-ppc64
15781 _openqa-wo 20 0 6767M 1779M 28480 S 0.00 B/s 0.0 0.2 10:13.25 /usr/bin/qemu-system-ppc64
156709 _openqa-wo 20 0 6762M 1766M 28288 S 0.00 B/s 13.2 0.2 10:08.67 /usr/bin/qemu-system-ppc64
33559 _openqa-wo 20 0 6756M 1724M 28416 R 0.00 B/s 86.8 0.2 10:05.56 /usr/bin/qemu-system-ppc64
35017 _openqa-wo 20 0 3946M 753M 28416 R 0.00 B/s 84.2 0.1 9:30.77 /usr/bin/qemu-system-ppc64
24085 _openqa-wo 20 0 6901M 1781M 28480 S 0.00 B/s 0.0 0.2 9:13.94 /usr/bin/qemu-system-ppc64
24092 _openqa-wo 20 0 6901M 1781M 28480 R 0.00 B/s 78.9 0.2 8:40.60 /usr/bin/qemu-system-ppc64
28718 _openqa-wo 20 0 7135M 1787M 28480 S 0.00 B/s 50.0 0.2 8:17.91 /usr/bin/qemu-system-ppc64
28720 _openqa-wo 20 0 7135M 1787M 28480 R 0.00 B/s 13.2 0.2 6:51.75 /usr/bin/qemu-system-ppc64
39280 _openqa-wo 20 0 5712M 755M 28416 R 0.00 B/s 65.8 0.1 6:41.38 /usr/bin/qemu-system-ppc64
39683 _openqa-wo 20 0 6731M 1549M 28416 R 0.00 B/s 65.8 0.2 6:24.06 /usr/bin/qemu-system-ppc64
3699 root 20 0 3968 3200 2368 S 0.00 B/s 0.0 0.0 6:04.21 /sbin/agetty -o -p -- \u -
34903 _openqa-wo 20 0 6334M 1483M 28416 R 0.00 B/s 50.0 0.2 5:29.90 /usr/bin/qemu-system-ppc64
34902 _openqa-wo 20 0 6334M 1483M 28416 S 0.00 B/s 0.0 0.2 4:40.00 /usr/bin/qemu-system-ppc64
38988 _openqa-wo 20 0 6790M 1376M 28480 R 0.00 B/s 107.9 0.2 3:52.33 /usr/bin/qemu-system-ppc64
38599 _openqa-wo 20 0 8040M 4187M 28480 R 0.00 B/s 47.4 0.5 3:41.13 /usr/bin/qemu-system-ppc64
45395 _openqa-wo 20 0 3732M 757M 28416 R 0.00 B/s 71.1 0.1 3:38.90 /usr/bin/qemu-system-ppc64
38600 _openqa-wo 20 0 8040M 4187M 28480 S 0.00 B/s 0.0 0.5 3:18.94 /usr/bin/qemu-system-ppc64
43853 _openqa-wo 20 0 5641M 1696M 28480 R 0.00 B/s 63.2 0.2 3:12.66 /usr/bin/qemu-system-ppc64
38456 _openqa-wo 20 0 9087M 4195M 28480 R 0.00 B/s 78.9 0.5 3:08.68 /usr/bin/qemu-system-ppc64
38986 _openqa-wo 20 0 6790M 1376M 28480 R 0.00 B/s 86.8 0.2 3:06.34 /usr/bin/qemu-system-ppc64
</code></pre>
<p>So ffmpeg shows significantly higher accumulated CPU time usage compared to the corresponding qemu processes. We should investigate if ffmpeg is having a "too high" impact on machine performance, if it should be running with a nice level to prevent typing issues, if ffmpeg parameters can be tweaked or if ffmpeg should be avoided altogether on ppc64le.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openQA test video compression is ensured to not significantly impact system performance in a way that causes typing issues</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Check if ffmpeg CPU usage as visible in the above htop output is considered expected or something unusual</li>
<li>Consider introducing a nice-level for calling ffmpeg in os-autoinst</li>
<li>Crosscheck if ffmpeg can be tweaked, in particular for ppc64le qemu workers</li>
<li>Decide if ffmpeg, or even video recording altogether, should be forbidden on ppc64le, see <a class="issue tracker-4 status-1 priority-4 priority-default" title="action: remove NOVIDEO=1 from ppc64le workers (New)" href="https://progress.opensuse.org/issues/157636">#157636</a> </li>
</ul>
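<p>To put numbers on the comparison between ffmpeg and qemu, the TIME+ column of the htop listing can be converted to seconds and summed per command. The sketch below assumes the two TIME+ layouts visible in the output above (<code>2h29:59</code> and <code>54:05.86</code>); the helper name is made up, and the sample rows are a hand-picked subset of that listing.</p>

```python
import re


def htop_time_to_seconds(text):
    """Convert an htop TIME+ value such as '2h29:59' or '54:05.86' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(\d+):(\d+(?:\.\d+)?)", text)
    if match is None:
        raise ValueError(f"unrecognised TIME+ value: {text!r}")
    hours = int(match.group(1) or 0)
    minutes = int(match.group(2))
    seconds = float(match.group(3))
    return hours * 3600 + minutes * 60 + seconds


# (command, TIME+) pairs copied from a few rows of the listing above.
rows = [
    ("ffmpeg", "32:22.39"),
    ("ffmpeg", "32:07.83"),
    ("qemu-system-ppc64", "12:58.63"),
    ("qemu-system-ppc64", "12:50.94"),
]

totals = {}
for command, cpu_time in rows:
    totals[command] = totals.get(command, 0.0) + htop_time_to_seconds(cpu_time)

for command, total in sorted(totals.items()):
    print(f"{command}: {total:.2f} s accumulated CPU time")
```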
openQA Tests - action #158035 (New): sshfs: there can be a question about untrusted host key https://progress.opensuse.org/issues/158035 2024-03-26T09:26:55Z dimstar dimstar@opensuse.org
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-upgrade_Leap_15.2_gnome@64bit fails in<br>
<a href="https://openqa.opensuse.org/tests/4040667/modules/sshfs/steps/5" class="external">sshfs</a></p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>Maintainer: zluo<br>
Upgrade test Leap 15.2 to TW</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.opensuse.org/tests/4039994" class="external">20240325</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.opensuse.org/tests/4034342" class="external">20240322</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=upgrade_Leap_15.2_gnome&version=Tumbleweed" class="external">latest</a></p>
openQA Tests - action #158032 (New): children test: CD not 'in drive' - but repo is still present... https://progress.opensuse.org/issues/158032 2024-03-26T09:19:20Z dimstar dimstar@opensuse.org
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>It is rather odd that this only shows up now, but the parent job used the DVD to install the system, which leaves a DVD-based repository configured in the system.</p>
<p>A child job then boots the previously created disk image and, upon zypper refresh, fails to find the DVD (cd:// repo).<br>
Either the DVD must be ensured to be 'in the drive' or the repo should not be enabled.</p>
<p>openQA test in scenario opensuse-Tumbleweed-DVD-x86_64-yast2_ui_devel@64bit fails in<br>
<a href="https://openqa.opensuse.org/tests/4040665/modules/yast2_cmdline/steps/20" class="external">yast2_cmdline</a></p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>Maintainer: zluo</p>
<p>yast2 upstream test suites.</p>
<p><a href="https://progress.opensuse.org/issues/20206" class="external">https://progress.opensuse.org/issues/20206</a></p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.opensuse.org/tests/4039930" class="external">20240325</a></p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.opensuse.org/tests/4034278" class="external">20240322</a> (or more recent)</p>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.opensuse.org/tests/latest?arch=x86_64&distri=opensuse&flavor=DVD&machine=64bit&test=yast2_ui_devel&version=Tumbleweed" class="external">latest</a></p>
qe-yam - action #158014 (New): Remove not used autoyast profiles https://progress.opensuse.org/issues/158014 2024-03-26T07:49:34Z leli leli@suse.com
<a name="Motivation"></a>
<h4 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h4>
<p>While updating autoyast profiles, I found some profiles that may not be used any more; we can check and remove them. I searched for these profiles in the yam git repo openqa-job-groups/JobGroups and in the schedule folders (schedule/yam and schedule/yast), but did not find them. Still, it is better to double-check when working on this ticket.</p>
<p>data/yam/autoyast/create_hdd_gnome.xml</p>
<a name="Acceptance-criteria"></a>
<h4 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h4>
<p><strong>AC1</strong>: Remove unused autoyast profiles</p>
openQA Project - action #157996 (New): Upgrade all other LSG QE salt controlled machines to openS... https://progress.opensuse.org/issues/157996 2024-03-26T07:12:35Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<ul>
<li>Need to upgrade machines before EOL of Leap 15.5 and have a consistent environment</li>
</ul>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> all LSG QE salt controlled machines run a clean upgraded openSUSE Leap 15.6 (no failed systemd services, no left over .rpm-new files, etc.) except for OSD workers</li>
</ul>
<a name="Acceptance-tests"></a>
<h2 >Acceptance tests<a href="#Acceptance-tests" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AT1-1:</strong> <code>sudo salt -C 'not G@roles:worker and not G@roles:webui' grains.get oscodename | grep -B1 'Leap 15.5'</code> is empty</li>
</ul>
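<p>The same acceptance test can be evaluated on captured output. The sketch below assumes salt's default layout, where each minion name ends with a colon on an unindented line and its grain value follows indented; the host names are placeholders and the helper name is made up.</p>

```python
def minions_on_release(salt_output, release="Leap 15.5"):
    """Return minion names whose reported oscodename contains `release`."""
    minions = []
    current = None
    for line in salt_output.splitlines():
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            # Unindented "name:" line starts the block of a new minion.
            current = line.rstrip()[:-1]
        elif current is not None and release in line:
            minions.append(current)
    return minions


# Placeholder output in the assumed layout; host names are not real machines.
sample = """\
host-a.example.org:
    openSUSE Leap 15.6
host-b.example.org:
    openSUSE Leap 15.5
"""

# AT1-1 passes when this list is empty.
print(minions_on_release(sample))
```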
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>read <a href="https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades" class="external">https://progress.opensuse.org/projects/openqav3/wiki#Distribution-upgrades</a></li>
<li>Reserve some time when the related services are not heavily relied upon</li>
<li>Keep the IPMI interface ready and test that Serial-over-LAN works for potential recovery or, for virtual machines, virt-manager access</li>
<li>After the upgrade, reboot and check that everything works as expected; if not, roll back, e.g. with <code>snapper rollback</code></li>
</ul>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Remove silence <code>alertname=Failed systemd services</code></li>
</ul>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<ul>
<li>Don't worry, everything can be repaired :) If by any chance a machine gets misconfigured, in many cases there are btrfs snapshots to recover from, the IPMI Serial-over-LAN, etc.</li>
</ul>