openSUSE Project Management Tool: Issues
https://progress.opensuse.org/ | 2024-03-28T00:56:46Z
openQA Infrastructure - action #158185 (Feedback): parallel job failed to get the vars from its p...
https://progress.opensuse.org/issues/158185 | 2024-03-28T00:56:46Z | Julie_CAO (jcao@suse.com)
<a name="Observation"></a>
<h3 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h3>
<p>A parallel job failed to get the vars from its parallel parent job, and a rerun failed the same way. Is something wrong with the worker service?</p>
<pre><code>use mmapi;    # provides get_parents() and get_job_autoinst_vars()

sub get_var_from_parent {
    my ($self, $var) = @_;
    my $parents = get_parents();
    # Query every parent job for its autoinst vars and return the first match
    for my $job_id (@$parents) {
        my $ref = get_job_autoinst_vars($job_id);
        return $ref->{$var} if defined $ref->{$var};
    }
    return;    # no parent defines $var
}
</code></pre>
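<p>The log line below fails on name resolution ("Name or service not known") for the short hostname <code>worker35</code>, not on the vars endpoint itself. A minimal diagnostic sketch using only core Perl modules, meant to be run on the worker host executing the failing job: it extracts the host from such a URL and checks whether the system resolver can resolve it. <code>host_from_url</code> and <code>host_resolves</code> are hypothetical helpers, not part of os-autoinst or mmapi.</p>

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo SOCK_STREAM);

# Hypothetical helper: extract the host part of an http(s) URL such as the
# one printed by get_job_autoinst_vars. Bracketed IPv6 literals are not handled.
sub host_from_url {
    my ($url) = @_;
    return $1 if $url =~ m{^[a-z][a-z0-9+.-]*://(?:[^@/]*@)?([^:/?#]+)}i;
    return;
}

# Hypothetical helper: true if the system resolver turns $host into at least
# one address; "Name or service not known" means this lookup failed.
sub host_resolves {
    my ($host) = @_;
    my ($err, @addrs) = getaddrinfo($host, undef, {socktype => SOCK_STREAM});
    return !$err && @addrs ? 1 : 0;
}

my $host = host_from_url('http://worker35:20493/wS5wkxkWNNB9LK92/vars');
print "host: $host\n";    # prints "host: worker35"
print 'localhost resolves: ', (host_resolves('localhost') ? 'yes' : 'no'), "\n";
```

<p>If <code>host_resolves('worker35')</code> keeps returning false on that worker host, the failure points at DNS or hostname configuration rather than at the worker service itself.</p>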
<p><a href="https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt</a></p>
<pre><code>[2024-03-27T15:39:25.691962Z] [debug] [pid:4639] get_job_autoinst_vars: Connection error: Can't connect: Name or service not known; URL was http://worker35:20493/wS5wkxkWNNB9LK92/vars
</code></pre>
openQA Infrastructure - action #158113 (Feedback): typing issue on ppc64 worker - make CPU load a...
https://progress.opensuse.org/issues/158113 | 2024-03-27T08:03:58Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> shows VNC typing issues. Exactly to catch such cases, we added alerts on too high CPU load in <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: CPU Load and usage alert for openQA workers size:S (Resolved)" href="https://progress.opensuse.org/issues/150983">#150983</a>. <a href="https://monitor.qa.suse.de/d/WDmania/worker-dashboard-mania?orgId=1&from=now-2d&to=now&viewPanel=54694" class="external">https://monitor.qa.suse.de/d/WDmania/worker-dashboard-mania?orgId=1&from=now-2d&to=now&viewPanel=54694</a> clearly shows a load consistently in the range of 50-70(!) on mania, yet no alert was triggered. We should crosscheck <a href="https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert" class="external">https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert</a><br>
and make that alert more strict.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> CPU load alerts trigger for a CPU load15 consistently above 40 as originally planned</li>
</ul>
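<p>A small sketch of the semantics AC1 asks for, assuming that "consistently above" means every load15 sample within the evaluation window exceeds the threshold. The samples are hypothetical, and the helpers are not the Grafana rule from alerting-dashboard-WD.yaml.template; they only illustrate what that rule should evaluate to. On Linux the third field of <code>/proc/loadavg</code> is the 15-minute load average.</p>

```perl
use strict;
use warnings;
use List::Util qw(min);

# /proc/loadavg looks like "52.10 55.31 58.02 3/1234 5678";
# the third field is the 15-minute load average ("load15").
sub load15_from_loadavg {
    my ($line) = @_;
    my @fields = split ' ', $line;
    return $fields[2];
}

# "Consistently above" read as: every sample in the window exceeds the
# threshold, i.e. the minimum of the window is above it. Empty window: no alert.
sub consistently_above {
    my ($samples, $threshold) = @_;
    return 0 unless @$samples;
    return min(@$samples) > $threshold ? 1 : 0;
}

my @window = (52.1, 61.7, 58.0, 49.3);    # hypothetical load15 samples
print consistently_above(\@window, 40) ? "alert\n" : "ok\n";    # prints "alert"
```

<p>Using the minimum means a single dip below 40 suppresses the alert, which is roughly how an alert with a pending period behaves; with load values of 50-70 as observed on mania, such a rule should fire.</p>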
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Crosscheck <a href="https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert" class="external">https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert</a> or the implementation in code <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blame/master/monitoring/grafana/alerting-dashboard-WD.yaml.template?ref_type=heads#L941" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blame/master/monitoring/grafana/alerting-dashboard-WD.yaml.template?ref_type=heads#L941</a></li>
</ul>
openQA Infrastructure - action #158104 (Feedback): typing issue on ppc64 worker size:S
https://progress.opensuse.org/issues/158104 | 2024-03-27T06:57:56Z | zcjia (zcjia@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-SP6-Online-ppc64le-ha_beta_supportserver@ppc64le-2g fails in<br>
<a href="https://openqa.suse.de/tests/13885455/modules/setup/steps/84" class="external">setup</a></p>
<p><a href="https://openqa.suse.de/tests/13885455#step/setup/84" class="external">https://openqa.suse.de/tests/13885455#step/setup/84</a> (see attachment p1.png)</p>
<p><a href="https://openqa.suse.de/tests/13885471#step/setup/30" class="external">https://openqa.suse.de/tests/13885471#step/setup/30</a> (see attachment p2.png): the typed text is missing the "$" before "?".</p>
<p><a href="https://openqa.suse.de/tests/13885404#step/setup/12" class="external">https://openqa.suse.de/tests/13885404#step/setup/12</a> (see attachment p3.png)</p>
<p><a href="https://openqa.suse.de/tests/13885407#step/setup/9" class="external">https://openqa.suse.de/tests/13885407#step/setup/9</a> (see attachment p4.png)</p>
<p>I think this may be related to the high load of the underlying ppc64 worker.</p>
<p>All of these jobs ran on "mania".</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>The base test suite is used for job templates defined in YAML documents. It has no settings of its own.</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/13885455" class="external">73.1</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/13829359" class="external">67.1</a> (or more recent)</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Identify the affected machines and workers, apply mitigations to prevent recurring typing issues, e.g. reducing CPU load</li>
<li>Restart related failed jobs</li>
<li>Identify follow-up tasks</li>
<li>Reduce the number of worker instances as a first mitigation measure. <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759</a> (merged)</li>
<li>Make the alert for CPU load more strict - <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker - make CPU load alert more strict (Feedback)" href="https://progress.opensuse.org/issues/158113">#158113</a></li>
<li>Evaluate the impact on video encoding in particular on ppc64le, maybe ffmpeg on Power8 kvm is inefficient - <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
<li>Check existing ffmpeg processes on mania which take a lot of CPU time - <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>ffmpeg impact investigation -> <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
<li>code improvements -> <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - only pick up (or start) new jobs if CPU load is below configured t... (New)" href="https://progress.opensuse.org/issues/158125">#158125</a></li>
<li>improving the alert -> <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker - make CPU load alert more strict (Feedback)" href="https://progress.opensuse.org/issues/158113">#158113</a></li>
</ul>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-2g&test=ha_beta_supportserver&version=15-SP6" class="external">latest</a></p>
openQA Infrastructure - action #157615 (Feedback): [alert] osd-deployment failed in post-deploy ,...
https://progress.opensuse.org/issues/157615 | 2024-03-20T18:18:05Z | jbaier_cz (jbaier@suse.cz)
<p>See <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217</a></p>
<pre><code>schort-server.qe.nue2.suse.org:
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
monitor.qe.nue2.suse.org:
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
telegraf errors
++ grep ' E! ' salt_post_deploy_checks.log
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ol>
<li>Understand why and where <code>systemd_list_service_by_state_for_telegraf.sh</code> times out. It could be the general telegraf timeout in the pipeline, the timeout for executing the script itself (from telegraf.conf), or somewhere else. Adjust the timeout to match the expected runtime or make the script complete faster. Note that schort-server only has 1 VM core; consider configuring the hypervisor to give it at least 2 cores</li>
<li>"Error killing process: os: process already finished" might just be a consequence of the above</li>
<li>"Error in plugin: cannot get SSL cert '<a href="https://monitor.qa.suse.de:443" class="external">https://monitor.qa.suse.de:443</a>': dial tcp: lookup monitor.qa.suse.de: i/o timeout" says that the DNS lookup of monitor.qa.suse.de timed out before the certificate could be fetched. Could this be covered with some retrying? Investigate the underlying cause, e.g. ask <a href="https://www.ecosia.org/chat" class="external">https://www.ecosia.org/chat</a> (or, if that does not work, invest in the coal-powered <a href="https://www.cat-gpt.com/chat" class="external">https://www.cat-gpt.com/chat</a>)</li>
<li>If we cannot solve these problems, consider excluding them from CI execution to avoid false positives, but consider the impact of doing so first!</li>
</ol>
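<p>Suggestion 4 could be approached by filtering known-benign lines before failing the pipeline. A hedged sketch in Perl that mirrors the <code>grep ' E! '</code> step from the log above and then drops lines matching a tolerated pattern; the pattern list is hypothetical and only shows the mechanism, it is not a recommendation to ignore any particular error.</p>

```perl
use strict;
use warnings;

# Hypothetical list of patterns that might be tolerated after assessing
# their impact; not an existing configuration of salt-states-openqa.
my @tolerated = (qr/Error killing process: os: process already finished/);

# Keep lines containing ' E! ' (like the grep in the log), then drop every
# line that matches one of the tolerated patterns.
sub relevant_errors {
    my ($lines, $tolerated) = @_;
    my @errors = grep { / E! / } @$lines;
    return grep {
        my $line = $_;
        !grep { $line =~ $_ } @$tolerated;
    } @errors;
}

my @log = (
    '2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished',
    q{2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout},
    'telegraf errors',
);
my @remaining = relevant_errors(\@log, \@tolerated);
print scalar(@remaining), " relevant error(s)\n";    # prints "1 relevant error(s)"
print "$_\n" for @remaining;
```

<p>Keeping the tolerated patterns narrow (full message text rather than just the plugin name) limits the risk of masking new, unrelated errors from the same plugin.</p>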
openQA Project - action #157540 (Feedback): [sporadic] ci openQA: t/33-developer_mode.t fails size:M
https://progress.opensuse.org/issues/157540 | 2024-03-19T14:15:50Z | tinita (tina.mueller+trick-redmine@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://app.circleci.com/pipelines/github/os-autoinst/openQA/13196/workflows/ddb935c7-31dd-4beb-877c-25ef1e703b4d/jobs/123231" class="external">https://app.circleci.com/pipelines/github/os-autoinst/openQA/13196/workflows/ddb935c7-31dd-4beb-877c-25ef1e703b4d/jobs/123231</a></p>
<pre><code>[14:03:42] t/33-developer_mode.t .. 17/? # Unexpected Javascript console errors, waiting for connection opened: [
# {
# level => "SEVERE",
# message => "http://localhost:9526/asset/3906633cf0/ws_console.js 8 WebSocket connection to 'ws://localhost:9528/liveviewhandler/tests/1/developer/ws-proxy' failed: Error during WebSocket handshake: Unexpected response code: 302",
# source => "network",
# timestamp => 1710857067816,
# },
# ]
# Failed test 'No unexpected js warnings'
# at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 123.
# Looks like you failed 1 test of 9.
[14:03:42] t/33-developer_mode.t .. 20/?
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>While investigating the code in parallel, try to reproduce the failure locally with coverage enabled and multiple runs to get a statistically significant result, e.g. <code>make test KEEP_DB=1 RETRY=500 TESTS=t/33-developer_mode.t</code>, and go for lunch or continue coding :)</li>
<li>If it is not reproducible, consider the same with coverage enabled and/or in CircleCI, e.g. via a temporary branch in your GitHub fork</li>
<li>Identify where in <a href="https://github.com/os-autoinst/openQA/blob/master/t/33-developer_mode.t" class="external">https://github.com/os-autoinst/openQA/blob/master/t/33-developer_mode.t</a> the 302 redirect could happen</li>
<li>Even though the test does not live in the t/ui/ folder, it uses a Selenium instance, so UI-test-style synchronization might still be necessary to fix the sporadic failure</li>
<li>Might be a similar issue: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' size:M (Resolved)" href="https://progress.opensuse.org/issues/102578">#102578</a></li>
</ul>
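<p>To judge what "statistically significant" means for a sporadic failure, one can assume independent runs with a constant per-run failure rate p (an assumption; real flakiness may depend on load). The chance to see at least one failure in n runs is then 1 - (1 - p)^n, and solving for n gives the number of runs needed for a given confidence. A sketch, with the 1% failure rate picked purely for illustration:</p>

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Probability of at least one failure in $runs independent runs when a
# single run fails with probability $rate: 1 - (1 - rate)^runs.
sub p_at_least_one_failure {
    my ($rate, $runs) = @_;
    return 1 - (1 - $rate)**$runs;
}

# Smallest n with 1 - (1 - rate)^n >= confidence,
# i.e. n >= ln(1 - confidence) / ln(1 - rate).
sub runs_needed {
    my ($rate, $confidence) = @_;
    return ceil(log(1 - $confidence) / log(1 - $rate));
}

printf "%d runs\n", runs_needed(0.01, 0.95);             # prints "299 runs"
printf "%.3f\n", p_at_least_one_failure(0.01, 500);      # prints "0.993"
```

<p>Under these assumptions, RETRY=500 reveals a 1% failure with about 99.3% probability; conversely, if all 500 runs pass, the "rule of three" bounds the failure rate below roughly 3/500 = 0.6% at 95% confidence.</p>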
openQA Project - coordination #157537 (Blocked): [epic] Secure setup of openQA test machines with...
https://progress.opensuse.org/issues/157537 | 2024-03-19T14:15:29Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In <a href="https://sd.suse.com/servicedesk/customer/portal/1/SD-150437" class="external">https://sd.suse.com/servicedesk/customer/portal/1/SD-150437</a> we are asked to handle "compromised root passwords in QA segments" including s390zl11…16. We should better secure our network and password handling.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No openQA test machines directly accessible by SUSE users allow ssh root login with publicly known passwords</li>
</ul>
<a name="Ideas"></a>
<h2 >Ideas<a href="#Ideas" class="wiki-anchor">¶</a></h2>
<ol>
<li>Be able to set a different password for tests, in particular on s390kvm…, e.g. by setting the password via a test variable that is honored throughout the complete test platform -> <a class="issue tracker-4 status-12 priority-5 priority-high3 child" title="action: [spike][timeboxed:10h] Use a different ssh root password for s390x kvm installation openQA jobs (... (Workable)" href="https://progress.opensuse.org/issues/157555">#157555</a></li>
<li>Key based authentication -> <a class="issue tracker-4 status-15 priority-4 priority-default child" title="action: [spike][timeboxed:10h] Use ssh key authentication in particular for s390x kvm installation openQA... (Blocked)" href="https://progress.opensuse.org/issues/157744">#157744</a></li>
<li>Rotating, automatic passwords saved as test variables connected to images, e.g. to be able to use a pre-installed image</li>
<li>Better secure the networks to have s390kvm… (and others) less accessible -> We have stated the requirement in <a href="https://confluence.suse.com/pages/viewpage.action?pageId=1006108843" class="external">https://confluence.suse.com/pages/viewpage.action?pageId=1006108843</a> that ssh 22/tcp needs to be reachable. We could replicate the setup known from o3: give OSD a second network interface that allows ssh 22/tcp, and block ssh 22/tcp on .oqa.prg2.suse.org. Usually we don't need ssh access to workers except from within the oqa network and for administrative purposes, for which we could go through OSD as we already normally do for salt. -> <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: Better secure the networks to have s390kvm… (and others) less accessible (New)" href="https://progress.opensuse.org/issues/157750">#157750</a></li>
</ol>
openQA Project - action #157369 (Feedback): Handle all node dependabot updates, not just security...
https://progress.opensuse.org/issues/157369 | 2024-03-15T21:04:40Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>With #155410 resolved we have dependabot updates in <a href="https://github.com/os-autoinst/openQA/" class="external">https://github.com/os-autoinst/openQA/</a>, in fact already for all node updates, not just security updates. But we need to help dependabot get the updates done, e.g. by updating our code and tests so that they cope with a newer version. For trivial cases dependabot already creates the pull request and mergify eventually merges it after a wait time of multiple days. For the cases where CI tests fail we need people to push code changes. Maybe just mention on <a href="https://progress.opensuse.org/projects/qa/wiki/tools" class="external">https://progress.opensuse.org/projects/qa/wiki/tools</a> that we should support such pull requests, set aside work time to support those updates, and, where it becomes too much effort, create a corresponding ticket for each pull request that needs more work.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The team is confident how to handle dependabot updates as part of their daily work</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Describe how to handle such updates in an appropriate place on progress.opensuse.org/projects/qa/wiki/tools</li>
<li>Tell everyone from the team, ask them for feedback, adjust</li>
</ul>
openQA Project - action #157147 (Blocked): Documentation for OSD worker region, location, datacen...
https://progress.opensuse.org/issues/157147 | 2024-03-13T09:47:18Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Based on a request by fniederwanger in Slack.<br>
With <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/705" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/705</a> we have introduced region+location+datacenter keys in the worker class settings. This is documented generically, as an example, in <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a>, but that does not directly describe the relevant settings for OSD users. We should ensure that this concept is described with users of the OSD infrastructure in mind, e.g. in <a href="https://wiki.suse.net/index.php/OpenQA" class="external">https://wiki.suse.net/index.php/OpenQA</a> and/or <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md</a> and/or <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md?ref_type=heads" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md?ref_type=heads</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a> suggests the specific concept of "region-…,datacenter-…,location-…"</li>
<li><strong>AC2:</strong> A SUSE specific documentation explains the meaning of "region, datacenter, location" used within <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Simply change the lower text in <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a> with the specific concept of "region-…,datacenter-…,location-…"</li>
<li>Add a SUSE specific documentation for the meaning of "region, datacenter, location" used within <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a>, possibly in <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md</a> itself, maybe referencing that repo in detail on <a href="https://wiki.suse.net/index.php/OpenQA" class="external">https://wiki.suse.net/index.php/OpenQA</a> and/or <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md</a></li>
</ul>
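<p>As a starting point for such documentation, a sketch of how the keys combine, assuming the matching rule from the openQA documentation on assigning jobs to workers: a job listing several comma-separated WORKER_CLASS values only runs on a worker that provides all of them. The class names below are hypothetical placeholders, not the actual values from workerconf.sls.</p>

```perl
use strict;
use warnings;

# A job whose WORKER_CLASS lists several comma-separated classes needs a
# worker that provides every one of them (assumed AND semantics).
sub worker_matches_job {
    my ($worker_classes, $job_classes) = @_;
    my %provided = map { $_ => 1 } split /,/, $worker_classes;
    for my $required (split /,/, $job_classes) {
        return 0 unless $provided{$required};
    }
    return 1;
}

# Hypothetical worker: real region/datacenter/location names live in workerconf.sls
my $worker = 'qemu_x86_64,region-example,datacenter-example1,location-example';
print worker_matches_job($worker, 'qemu_x86_64,datacenter-example1') ? "match\n" : "no match\n";    # prints "match"
print worker_matches_job($worker, 'qemu_x86_64,datacenter-example2') ? "match\n" : "no match\n";    # prints "no match"
```

<p>This is why users can pin jobs to a region only, or narrow down further to a datacenter or location, by adding the corresponding class to the job's WORKER_CLASS.</p>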
openQA Project - action #156553 (Blocked): [timeboxed:10h][spike solution] openQA webUI search vi...
https://progress.opensuse.org/issues/156553 | 2024-03-04T11:07:42Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>From #121246-15: "We'd need to look for all the tests that are failing for a given incident, using the same TEST_ISSUES for both, Aggregates and Incidents". So what is needed is a single command line or openQA webUI search view to show all tests blocking an incident by squad. After <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does ... (Resolved)" href="https://progress.opensuse.org/issues/117655">#117655</a> and <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S (Resolved)" href="https://progress.opensuse.org/issues/119746">#119746</a> and <a class="issue tracker-4 status-12 priority-4 priority-default child" title="action: A single API route to show all not-ok tests blocking a SLE maintenance incident size:M (Workable)" href="https://progress.opensuse.org/issues/156547">#156547</a> we should combine their results.</p>
<a name="Goals"></a>
<h2 >Goals<a href="#Goals" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>G1:</strong> Proof-of-concept for an openQA webUI search view to show all tests blocking an incident by squad, e.g. based on special job setting or group glob</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>We have support for group globbing (<a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M (Resolved)" href="https://progress.opensuse.org/issues/134933#note-32">#134933#note-32</a>)
<ul>
<li><a href="https://openqa.opensuse.org/tests?group_glob=*Leap*&todo=1" class="external">https://openqa.opensuse.org/tests?group_glob=*Leap*&todo=1</a></li>
</ul></li>
<li>"squads" could be mapped into openQA with special job settings, e.g. QE Core would trigger all their tests with _SQUAD='QE Core' so that one can filter by that</li>
<li>This doesn't need to be specific to squads/blocking tests (openQA itself should not know about these SUSE specific concepts)</li>
</ul>
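<p>The filtering logic of such a proof-of-concept could look like the following in-memory sketch. It assumes that <code>group_glob</code> uses shell-style <code>*</code> and <code>?</code> wildcards and that squads are expressed via the <code>_SQUAD</code> job setting suggested above; the job records and group names are made up, and this is not how the openQA webUI implements <code>/tests</code>.</p>

```perl
use strict;
use warnings;

# Translate a shell-style glob ("*Leap*") into an anchored regex.
# Only "*" and "?" are treated as wildcards in this sketch.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map { $_ eq '*' ? '.*' : $_ eq '?' ? '.' : quotemeta $_ } split //, $glob;
    return qr/\A$re\z/;
}

# Filter hypothetical job records by group glob and an optional _SQUAD setting.
sub filter_jobs {
    my ($jobs, %args) = @_;
    my $group_re = glob_to_regex($args{group_glob} // '*');
    return grep {
        $_->{group} =~ $group_re
          && (!defined $args{squad} || ($_->{settings}{_SQUAD} // '') eq $args{squad})
    } @$jobs;
}

my @jobs = (
    {id => 1, group => 'openSUSE Leap 15.6',  settings => {_SQUAD => 'QE Core'}},
    {id => 2, group => 'openSUSE Leap 15.6',  settings => {}},
    {id => 3, group => 'openSUSE Tumbleweed', settings => {_SQUAD => 'QE Core'}},
);
print join(',', map { $_->{id} } filter_jobs(\@jobs, group_glob => '*Leap*', squad => 'QE Core')), "\n";    # prints "1"
```

<p>Keeping the squad as a generic job setting filter matches the point above that openQA itself should not know about SUSE specific concepts.</p>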
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>We don't care if searching for job settings is limited by an artificial search depth or super slow -> <a class="issue tracker-4 status-12 priority-4 priority-default child" title="action: A single API route to show all not-ok tests blocking a SLE maintenance incident size:M (Workable)" href="https://progress.opensuse.org/issues/156547">#156547</a></li>
</ul>
QA - action #153733 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry
https://progress.opensuse.org/issues/153733 | 2024-01-16T20:12:28Z | okurz (okurz@suse.com)
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> soapberry is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3748" class="external">https://jira.suse.com/browse/ENGINFRA-3748</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
QA - action #153724 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - blackcur...
https://progress.opensuse.org/issues/153724 | 2024-01-16T20:07:13Z | okurz (okurz@suse.com)
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> blackcurrant is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3745" class="external">https://jira.suse.com/browse/ENGINFRA-3745</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
QA - action #153718 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir
https://progress.opensuse.org/issues/153718 | 2024-01-16T20:02:28Z | okurz (okurz@suse.com)
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> haldir is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3744" class="external">https://jira.suse.com/browse/ENGINFRA-3744</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
openQA Project - coordination #152847 (Blocked): [epic] version control awareness within openQA f...
https://progress.opensuse.org/issues/152847 | 2023-12-21T12:48:46Z | okurz (okurz@suse.com)
openQA Project - action #130943 (Feedback): Test parameterization for github description/comments...
https://progress.opensuse.org/issues/130943 | 2023-06-15T10:23:03Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>See epic <a class="issue tracker-6 status-15 priority-4 priority-default child parent" title="coordination: [epic] Use openqa-clone-custom-git-refspec to parse github description+comments and trigger openQ... (Blocked)" href="https://progress.opensuse.org/issues/130850">#130850</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> One is able to customize test scheduling in github description/comments with additional test parameters, e.g. <code>openqa: http://my/openqa/t1 FOO=bar</code> or <code>openqa: BAR=eggs</code></li>
</ul>
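<p>A hypothetical parser for the syntax from AC1, sketched in Perl: each line starting with <code>openqa:</code> yields an optional test URL plus KEY=value parameters. All names are illustrative; the existing description-based implementation is the one referenced in os-autoinst/scripts pull request 292, and values containing spaces are not supported by this sketch.</p>

```perl
use strict;
use warnings;

# Parse one "openqa:" line into an optional test URL and KEY=value params,
# e.g. "openqa: http://my/openqa/t1 FOO=bar" or "openqa: BAR=eggs".
# Returns nothing for lines without the directive; unknown tokens are ignored.
sub parse_openqa_line {
    my ($line) = @_;
    return unless $line =~ /^\s*openqa:\s*(.*?)\s*$/;
    my $rest = $1;
    my ($url, %params);
    for my $token (split ' ', $rest) {
        if ($token =~ m{^https?://}) {
            $url //= $token;    # the first URL on the line wins
        }
        elsif ($token =~ /^([A-Za-z0-9_]+)=(.*)$/) {
            $params{$1} = $2;
        }
    }
    return {url => $url, params => \%params};
}

for my $line ('openqa: http://my/openqa/t1 FOO=bar', 'openqa: BAR=eggs', 'unrelated text') {
    my $parsed = parse_openqa_line($line) or next;
    my $params = join ' ', map { "$_=$parsed->{params}{$_}" } sort keys %{$parsed->{params}};
    printf "%s %s\n", $parsed->{url} // '(no url)', $params;
}
```

<p>A line without a URL, like the second example, would then apply its parameters to whatever tests are scheduled by default.</p>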
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Wait for <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Trigger openQA tests mentioned in github description as part of CI size:M (Resolved)" href="https://progress.opensuse.org/issues/130934">#130934</a> and <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: Trigger openQA tests mentioned in github comments as part of automatic testing as well (New)" href="https://progress.opensuse.org/issues/130940">#130940</a></li>
<li>See <a href="https://github.com/os-autoinst/scripts/pull/292" class="external">https://github.com/os-autoinst/scripts/pull/292</a> for how this was done for the cloning script based on the GitHub description</li>
<li>Additional test parameters are applied the same way users are used to from openqa-clone-job; we don't expect to use the script itself but rather e.g. CloneJob.pm, refactoring code as needed to support that</li>
<li>Extend documentation to cover that</li>
</ul>
QA - coordination #129280 (Blocked): [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters
https://progress.opensuse.org/issues/129280 | 2023-05-15T07:12:10Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>SUSE NUE1 is being evacuated so we need to ensure our services are provided from other places and that NUE1 has been evacuated by us.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> NUE1 (Maxtorhof) is not relied upon by SUSE QE Tools anymore and has been evacuated by us</li>
</ul>
<a name="Ideas"></a>
<h2 >Ideas<a href="#Ideas" class="wiki-anchor">¶</a></h2>
<ul>
<li>"To-be-decommissioned" machines obviously should not be moved to a new datacenter</li>
<li>Consider decommissioning more machines in the process, e.g. "qanet", which should be replaced by Eng-Infra maintained DHCP+DNS as we already have in PRG1, PRG2 and NUE2 (e.g. FC Basement); qanet also lacks proper remote management capabilities</li>
<li>Some machines might be better moved to FC Basement rather than new NBG Datacenter</li>
</ul>