openSUSE Project Management Tool: Issues https://progress.opensuse.org/ https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?1582917784 2024-03-28T00:56:46Z openSUSE Project Management Tool Redmine
openQA Infrastructure - action #158185 (Feedback): parallel job failed to get the vars from its p... https://progress.opensuse.org/issues/158185 2024-03-28T00:56:46Z Julie_CAO jcao@suse.com
<a name="Observation"></a>
<h3 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h3>
<p>We have a parallel job that failed to get the vars from its pair. A rerun failed as well. Is there something wrong with the worker service?</p>
<pre><code>sub get_var_from_parent {
    my ($self, $var) = @_;
    my $parents = get_parents();
    # Query every parent to find the var
    for my $job_id (@$parents) {
        my $ref = get_job_autoinst_vars($job_id);
        return $ref->{$var} if defined $ref->{$var};
    }
    return;
}
</code></pre>
<p><a href="https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt" class="external">https://openqa.suse.de/tests/13885165/logfile?filename=autoinst-log.txt</a></p>
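<p>The linked log reports that the worker host name (<code>worker35</code>) could not be resolved. As a minimal sketch for narrowing this down, the following Python helper checks whether the host name in such a URL resolves on the machine where it runs. The function name <code>host_resolves</code> and its injectable <code>resolver</code> parameter are illustrative assumptions and not part of openQA.</p>

```python
import socket
from typing import Callable
from urllib.parse import urlsplit


def host_resolves(url: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if the host name in `url` can be resolved.

    `resolver` defaults to the system resolver; it is a parameter only so that
    the check can be exercised without network access (an assumption made for
    this sketch).
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no host name in URL: {url!r}")
    try:
        resolver(host, parts.port)
    except socket.gaierror:
        # socket.gaierror is what Python raises for failures such as
        # "Name or service not known".
        return False
    return True
```

<p>Running this on the machine that executes the failing job, with the URL from the log, would show whether the problem is name resolution on that host rather than the worker service itself.</p>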
<pre><code>[2024-03-27T15:39:25.691962Z] [debug] [pid:4639] get_job_autoinst_vars: Connection error: Can't connect: Name or service not known; URL was http://worker35:20493/wS5wkxkWNNB9LK92/vars
</code></pre>
openQA Infrastructure - action #158113 (Feedback): typing issue on ppc64 worker - make CPU load a... https://progress.opensuse.org/issues/158113 2024-03-27T08:03:58Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p><a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker size:S (Feedback)" href="https://progress.opensuse.org/issues/158104">#158104</a> shows VNC typing issues. To catch such situations we deliberately added alerts for excessive CPU load in <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: CPU Load and usage alert for openQA workers size:S (Resolved)" href="https://progress.opensuse.org/issues/150983">#150983</a>. <a href="https://monitor.qa.suse.de/d/WDmania/worker-dashboard-mania?orgId=1&from=now-2d&to=now&viewPanel=54694" class="external">https://monitor.qa.suse.de/d/WDmania/worker-dashboard-mania?orgId=1&from=now-2d&to=now&viewPanel=54694</a> clearly shows a load consistently in the range of 50-70(!) for mania, but no alert was triggered. We should crosscheck <a href="https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert" class="external">https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert</a><br>
and make that alert stricter.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> CPU load alerts trigger for a CPU load15 consistently above 40 as originally planned</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Crosscheck <a href="https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert" class="external">https://monitor.qa.suse.de/alerting/cpu_load_alert_mania/modify-export?returnTo=%2Fd%2FWDmania%2Fworker-dashboard-mania%3ForgId%3D1%26from%3Dnow-7d%26to%3Dnow%26viewPanel%3D54694%26editPanel%3D54694%26tab%3Dalert</a> or the implementation in code <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blame/master/monitoring/grafana/alerting-dashboard-WD.yaml.template?ref_type=heads#L941" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blame/master/monitoring/grafana/alerting-dashboard-WD.yaml.template?ref_type=heads#L941</a></li>
</ul>
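<p>To make "consistently above 40" concrete, here is a minimal Python sketch of the intended alert condition. It is not the Grafana rule from salt-states-openqa. The threshold of 40 comes from AC1, while the function name and the <code>min_consecutive</code> window are assumptions chosen for illustration.</p>

```python
from typing import Sequence


def load_alert_should_fire(load15_samples: Sequence[float],
                           threshold: float = 40.0,
                           min_consecutive: int = 3) -> bool:
    """Return True if the most recent `min_consecutive` samples all exceed `threshold`.

    `min_consecutive` models "consistently"; its default is an assumption,
    not a value taken from the existing alert definition.
    """
    if min_consecutive <= 0:
        raise ValueError("min_consecutive must be positive")
    if len(load15_samples) < min_consecutive:
        # Not enough data yet to call the load "consistent".
        return False
    recent = load15_samples[-min_consecutive:]
    return all(sample > threshold for sample in recent)
```

<p>With load15 values in the range of 50-70 as observed on mania, this condition holds for every window, so a rule of this shape would be expected to fire.</p>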
openQA Infrastructure - action #158104 (Feedback): typing issue on ppc64 worker size:S https://progress.opensuse.org/issues/158104 2024-03-27T06:57:56Z zcjia zcjia@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>openQA test in scenario sle-15-SP6-Online-ppc64le-ha_beta_supportserver@ppc64le-2g fails in<br>
<a href="https://openqa.suse.de/tests/13885455/modules/setup/steps/84" class="external">setup</a></p>
<p><a href="https://openqa.suse.de/tests/13885455#step/setup/84" class="external">https://openqa.suse.de/tests/13885455#step/setup/84</a> (see attachment p1.png)</p>
<p><a href="https://openqa.suse.de/tests/13885471#step/setup/30" class="external">https://openqa.suse.de/tests/13885471#step/setup/30</a> (see attachment p2.png): the "$" before "?" was not typed.</p>
<p><a href="https://openqa.suse.de/tests/13885404#step/setup/12" class="external">https://openqa.suse.de/tests/13885404#step/setup/12</a> (see attachment p3.png)</p>
<p><a href="https://openqa.suse.de/tests/13885407#step/setup/9" class="external">https://openqa.suse.de/tests/13885407#step/setup/9</a> (see attachment p4.png)</p>
<p>I think this may be related to the high workload of the underlying ppc64 worker.</p>
<p>All of these jobs ran on "mania".</p>
<a name="Test-suite-description"></a>
<h2 >Test suite description<a href="#Test-suite-description" class="wiki-anchor">¶</a></h2>
<p>The base test suite is used for job templates defined in YAML documents. It has no settings of its own.</p>
<a name="Reproducible"></a>
<h2 >Reproducible<a href="#Reproducible" class="wiki-anchor">¶</a></h2>
<p>Fails since (at least) Build <a href="https://openqa.suse.de/tests/13885455" class="external">73.1</a> (current job)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>Last good: <a href="https://openqa.suse.de/tests/13829359" class="external">67.1</a> (or more recent)</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Identify the affected machines and workers, apply mitigations to prevent recurring typing issues, e.g. reducing CPU load</li>
<li>Restart related failed jobs</li>
<li>Identify follow-up tasks</li>
<li>Reduce the number of worker instances as a first mitigation measure. <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/759</a> (merged)</li>
<li>Make the alert for CPU load more strict - <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker - make CPU load alert more strict (Feedback)" href="https://progress.opensuse.org/issues/158113">#158113</a></li>
<li>Evaluate the impact on video encoding in particular on ppc64le, maybe ffmpeg on Power8 kvm is inefficient - <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
<li>Check existing ffmpeg processes on mania which take a lot of CPU time - <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>ffmpeg impact investigation -> <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - crosscheck performance impact of ffmpeg on ppc64le (Power8 kvm) (New)" href="https://progress.opensuse.org/issues/158116">#158116</a></li>
<li>code improvements -> <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: typing issue on ppc64 worker - only pick up (or start) new jobs if CPU load is below configured t... (New)" href="https://progress.opensuse.org/issues/158125">#158125</a></li>
<li>improving the alert -> <a class="issue tracker-4 status-4 priority-5 priority-high3 child behind-schedule" title="action: typing issue on ppc64 worker - make CPU load alert more strict (Feedback)" href="https://progress.opensuse.org/issues/158113">#158113</a></li>
</ul>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Always latest result in this scenario: <a href="https://openqa.suse.de/tests/latest?arch=ppc64le&distri=sle&flavor=Online&machine=ppc64le-2g&test=ha_beta_supportserver&version=15-SP6" class="external">latest</a></p>
openQA Infrastructure - action #157615 (Feedback): [alert] osd-deployment failed in post-deploy ,... https://progress.opensuse.org/issues/157615 2024-03-20T18:18:05Z jbaier_cz jbaier@suse.cz
<p>See <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217</a></p>
<pre><code>schort-server.qe.nue2.suse.org:
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
monitor.qe.nue2.suse.org:
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
telegraf errors
++ grep ' E! ' salt_post_deploy_checks.log
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ol>
<li>Understand why and where <code>systemd_list_service_by_state_for_telegraf.sh</code> times out. It could be the general telegraf-timeout in the pipeline, in the execution of the script itself (from telegraf.conf) or another place. Adjust the timeout to match expected runtime or fix the script to complete faster -> schort-server only has 1 VM core, consider configuring the hypervisor to use at least 2 cores</li>
<li>"Error killing process: os: process already finished" might just be a consequence of the above</li>
<li>"Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout" could possibly be covered by some retrying. Investigate what the real error message means, ask <a href="https://www.ecosia.org/chat" class="external">https://www.ecosia.org/chat</a> (or if that does not work invest in coal-powered <a href="https://www.cat-gpt.com/chat" class="external">https://www.cat-gpt.com/chat</a>) or something</li>
<li>If we cannot solve these problems, consider excluding them from CI execution to avoid false positives. Consider the impact of doing this first, however!</li>
</ol>
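<p>To find out where the script spends its time, it can help to run it by hand under an explicit timeout and record how long it takes. The following Python sketch does that for an arbitrary command. The helper name <code>timed_run</code> and the chosen timeout values are assumptions for illustration and do not reflect how telegraf itself executes the script.</p>

```python
import subprocess
import sys
import time
from typing import Optional, Sequence, Tuple


def timed_run(command: Sequence[str], timeout_s: float) -> Tuple[Optional[int], float]:
    """Run `command` and return (exit code, elapsed seconds).

    The exit code is None if the command did not finish within `timeout_s`.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(command, timeout=timeout_s,
                                   stdout=subprocess.DEVNULL,
                                   stderr=subprocess.DEVNULL)
        exit_code: Optional[int] = completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        exit_code = None
    return exit_code, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command; on the affected host one would pass the telegraf
    # script and its arguments as shown in the log above instead.
    code, elapsed = timed_run([sys.executable, "-c", "pass"], timeout_s=30)
    print(f"exit code: {code}, runtime: {elapsed:.2f}s")
```

<p>Comparing the measured runtime on schort-server with the configured timeout would show whether the limit is simply too tight for a single-core VM or whether the script itself hangs.</p>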
openQA Project - action #157540 (Feedback): [sporadic] ci openQA: t/33-developer_mode.t fails size:M https://progress.opensuse.org/issues/157540 2024-03-19T14:15:50Z tinita tina.mueller+trick-redmine@suse.com
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://app.circleci.com/pipelines/github/os-autoinst/openQA/13196/workflows/ddb935c7-31dd-4beb-877c-25ef1e703b4d/jobs/123231" class="external">https://app.circleci.com/pipelines/github/os-autoinst/openQA/13196/workflows/ddb935c7-31dd-4beb-877c-25ef1e703b4d/jobs/123231</a></p>
<pre><code>[14:03:42] t/33-developer_mode.t .. 17/? # Unexpected Javascript console errors, waiting for connection opened: [
# {
# level => "SEVERE",
# message => "http://localhost:9526/asset/3906633cf0/ws_console.js 8 WebSocket connection to 'ws://localhost:9528/liveviewhandler/tests/1/developer/ws-proxy' failed: Error during WebSocket handshake: Unexpected response code: 302",
# source => "network",
# timestamp => 1710857067816,
# },
# ]
# Failed test 'No unexpected js warnings'
# at /home/squamata/project/t/lib/OpenQA/Test/FullstackUtils.pm line 123.
# Looks like you failed 1 test of 9.
[14:03:42] t/33-developer_mode.t .. 20/?
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>While investigating the code, in parallel try to reproduce the failure locally with coverage enabled and multiple runs to get a statistically significant result, e.g. <code>make test KEEP_DB=1 RETRY=500 TESTS=t/33-developer_mode.t</code> and go for lunch or continue coding :)</li>
<li>If it's not reproducible locally, consider the same with coverage enabled and/or in circleCI, e.g. on a temporary branch in your GitHub fork</li>
<li>Identify where in <a href="https://github.com/os-autoinst/openQA/blob/master/t/33-developer_mode.t" class="external">https://github.com/os-autoinst/openQA/blob/master/t/33-developer_mode.t</a> the redirection "302" could happen</li>
<li>Even though the test is not technically a UI test in the t/ui/ folder, it might still be necessary to apply the synchronisation measures used for UI tests to fix the sporadic failure, as a selenium instance is used</li>
<li>Might be a similar issue: <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: [sporadic] t/full-stack.t Failed test 'Expected result for job 1 not found' size:M (Resolved)" href="https://progress.opensuse.org/issues/102578">#102578</a></li>
</ul>
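<p>How many runs count as "statistically significant" depends on how rarely the test fails. Assuming independent runs with a fixed failure rate (a simplifying assumption, and the function names are illustrative), the chance of seeing at least one failure in n runs is 1 - (1 - p)^n. The sketch below computes that value and the number of runs needed for a given confidence.</p>

```python
import math


def detection_probability(failure_rate: float, runs: int) -> float:
    """Probability of seeing at least one failure in `runs` independent runs."""
    return 1.0 - (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of independent runs that shows at least one failure
    with probability `confidence`, given the per-run `failure_rate`."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))
```

<p>For a test that fails once in a hundred runs, <code>runs_needed(0.01)</code> yields 299, so a run count of 500 leaves a comfortable margin. If 500 runs pass without a single failure, the local failure rate is likely well below 1%.</p>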
openQA Project - action #157369 (Feedback): Handle all node dependabot updates, not just security... https://progress.opensuse.org/issues/157369 2024-03-15T21:04:40Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>With #155410 resolved we have dependabot updates in <a href="https://github.com/os-autoinst/openQA/" class="external">https://github.com/os-autoinst/openQA/</a>, in fact already for all node updates, not just security updates. But we need to help dependabot get the updates done, e.g. by updating our code and tests so that they cope with a newer version. For trivial cases we already have dependabot creating the pull request and mergify eventually merging it after a wait time of multiple days. For the cases where CI tests fail we need people to push code changes. Maybe just mention on <a href="https://progress.opensuse.org/projects/qa/wiki/tools" class="external">https://progress.opensuse.org/projects/qa/wiki/tools</a> that we should support such pull requests, set aside work time for those updates and, where the effort becomes too much, create a corresponding ticket for each pull request that needs more work.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The team is confident in how to handle dependabot updates as part of their daily work</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add instructions on how to handle such updates in an appropriate place on progress.opensuse.org/projects/qa/wiki/tools</li>
<li>Tell everyone from the team, ask them for feedback, adjust</li>
</ul>
openQA Project - action #157147 (Blocked): Documentation for OSD worker region, location, datacen... https://progress.opensuse.org/issues/157147 2024-03-13T09:47:18Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>Based on request by fniederwanger in Slack.<br>
With <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/705" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/merge_requests/705</a> we have introduced region+location+datacenter keys in the worker class settings. That is generally documented as an example in <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a>, but this does not directly describe the relevant settings for OSD users. We should ensure that this concept is described with users of the OSD infrastructure in mind, e.g. in <a href="https://wiki.suse.net/index.php/OpenQA" class="external">https://wiki.suse.net/index.php/OpenQA</a> and/or <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md</a> and/or <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md?ref_type=heads" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md?ref_type=heads</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a> suggests the specific concept of "region-…,datacenter-…,location-…"</li>
<li><strong>AC2:</strong> A SUSE specific documentation explains the meaning of "region, datacenter, location" used within <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Simply replace the lower text in <a href="http://open.qa/docs/#_assigning_jobs_to_workers" class="external">http://open.qa/docs/#_assigning_jobs_to_workers</a> with the specific concept of "region-…,datacenter-…,location-…"</li>
<li>Add a SUSE specific documentation for the meaning of "region, datacenter, location" used within <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/openqa/workerconf.sls</a>, possibly in <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/blob/master/README.md</a> itself, maybe referencing that repo in detail on <a href="https://wiki.suse.net/index.php/OpenQA" class="external">https://wiki.suse.net/index.php/OpenQA</a> and/or <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/README.md</a></li>
</ul>
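<p>As an illustration of the "region-…,datacenter-…,location-…" convention, the following Python sketch extracts those three keys from a comma-separated worker class string. The function name and the example values in the usage note are hypothetical. The real values are defined in workerconf.sls and are not reproduced here.</p>

```python
from typing import Dict

# Key prefixes named in the ticket; each is expected as "<key>-<value>".
LOCATION_KEYS = ("region", "datacenter", "location")


def location_tags(worker_class: str) -> Dict[str, str]:
    """Extract region/datacenter/location entries from a worker class string."""
    tags: Dict[str, str] = {}
    for entry in worker_class.split(","):
        key, separator, value = entry.strip().partition("-")
        if separator and value and key in LOCATION_KEYS:
            tags[key] = value
    return tags
```

<p>For a made-up worker class such as <code>qemu_x86_64,region-example,datacenter-dc1,location-somewhere</code> this returns the three location entries and ignores <code>qemu_x86_64</code>.</p>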
openQA Project - action #156553 (Blocked): [timeboxed:10h][spike solution] openQA webUI search vi... https://progress.opensuse.org/issues/156553 2024-03-04T11:07:42Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>From #121246-15: "We'd need to look for all the tests that are failing for a given incident, using the same TEST_ISSUES for both, Aggregates and Incidents". So what is needed is a single command line or openQA webUI search view to show all tests blocking an incident by squad. After <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does ... (Resolved)" href="https://progress.opensuse.org/issues/117655">#117655</a> and <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S (Resolved)" href="https://progress.opensuse.org/issues/119746">#119746</a> and <a class="issue tracker-4 status-12 priority-4 priority-default child" title="action: A single API route to show all not-ok tests blocking a SLE maintenance incident size:M (Workable)" href="https://progress.opensuse.org/issues/156547">#156547</a> we should combine both.</p>
<a name="Goals"></a>
<h2 >Goals<a href="#Goals" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>G1:</strong> Proof-of-concept for an openQA webUI search view to show all tests blocking an incident by squad, e.g. based on special job setting or group glob</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>We have support for group globbing (<a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M (Resolved)" href="https://progress.opensuse.org/issues/134933#note-32">#134933#note-32</a>)
<ul>
<li><a href="https://openqa.opensuse.org/tests?group_glob=*Leap*&todo=1" class="external">https://openqa.opensuse.org/tests?group_glob=*Leap*&todo=1</a></li>
</ul></li>
<li>"squads" could be mapped into openQA for example with special job settings, e.g. QE Core ensures that all their tests are triggered with _SQUAD='QE Core', which then makes it possible to filter by that setting</li>
<li>This doesn't need to be specific to squads/blocking tests (openQA itself should not know about these SUSE specific concepts)</li>
</ul>
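<p>The group globbing link above can also be generated programmatically. This Python sketch builds such a URL from a base URL and a glob. The query parameters <code>group_glob</code> and <code>todo</code> are taken from the example link, while the function name is an assumption for illustration.</p>

```python
from urllib.parse import urlencode


def todo_jobs_url(base_url: str, group_glob: str) -> str:
    """Build a /tests URL listing todo jobs of all groups matching `group_glob`."""
    # Keep "*" unescaped so the result matches the link format shown above.
    query = urlencode({"group_glob": group_glob, "todo": 1}, safe="*")
    return f"{base_url.rstrip('/')}/tests?{query}"
```

<p>Calling <code>todo_jobs_url("https://openqa.opensuse.org", "*Leap*")</code> reproduces the example link. A filter on a job setting such as _SQUAD would need its own query parameter, which this sketch does not cover.</p>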
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>We don't care if searching for job settings is limited by an artificial search depth or super slow -> <a class="issue tracker-4 status-12 priority-4 priority-default child" title="action: A single API route to show all not-ok tests blocking a SLE maintenance incident size:M (Workable)" href="https://progress.opensuse.org/issues/156547">#156547</a></li>
</ul>
QA - action #153733 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - soapberry https://progress.opensuse.org/issues/153733 2024-01-16T20:12:28Z okurz okurz@suse.com
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> soapberry is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3748" class="external">https://jira.suse.com/browse/ENGINFRA-3748</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
QA - action #153724 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - blackcur... https://progress.opensuse.org/issues/153724 2024-01-16T20:07:13Z okurz okurz@suse.com
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> blackcurrant is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3745" class="external">https://jira.suse.com/browse/ENGINFRA-3745</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
QA - action #153718 (Feedback): Move of LSG QE non-openQA PowerPC machine NUE1 to PRG2 - haldir https://progress.opensuse.org/issues/153718 2024-01-16T20:02:28Z okurz okurz@suse.com
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> haldir is usable from PRG2</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Follow <a href="https://jira.suse.com/browse/ENGINFRA-3744" class="external">https://jira.suse.com/browse/ENGINFRA-3744</a></li>
<li>Ensure the machine can be reached</li>
<li>Ensure the machine is used as before the migration</li>
</ul>
openQA Project - coordination #152847 (Blocked): [epic] version control awareness within openQA f... https://progress.opensuse.org/issues/152847 2023-12-21T12:48:46Z okurz okurz@suse.com
openQA Project - action #130943 (Feedback): Test parameterization for github description/comments... https://progress.opensuse.org/issues/130943 2023-06-15T10:23:03Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>See epic <a class="issue tracker-6 status-15 priority-4 priority-default child parent" title="coordination: [epic] Use openqa-clone-custom-git-refspec to parse github description+comments and trigger openQ... (Blocked)" href="https://progress.opensuse.org/issues/130850">#130850</a></p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> One is able to customize test scheduling in github description/comments with additional test parameters, e.g. <code>openqa: http://my/openqa/t1 FOO=bar</code> or <code>openqa: BAR=eggs</code></li>
</ul>
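<p>One way to read AC1 is that each <code>openqa:</code> line carries an optional job URL followed by <code>KEY=value</code> settings. The Python sketch below parses such lines under that assumption. It is not the implementation from os-autoinst/scripts, and the function name and return shape are chosen only for illustration.</p>

```python
import re
from typing import Dict, List, Optional, Tuple

# A line starting with "openqa:" followed by whitespace-separated tokens.
MENTION = re.compile(r"^[ \t]*openqa:[ \t]*(?P<rest>.*)$", re.IGNORECASE | re.MULTILINE)
# A token of the form KEY=value, where KEY looks like a job setting name.
SETTING = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def parse_mentions(text: str) -> List[Tuple[Optional[str], Dict[str, str]]]:
    """Return one (job URL or None, settings) pair per "openqa:" line in `text`."""
    results: List[Tuple[Optional[str], Dict[str, str]]] = []
    for match in MENTION.finditer(text):
        url: Optional[str] = None
        settings: Dict[str, str] = {}
        for token in match.group("rest").split():
            if SETTING.match(token):
                key, _, value = token.partition("=")
                settings[key] = value
            elif url is None and token.startswith(("http://", "https://")):
                url = token
        results.append((url, settings))
    return results
```

<p>Values containing spaces are not handled by this simple whitespace split and would need quoting rules to be defined first.</p>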
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Wait for <a class="issue tracker-4 status-3 priority-5 priority-high3 closed child" title="action: Trigger openQA tests mentioned in github description as part of CI size:M (Resolved)" href="https://progress.opensuse.org/issues/130934">#130934</a> and <a class="issue tracker-4 status-1 priority-4 priority-default child" title="action: Trigger openQA tests mentioned in github comments as part of automatic testing as well (New)" href="https://progress.opensuse.org/issues/130940">#130940</a></li>
<li>See <a href="https://github.com/os-autoinst/scripts/pull/292" class="external">https://github.com/os-autoinst/scripts/pull/292</a> for how it was done for the cloning script based on the github description</li>
<li>Additional test parameters are applied the same way users are used to from openqa-clone-job. We don't expect to call the script itself but rather to use e.g. CloneJob.pm, or to refactor code as needed to support that</li>
<li>Extend documentation to cover that</li>
</ul>
QA - coordination #121720 (Blocked): [saga][epic] Migration to QE setup in PRG2+NUE3 while ensuri... https://progress.opensuse.org/issues/121720 2022-12-08T19:30:27Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>SUSE is deprecating NUE1 (Maxtorhof) and setting up a Prague Co-Location datacenter "Prg CoLo" or "DC7" as the primary location, in particular for serving public services. This includes what we serve so far from VM clusters managed by EngInfra and in particular the openqa.opensuse.org infrastructure, likely also openqa.suse.de. Put differently: everything that is currently served from NUE1-SRV1. We must participate in the planning and setup and accordingly in a migration, until we can provide our services from Prg CoLo and do not rely on NUE1-SRV1 anymore except for the purpose of an optional fail-over datacenter in Nbg.<br>
SUSE is deprecating NUE1 (Maxtorhof) and setting up replacement data centers. Additionally, a new datacenter is planned as a fail-over location.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> SUSE QE Tools services are provided out of Prg CoLo <a class="issue tracker-6 status-15 priority-4 priority-default child parent behind-schedule" title="coordination: [epic] Provide SUSE QE Tools services running in PRG2 aka. Prg CoLo (Blocked)" href="https://progress.opensuse.org/issues/123800">#123800</a></li>
<li><strong>AC2:</strong> NUE1 (Maxtorhof) is not relied upon by SUSE QE Tools anymore and has been evacuated by us <a class="issue tracker-6 status-15 priority-4 priority-default child parent behind-schedule" title="coordination: [epic] Move from SUSE NUE1 (Maxtorhof) to new NBG Datacenters (Blocked)" href="https://progress.opensuse.org/issues/129280">#129280</a></li>
<li><strong>AC3:</strong> Relevant SUSE QE Tools services are provided out of NUE3 <a class="issue tracker-6 status-3 priority-4 priority-default closed child parent" title="coordination: [epic] Migration out of SUSE NUE1 - QE setup in NUE3 (Resolved)" href="https://progress.opensuse.org/issues/130955">#130955</a></li>
</ul>
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<p>Coordination chat room <a href="https://suse.slack.com/archives/C04MDKHQE20" class="external">#dct-migration</a></p>
openQA Project - coordination #58184 (Blocked): [saga][epic][use case] full version control aware... https://progress.opensuse.org/issues/58184 2019-10-15T10:19:57Z okurz okurz@suse.com
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>This is linked to <a href="https://progress.opensuse.org/projects/openqav3/wiki#Use-case-4" class="external">Use case 4</a> and motivated by a discussion by the QA tools team in the weekly meeting 2019-10-15. What we should have are for example user forks and branches, fully versioned test schedules and configuration settings</p>
<a name="User-story"></a>
<h2 >User story<a href="#User-story" class="wiki-anchor">¶</a></h2>
<p>As a test case contributor during test case development I want to run tests on production instances with all necessary changes recorded in version control before merging to master so that my change will have minimal unexpected impact (test regressions) on existing tests</p>
<a name="Further-user-stories-from-httpsconfluencesusecompagesviewpageactionpageId365527173"></a>
<h2 >Further user stories (from <a href="https://confluence.suse.com/pages/viewpage.action?pageId=365527173" class="external">https://confluence.suse.com/pages/viewpage.action?pageId=365527173</a>)<a href="#Further-user-stories-from-httpsconfluencesusecompagesviewpageactionpageId365527173" class="wiki-anchor">¶</a></h2>
<ol>
<li>I want to start a job based on a modified test in production (In production tests can behave differently, for example because of the heavier load) -> see openqa-clone-job + CASEDIR</li>
<li>I want to edit needles and test if they work before proposing changes</li>
<li>I want to compare the results of a certain job group between two of my branches</li>
<li>I want to schedule a test 100 times without it showing up in the group overview -> see <a href="https://progress.opensuse.org/projects/openqatests/wiki#Statistical-investigation" class="external">statistical-investigation</a></li>
<li>I want to trigger multiple cloned jobs for each pull request (sometimes you want to trigger VR for different jobs against the same PR, and it would be nice to do that in one command line)</li>
<li>I want to trigger the relevant tests automatically by creating a PR</li>
</ol>
<a name="Implications-and-suggestions"></a>
<h2 >Implications and suggestions<a href="#Implications-and-suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><p>The usual test contributor workflows should be supported and made easier by making openQA fully aware of tests triggered for development purposes without negatively impacting existing validation tests</p>
<ul>
<li>Potential impact on asset management</li>
<li>No pollution of validation test reports by development tests</li>
</ul></li>
<li><p>If there are new/modified needles involved, the existing workflow cannot handle that. The current practice is:</p>
<ul>
<li>Test your changes (and possibly needle changes) locally and create PR(s)</li>
<li>Edit needles online and save them (then they will be committed to master). Requires admin rights</li>
</ul></li>
<li><p>DONE: Cloning cancelled or incomplete jobs currently does not work as openqa-clone-custom-git-refspec requires the vars.json file from a completed job with this file uploaded -> <a href="https://github.com/os-autoinst/openQA/pull/3170" class="external">https://github.com/os-autoinst/openQA/pull/3170</a></p></li>
<li><p>Replace "fetchneedles" by inherent git support</p></li>
<li><p>Provide support for github pull request validation</p></li>
<li><p>DONE: Extend openqa-clone-custom-git-refspec to accept list of source tests to clone -> <a href="https://github.com/os-autoinst/openQA/pull/2577" class="external">https://github.com/os-autoinst/openQA/pull/2577</a></p></li>
<li><p>DONE: openqa-clone-custom-git-refspec: Output in markdown format for easy copy/pasting into git commit messages and github PR comments -> <a href="https://github.com/os-autoinst/openQA/pull/2577" class="external">https://github.com/os-autoinst/openQA/pull/2577</a></p></li>
<li><p>openqa-clone-custom-git-refspec: Provide link to /tests/overview page for the custom build when multiple tests have been cloned</p></li>
<li><p>Make the trigger source of test jobs apparent, e.g. the source git repositories</p></li>
<li><p><a class="issue tracker-6 status-3 priority-4 priority-default closed parent" title="coordination: [EPIC] Interactive mode is an usability disaster (Resolved)" href="https://progress.opensuse.org/issues/14818#note-18">#14818#note-18</a> : "Tim got a ticket from Ray that the docker test failed and wants openQA to reproduce the issue and pause at the beginning of the docker test. Afterwards he wants openQA to make a disk snapshot and step through the test execution to find out where the problem is. After he found out, he reloads the snapshot to tweak the execution. During this process, openQA records his steps and allows to add needles."</p></li>
</ul>