openSUSE Project Management Tool: Issues | https://progress.opensuse.org/ | updated 2024-02-21T11:58:03Z
openQA Infrastructure - action #155740 (Resolved): Scripts CI pipelines fail due to timeout after... | https://progress.opensuse.org/issues/155740 | 2024-02-21T11:58:03Z | livdywan (liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958" class="external">https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958</a></p>
<pre><code>{"count":2,"failed":[],"ids":[13560656,13560657],"scheduled_product_id":2058111}
2 jobs have been created:
- http://openqa.suse.de/tests/13560656
- http://openqa.suse.de/tests/13560657
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
[...]
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Jo
ERROR: Job failed: execution took longer than 1h0m0s seconds
</code></pre>
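<p>A minimal sketch of how the monitoring loop could enforce its own deadline and fail with a clear message instead of running into the GitLab-level 1h timeout; the host, job id and the 50-minute limit are illustrative assumptions, not taken from the actual scripts-ci code:</p>
<pre><code># sketch: poll the job state with an explicit deadline (values are assumptions)
host=https://openqa.suse.de
job_id=13560656
deadline=$((SECONDS + 50 * 60))   # give up well before the 1h GitLab job timeout
while [ "$SECONDS" -lt "$deadline" ]; do
    state=$(openqa-cli api --host "$host" jobs/"$job_id" | jq -r .job.state)
    echo "Job state of job ID $job_id: $state"
    [ "$state" = "done" ] && exit 0
    sleep 10
done
echo "ERROR: job $job_id is still '$state' after 50 minutes, giving up" >&2
exit 1
</code></pre>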
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Project - action #153769 (Resolved): Better handle changes in GRE tunnel configuration size:M | https://progress.opensuse.org/issues/153769 | 2024-01-17T11:44:11Z | mkittler (marius.kittler@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>When changing the GRE tunnel configuration (<code>/etc/wicked/scripts/gre_tunnel_preup.sh</code>) by editing the related <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/openvswitch.sls" class="external">salt states</a> or <code>workerconf.sls</code> in pillars, these changes are not applied automatically, unlike worker settings. This can lead to openQA test failures due to inconsistencies as well as potentially incomplete routing due to STP selections.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> We are able to change the GRE tunnel configuration on any salt-controlled openQA worker without causing openQA test failures</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Run <code>ovs-appctl stp/show</code> on all workers to see how packets are currently routed</li>
<li>In the best case our salt states handle this automatically. It would be possible to simply re-run <code>/etc/wicked/scripts/gre_tunnel_preup.sh</code> after it has changed.
<ul>
<li>Adding/removing ports will cause a temporary unavailability of the network and thus disrupt tests.</li>
<li>Stop the services, re-run the script and finally start the services again?</li>
<li>If necessary reboot the host (not sure how easy this is to trigger from salt states).</li>
</ul></li>
<li>In the worst case we make sure the limitation is properly documented with instructions to follow (e.g. command to reboot all workers).</li>
<li>So simply try re-running <code>/etc/wicked/scripts/gre_tunnel_preup.sh</code> from salt after it has changed and monitor for bad consequences (see the sketch after this list)</li>
<li>Monitor <a href="https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=24" class="external">https://monitor.qa.suse.de/d/nRDab3Jiz/openqa-jobs-test?orgId=1&viewPanel=24</a></li>
<li>If nothing bad happened then assume we are done, else try to trigger reboots</li>
</ul>
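<p>A rough sketch of the stop/re-run/start idea above, driven manually from the salt master; the <code>roles:worker</code> targeting, the <code>openqa-worker.target</code> unit name and re-running the preup script without wicked's usual arguments are assumptions to verify, and the network will still be briefly interrupted:</p>
<pre><code># sketch: stop workers, re-apply the GRE configuration, verify STP, start workers again
salt -C 'G@roles:worker' cmd.run 'systemctl stop openqa-worker.target'
salt -C 'G@roles:worker' cmd.run '/etc/wicked/scripts/gre_tunnel_preup.sh'
salt -C 'G@roles:worker' cmd.run 'ovs-appctl stp/show'
salt -C 'G@roles:worker' cmd.run 'systemctl start openqa-worker.target'
</code></pre>
<p>In the salted variant the script re-run could be expressed as a <code>cmd.run</code> state with <code>onchanges</code> on the managed script file, so it only fires when the file content actually changed.</p>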
<a name="Further-details"></a>
<h2 >Further details<a href="#Further-details" class="wiki-anchor">¶</a></h2>
<ul>
<li>Checkout <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: significant increase in MM-test failure ratio 2023-12-11: test fails in multipath_iscsi and other... (Resolved)" href="https://progress.opensuse.org/issues/152389#note-63">#152389#note-63</a> and subsequent comments for further context.</li>
</ul>
openQA Infrastructure - action #137984 (Resolved): salt "refresh" job full of errors but CI job p... | https://progress.opensuse.org/issues/137984 | 2023-10-13T18:04:54Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948</a> shows a lot of errors, e.g.</p>
<pre><code>s390zl13.oqa.prg2.suse.org:
----------
mine.update:
True
saltutil.refresh_grains:
True
saltutil.refresh_pillar:
True
saltutil.sync_grains:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/modules/saltutil.py", line 79, in _get_top_file_envs
return __context__["saltutil._top_file_envs"]
File "/usr/lib/python3.6/site-packages/salt/loader/context.py", line 78, in __getitem__
return self.value()[item]
KeyError: 'saltutil._top_file_envs'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2110, in _thread_multi_return
...
</code></pre>
<p>but in the end the CI job passes instead of failing.</p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>I assume that, as long as <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019</a> is not fully effective yet, the problem can be reproduced by rerunning the CI job. The error message itself can be reproduced on osd with</p>
<pre><code>salt --no-color 's390zl12*' saltutil.sync_grains
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Obvious errors visible in the log of the "refresh" CI job should fail the CI job</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Crosscheck if the salt command itself provides a non-zero exit code when the problem reproduces -> the command on osd <code>salt --no-color 's390zl12*' saltutil.sync_grains; echo $?</code> yields "1" as the exit code. So the problem is likely that the command executed over ssh in the CI instructions does not properly propagate the exit code of the inner command</li>
<li>Ensure that the CI job honours the exit code or error condition accordingly</li>
<li>Make sure the exit code is still evaluated regardless of shown error messages</li>
</ul>
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<ul>
<li>The problem seems to be related to the compound statement: <code>salt \* saltutil.sync_grains,saltutil.refresh_grains ,</code> yields a 0 exit code, whereas <code>salt \* saltutil.sync_grains</code> yields 1 (see the sketch after this list)</li>
</ul>
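<p>A minimal sketch of how the CI instructions could avoid the compound-statement pitfall and still fail on errors; the ssh target is a placeholder and everything besides the salt function names shown above is illustrative:</p>
<pre><code># sketch: run each saltutil function as a separate salt call so every exit code counts
set -euo pipefail
for fun in mine.update saltutil.refresh_grains saltutil.refresh_pillar saltutil.sync_grains; do
    ssh osd-host.example.suse.de "sudo salt --no-color '*' $fun"   # osd-host.example.suse.de is a placeholder
done
</code></pre>
<p>With <code>set -e</code> the script exits non-zero as soon as one of the ssh commands (and thus the remote salt call) fails, which would make the "refresh" CI job fail as requested in AC1.</p>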
openQA Infrastructure - action #136325 (Resolved): salt deploy fails due to multiple offline work... | https://progress.opensuse.org/issues/136325 | 2023-09-22T12:14:41Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1848768#L9651" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1848768#L9651</a></p>
<pre><code>ERROR: Minions returned with non-zero exit code
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
worker-arm2.oqa.prg2.suse.org:
Minion did not return. [Not connected]
worker-arm1.oqa.prg2.suse.org:
Minion did not return. [Not connected]
</code></pre>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add back to salt: sapworker2.qe.nue2.suse.org, sapworker3.qe.nue2.suse.org, worker-arm1.oqa.prg2.suse.org, worker-arm2.oqa.prg2.suse.org</li>
</ul>
<pre><code>for i in sapworker2.qe.nue2.suse.org sapworker3.qe.nue2.suse.org worker-arm1.oqa.prg2.suse.org worker-arm2.oqa.prg2.suse.org ; do sudo salt-key -y -a $i; done && sudo salt \* state.apply
</code></pre>
openQA Infrastructure - action #134906 (Resolved): osd-deployment failed due to openqaworker1 sho... | https://progress.opensuse.org/issues/134906 | 2023-08-31T08:58:53Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1794346#L9197" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1794346#L9197</a> shows</p>
<pre><code>Minions returned with non-zero exit code
openqaworker1.qe.nue2.suse.org:
Minion did not return. [No response]
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> All OSD salt controlled machines are ensured to not be affected by unresponsive salt-minion <a href="https://bugzilla.opensuse.org/show_bug.cgi?id=1212816" class="external">https://bugzilla.opensuse.org/show_bug.cgi?id=1212816</a>, i.e. the salt-minion backport+package lock is applied to all salt controlled machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Research how to backport + package lock in salt recipes, e.g. start with <a href="https://docs.saltproject.io/en/latest/ref/modules/all/salt.modules.zypperpkg.html" class="external">https://docs.saltproject.io/en/latest/ref/modules/all/salt.modules.zypperpkg.html</a> or ask experts in chat (but be careful not to be drawn into a "just install SUSE Manager" discussion)</li>
<li>Add instructions to salt to ensure the salt-minion package is backported and package locked (see the sketch after this list)</li>
<li>As an alternative, consider a separate repo that has the backported/fixed version and is applied to all salt controlled machines (<em>not</em> devel:openQA, as this is a salt problem, not openQA machine specific)</li>
</ul>
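<p>A rough sketch of what the backport + package lock could look like as a one-off from the salt master; the repository alias and the use of plain <code>zypper addlock</code> are illustrative assumptions:</p>
<pre><code># sketch: install the fixed salt-minion build and lock the package (repo alias is a placeholder)
salt \* cmd.run 'zypper -n in --from salt-minion-backport salt-minion'
salt \* cmd.run 'zypper addlock salt-minion'
salt \* cmd.run 'zypper locks'   # verify the lock is in place
</code></pre>
<p>In a salt state this would roughly map to <code>pkg.installed</code> with a pinned version plus a package lock, assuming the deployed salt version supports holds/locks on zypper-based systems.</p>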
openQA Infrastructure - action #134900 (Resolved): salt states fail to apply due to "Pillar openq... | https://progress.opensuse.org/issues/134900 | 2023-08-31T08:29:28Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1794135#L1178" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1794135#L1178</a></p>
<pre><code>----------
ID: apache2
Function: service.running
Result: False
Comment: One or more requisite failed: openqa.server./etc/apache2/ssl.crt/openqa.oqa.prg2.suse.org.crt, openqa.server./etc/apache2/ssl.key/openqa.oqa.prg2.suse.org.key
Started: 08:56:57.086043
Duration: 0.004 ms
Changes:
----------
</code></pre>
openQA Infrastructure - action #134135 (New): openqa-monitor.qa.suse.de salt CI deploy telegraf c... | https://progress.opensuse.org/issues/134135 | 2023-08-11T13:08:47Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1751110#L5155" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1751110#L5155</a></p>
<pre><code>openqa-monitor.qa.suse.de:
2023-08-11T13:00:13Z E! [inputs.x509_cert] could not find file: [/etc/dehydrated/certs/monitor.qe.nue2.suse.org/fullchain.pem]
2023-08-11T13:00:18Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<p>This is likely related to the move of the VM, along with its hypervisor, to the FC Basement under the domain .qe.nue2.suse.org.</p>
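<p>A quick check on the monitoring host to see which certificate directory dehydrated actually provides before adjusting the telegraf configuration to the new domain:</p>
<pre><code># sketch: list the dehydrated certificate directories and their fullchain.pem files
ls -l /etc/dehydrated/certs/
ls -l /etc/dehydrated/certs/*/fullchain.pem
</code></pre>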
openQA Infrastructure - action #134048 (New): openqa-piworker does not restart openQA worker proc... | https://progress.opensuse.org/issues/134048 | 2023-08-09T15:26:24Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>I observed on <a href="https://openqa.suse.de/admin/workers" class="external">https://openqa.suse.de/admin/workers</a> that openqa-piworker had an old os-autoinst version. On the host we confirmed that the package is up-to-date but the openQA workers were not restarted yet. For historical reasons the node was brought into salt as a "generic" machine, not a "worker", causing this and multiple other inconsistencies. We should make sure the machine is treated as a proper worker.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openqa-piworker has salt role "worker" and salt high state cleanly applied</li>
<li><strong>AC2:</strong> RPi related openQA jobs on related nodes still work as expected</li>
<li><strong>AC3:</strong> No related alerts for old "generic" openqa-piworker or "worker" openqa-piworker in <a href="https://monitor.qa.suse.de" class="external">https://monitor.qa.suse.de</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add the role as documented in <a href="https://gitlab.suse.de/openqa/salt-states-openqa#openqa-salt-states" class="external">https://gitlab.suse.de/openqa/salt-states-openqa#openqa-salt-states</a> and apply a high state as test with e.g. <code>salt-call --local state.test</code> or from OSD <code>salt --state-output=changes 'openqa-piworker*' state.test</code>, and then apply without test if ok (see the sketch after this list)</li>
<li>Ensure RPi related openQA jobs on related nodes still work as expected</li>
<li>Ensure no related alerts for old "generic" openqa-piworker or "worker" openqa-piworker in <a href="https://monitor.qa.suse.de" class="external">https://monitor.qa.suse.de</a></li>
</ul>
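<p>The test-then-apply sequence from the first suggestion, run from OSD once the role grain has been added; this is just the <code>test=True</code> form of the commands quoted above followed by the real run:</p>
<pre><code># sketch: preview the high state on openqa-piworker only, then apply it for real
salt --state-output=changes 'openqa-piworker*' state.apply test=True
salt --state-output=changes 'openqa-piworker*' state.apply
</code></pre>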
openQA Infrastructure - action #134042 (Resolved): auto-update on OSD does not install updates du... | https://progress.opensuse.org/issues/134042 | 2023-08-09T14:46:55Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From OSD:</p>
<pre><code>$ sudo systemctl status auto-update
○ auto-update.service - Automatically patch system packages.
Loaded: loaded (/etc/systemd/system/auto-update.service; static)
Active: inactive (dead) since Wed 2023-08-09 02:38:25 CEST; 13h ago
TriggeredBy: ● auto-update.timer
Main PID: 19349 (code=exited, status=0/SUCCESS)
Aug 09 02:37:37 openqa sh[19351]: Building repository 'Update repository with updates from SUSE Linux Enterprise 15' cache [....done]
Aug 09 02:37:37 openqa sh[19351]: Loading repository data...
Aug 09 02:37:46 openqa sh[19351]: Reading installed packages...
Aug 09 02:38:23 openqa sh[19351]: Resolving package dependencies...
Aug 09 02:38:24 openqa sh[19351]: Problem: nothing provides 'libwebkit2gtk3 = 2.40.5' needed by the to be installed libwebkit2gtk3-lang-2.4>
Aug 09 02:38:24 openqa sh[19351]: Solution 1: deinstallation of libwebkit2gtk3-lang-2.38.6-150200.75.2.noarch
Aug 09 02:38:24 openqa sh[19351]: Solution 2: do not install patch:openSUSE-SLE-15.4-2023-3233-1.noarch
Aug 09 02:38:24 openqa sh[19351]: Solution 3: break libwebkit2gtk3-lang-2.40.5-150200.78.1.noarch by ignoring some of its dependencies
Aug 09 02:38:24 openqa sh[19351]: Choose from above solutions by number or cancel [1/2/3/c/d/?] (c): c
Aug 09 02:38:25 openqa systemd[1]: auto-update.service: Deactivated successfully.
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> auto-update.service fails when not all updates can be applied (so that our monitoring will alert on it)</li>
<li><strong>AC2:</strong> All current updates are applied cleanly on OSD</li>
<li><strong>AC3:</strong> All other salt controlled hosts have current updates applied</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Ensure that the command we use in auto-update.service fails the service when not all updates can be applied (see the sketch after this list)</li>
<li>Make sure that patches+updates are applied for all salt controlled machines</li>
</ul>
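<p>A minimal sketch of an update command that fails the service when zypper cannot apply all patches non-interactively; the treated-as-success exit codes follow zypper's documented informational codes, while how auto-update.service actually invokes zypper is an assumption:</p>
<pre><code># sketch: fail on anything except "ok", "reboot needed" or "restart of zypper needed"
zypper --non-interactive patch
ret=$?
case "$ret" in
    0|102|103) exit 0 ;;
    *) echo "zypper patch failed with exit code $ret" >&2; exit "$ret" ;;
esac
</code></pre>
<p>With <code>--non-interactive</code> the dependency problem from the observation above is answered with the default "c" (cancel), zypper aborts with a non-zero code and the service fails, so the existing monitoring can alert on it (AC1).</p>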
openQA Infrastructure - coordination #132467 (New): [epic] Prevent redundant salt state.apply act... | https://progress.opensuse.org/issues/132467 | 2023-07-09T10:04:56Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We manage more and more machines within our salt infrastructure <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a>, so it becomes more important to make sure that the high state is applied efficiently. Normally a recurring call of <code>salt \* state.apply</code> should not take long and should not make any changes on the system, assuming that any previous call has already applied all pending changes. So we should review the actions happening in recurring calls to <code>state.apply</code> and change all state rules accordingly so that they really only act when necessary.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No redundant repeated actions visible when calling <code>salt \* state.apply</code></li>
<li><strong>AC2:</strong> All necessary actions are still applied on the systems including scripts in /opt/openqa-trigger-from-ibs-plugin</li>
</ul>
<a name="Acceptance-tests"></a>
<h2 >Acceptance tests<a href="#Acceptance-tests" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AT1-1:</strong> <em>Given</em> being logged in to OSD <em>When</em> calling <code>for i in {1..2}; do salt \* state.apply; done</code> <em>Then</em> the second call applies no changes to any systems</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Review all actions visible in the attached logfile redundant_salt_state_apply_calls_stdout_stripped.log and try to find better conditions, state combinations, fixes, etc. so that changes are not applied to systems on every run (see the sketch after this list)</li>
</ul>
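<p>AT1-1 can be checked mechanically; a sketch that applies the high state twice and fails if the second run still reports changes, where grepping the "changed=" counters of the state summary is an assumption about the output format:</p>
<pre><code># sketch: the second state.apply run must not report any changes
salt --state-output=changes \* state.apply > /dev/null
salt --state-output=changes \* state.apply | tee second_run.log
if grep -qE 'changed=[1-9]' second_run.log; then
    echo "redundant changes were applied on the second run" >&2
    exit 1
fi
</code></pre>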
openQA Infrastructure - action #130378 (Workable): Integration of extra salt repositories as salt... | https://progress.opensuse.org/issues/130378 | 2023-06-05T11:18:24Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>In #118636 we decided that we will just use <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a> to control all machines that are not yet controlled by a remote management framework. For some machines we already have specific salt repositories, e.g. <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a>. We should try to use <a href="https://docs.saltproject.io/en/latest/topics/development/conventions/formulas.html" class="external">salt formulas</a> to integrate those repositories into our central infrastructure.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Changes to <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a> as well as <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a> are applied automatically to backup.qa.suse.de</li>
<li><strong>AC2:</strong> The solution to integrate <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a> is salted itself</li>
<li><strong>AC3:</strong> We know how to integrate other separate salt repositories</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Read <a href="https://docs.saltproject.io/en/latest/topics/development/conventions/formulas.html" class="external">https://docs.saltproject.io/en/latest/topics/development/conventions/formulas.html</a></li>
<li>Try to reference <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a> as a salt formula within <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a>, from within salt itself, and at best configured from <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/</a> so that SUSE internal repos are only referenced internally (see the sketch after this list)</li>
<li>If the solution is not obvious then document how it's done, e.g. in gitlab.suse.de/openqa/salt-states-openqa/</li>
<li>Ensure <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a> explains that the repo is used from within gitlab.suse.de/openqa/salt-states-openqa/</li>
<li>Adapt or remove parts of the repo, e.g. adapt the gitlab CI integration to use the formula relying on the other repo (or just remove?)</li>
<li>If all else fails then just put everything from <a href="https://gitlab.suse.de/qa-sle/backup-server-salt" class="external">https://gitlab.suse.de/qa-sle/backup-server-salt</a> to <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a></li>
</ul>
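<p>One low-tech way to try the formula approach before wiring it into the repositories, following the conventional layout from the salt formula documentation; the clone path and the implied master config change are assumptions:</p>
<pre><code># sketch: make the backup-server-salt states available as a formula on the salt master
git clone https://gitlab.suse.de/qa-sle/backup-server-salt /srv/formulas/backup-server-salt
# the formula directory then needs to be added to file_roots in the master config
# and referenced from top.sls for backup.qa.suse.de
systemctl restart salt-master
salt 'backup*' state.apply test=True
</code></pre>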
openQA Infrastructure - action #128222 (New): [virtualization] The Xen specific host configuratio... | https://progress.opensuse.org/issues/128222 | 2023-04-24T12:51:53Z | okurz (okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>With <a class="issue tracker-4 status-3 priority-4 priority-default closed child" title="action: Consolidate the installation of openqaw5-xen with SUSE QE Tools maintained machines size:M (Resolved)" href="https://progress.opensuse.org/issues/125534">#125534</a> resolved the host openqaw5-xen.qa.suse.de is covered in salt using <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a> as a generic host. To ensure that the Xen specific host configuration can be preserved we should cover Xen specific rules in salt as well.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> A Xen host able to execute openQA Xen based jobs can be configured from <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a></li>
<li><strong>AC2:</strong> The configuration is cleanly applied on openqaw5-xen.qa.suse.de</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Collect requirements regarding Xen configuration</li>
<li>Add the necessary configuration statements to <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a> or <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa</a> correspondingly</li>
<li>Ensure the config is covered in salt CI tests, e.g. with additional explicit role</li>
<li>Ensure that the config cleanly applies on openqaw5-xen.qa.suse.de</li>
</ul>
openQA Infrastructure - action #125141 (Workable): Salt pillars deployment pipeline failed on "tu... | https://progress.opensuse.org/issues/125141 | 2023-02-28T11:17:44Z | mkittler (marius.kittler@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<pre><code> ID: security-sensor.repo
Function: pkgrepo.managed
Result: False
Comment: Failed to configure repo 'security-sensor.repo': Zypper command failure: Repository 'security-sensor.repo' is invalid.
[security-sensor.repo|https://download.opensuse.org/repositories/security:/sensor/15.4] Valid metadata not found at specified URL
History:
- Signature verification failed for repomd.xml
- Can't provide /repodata/repomd.xml
Please check if the URIs defined for this repository are pointing to a valid repository.
Skipping repository 'security-sensor.repo' because of the above error.
Could not refresh the repositories because of errors.Forcing raw metadata refresh
Retrieving repository 'security-sensor.repo' metadata [..........
Warning: File 'repomd.xml' from repository 'security-sensor.repo' is unsigned.
Note: Signing data enables the recipient to verify that no modifications occurred after the data
were signed. Accepting data with no, wrong or unknown signature can lead to a corrupted system
and in extreme cases even to a system compromise.
Note: File 'repomd.xml' is the repositories master index file. It ensures the integrity of the
whole repo.
Warning: We can't verify that no one meddled with this file, so it might not be trustworthy
anymore! You should not continue unless you know it's safe.
File 'repomd.xml' from repository 'security-sensor.repo' is unsigned, continue? [yes/no] (no): no
error]
Started: 09:39:50.917365
Duration: 9775.41 ms
Changes:
----------
ID: security-sensor.repo
Function: pkg.latest
Name: velociraptor-client
Result: False
Comment: One or more requisite failed: security_sensor.security-sensor.repo
Started: 09:40:00.699471
Duration: 0.011 ms
Changes:
…
Summary for tumblesle
--------------
Succeeded: 231 (changed=1)
Failed: 2
--------------
Total states run: 233
</code></pre>
<p>(<a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1427053/raw">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1427053/raw</a>)</p>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Find out what the host "tumblesle" is -> a VM on qamaster.qa.suse.de (according to <a href="https://racktables.suse.de/index.php?page=object&tab=default&object_id=1300">https://racktables.suse.de/index.php?page=object&tab=default&object_id=1300</a>), the full domain is tumblesle.qa.suse.de</li>
<li>Check whether the problem persists -> no, the repo can be refreshed (on tumblesle)</li>
<li>Check whether the error handling (retries) is in accordance with how other repos are configured -> we use <code>pkgrepo.managed: - retry: attempts: 5</code> for our own devel repos, maybe the same would make sense for <code>security:sensor</code> as well; we don't have a retry for all repos configured via <code>pkgrepo.managed</code> so far, though</li>
</ul>
<a name="Remarks"></a>
<h2 >Remarks<a href="#Remarks" class="wiki-anchor">¶</a></h2>
<ul>
<li>Likely not specific to "tumblesle".</li>
<li>Looks like a temporary signing problem of security-sensor.repo (and not like a network issue). <em>DONE</em> So maybe a one-time issue and we don't need to introduce a retry. -> It is reproducible on tumblesle.qa.suse.de with</li>
</ul>
<pre><code>for i in {001..100}; do echo "## $i" && zypper ref --force -r security-sensor.repo; done
</code></pre>
<p>after 23 runs. Directly afterwards, retrieving the file worked again.</p>
<ul>
<li><em>Optional</em> Try to reproduce the above problem in a clean container environment, at best for crosschecking both Leap and Tumbleweed</li>
<li>Based on the above, report an issue against zypper on <a href="https://github.com/openSUSE/zypper/">https://github.com/openSUSE/zypper/</a> since zypper claims "File is unsigned", which is apparently not true; it is more likely a temporary connection issue that should better be retried</li>
<li><em>Optional:</em> Additionally report an issue with the openSUSE infrastructure with a cross-reference</li>
</ul>
openQA Project - action #90167 (New): Setup initial salt infrastructure for remote management wit... | https://progress.opensuse.org/issues/90167 | 2021-03-16T12:23:55Z | okurz (okurz@suse.com)
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> On o3 <code>salt \* test.ping</code> returns all common worker hosts as well as o3 itself</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Ensure salt-minion on all o3 workers</li>
<li>Ensure salt-master on o3</li>
<li>Ensure workers are connected to o3 and their salt keys are accepted (see the sketch after this list)</li>
</ul>
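<p>The three suggestions above roughly translate to the following commands; the master hostname written into the minion config is a placeholder and key acceptance happens once per worker:</p>
<pre><code># sketch: on every o3 worker (o3.example.org is a placeholder for the actual o3 hostname)
zypper -n in salt-minion
echo 'master: o3.example.org' > /etc/salt/minion.d/master.conf
systemctl enable --now salt-minion

# sketch: on o3 itself
zypper -n in salt-master
systemctl enable --now salt-master
salt-key -L        # list pending minion keys
salt-key -A -y     # accept them
salt \* test.ping
</code></pre>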
openQA Project - coordination #43934 (Blocked): [epic] Manage o3 infrastructure with salt again | https://progress.opensuse.org/issues/43934 | 2018-11-17T14:37:30Z | okurz (okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>See <a class="issue tracker-4 status-3 priority-7 priority-highest closed" title="action: o3 workers immediately incompleting all jobs, caching service can not be reached (Resolved)" href="https://progress.opensuse.org/issues/43823#note-1">#43823#note-1</a> . Previously we had a salt-minion on each worker even though no salt recipes were used, at least we used salt for structured remote execution ;)</p>
<a name="Expected-result"></a>
<h2 >Expected result<a href="#Expected-result" class="wiki-anchor">¶</a></h2>
<p>As salt was already there, is the preferred system management solution, and should be extended to full recipes, we should have a salt-minion available on all the workers as well.</p>
<a name="To-be-covered-for-o3-in-system-management-eg-salt-states"></a>
<h2 >To be covered for o3 in system management, e.g. salt states<a href="#To-be-covered-for-o3-in-system-management-eg-salt-states" class="wiki-anchor">¶</a></h2>
<ul>
<li>aarch64 irqbalance workaround <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: Failed service "irqbalance" on aarch64.o.o (Resolved)" href="https://progress.opensuse.org/issues/53573">#53573</a></li>
<li>hugepages workaround <a class="issue tracker-4 status-3 priority-3 priority-lowest closed" title="action: all jobs on aarch64.o.o incompleted with "Permission denied" on /dev/hugepages, "others" had no r/w (Resolved)" href="https://progress.opensuse.org/issues/53234">#53234</a></li>
<li>ppc kvm permissions <a class="issue tracker-10 status-3 priority-3 priority-lowest closed" title="tickets: openQA ppc64le workers bad kvm setup (Resolved)" href="https://progress.opensuse.org/issues/25170">#25170</a></li>
</ul>