openSUSE Project Management Tool: Issues (https://progress.opensuse.org/, 2024-02-21T11:58:03Z)
openQA Infrastructure - action #155740 (Resolved): Scripts CI pipelines fail due to timeout after... (https://progress.opensuse.org/issues/155740, 2024-02-21T11:58:03Z, livdywan, liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958" class="external">https://gitlab.suse.de/openqa/scripts-ci/-/jobs/2298958</a></p>
<pre><code>{"count":2,"failed":[],"ids":[13560656,13560657],"scheduled_product_id":2058111}
2 jobs have been created:
- http://openqa.suse.de/tests/13560656
- http://openqa.suse.de/tests/13560657
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
[...]
Job state of job ID 13560656: scheduled, waiting …
{"blocked_by_id":null,"id":13560656,"result":"none","state":"scheduled"}
Jo
ERROR: Job failed: execution took longer than 1h0m0s seconds
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
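<p>One candidate suggestion: bound the job-state polling itself with an explicit deadline below GitLab's 1h limit, so the script fails with a clear message instead of being killed mid-output. A minimal sketch under the assumption that the state lookup can be injected as a command; the helper name <code>wait_for_state</code> is hypothetical, not from the actual scripts repo:</p>

```shell
#!/bin/sh
# wait_for_state CMD DEADLINE: poll CMD (which prints the current job state)
# until it reports a final state or DEADLINE (epoch seconds) passes.
# Returning 124 well before GitLab's 1h job timeout yields a clear failure
# instead of "execution took longer than 1h0m0s seconds".
wait_for_state() {
    cmd=$1 deadline=$2
    while :; do
        state=$($cmd) || return 1
        case "$state" in
            scheduled|running) ;;           # still pending, keep waiting
            *) echo "$state"; return 0 ;;   # done, cancelled, ...
        esac
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out waiting, last state: $state" >&2
            return 124
        fi
        sleep 1   # the real script would sleep longer between API calls
    done
}
```

<p>In the real pipeline <code>CMD</code> would be the openQA API query for the job; here it is injected so the loop can be exercised in isolation.</p>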
openQA Infrastructure - action #155737 (Rejected): Salt pillars pipelines fail due to refused con... (https://progress.opensuse.org/issues/155737, 2024-02-21T11:55:22Z, livdywan, liv.dywan@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2300476" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2300476</a></p>
<pre><code>++ grep ' E! ' salt_post_deploy_checks.log
2024-02-21T09:21:58Z E! [inputs.http] Error in plugin: [url=http://localhost:9530/influxdb/minion]: Get "http://localhost:9530/influxdb/minion": dial tcp [::1]:9530: connect: connection refused
2024-02-21T09:21:59Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2300697" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/2300697</a></p>
<pre><code>monitor.qe.nue2.suse.org:
2024-02-21T10:55:17Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/maintenance_queue_monitor.py':
2024-02-21T10:55:17Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/submission_queue_monitor.py':
2024-02-21T10:55:21Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
++ grep ' E! ' salt_post_deploy_checks.log
2024-02-21T10:54:59Z E! [inputs.http] Error in plugin: [url=http://localhost:9530/influxdb/minion]: Get "http://localhost:9530/influxdb/minion": dial tcp [::1]:9530: connect: connection refused
2024-02-21T10:54:59Z E! [telegraf] Error running agent: input plugins recorded 1 errors
2024-02-21T10:55:17Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/maintenance_queue_monitor.py':
2024-02-21T10:55:17Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/submission_queue_monitor.py':
2024-02-21T10:55:21Z E! [telegraf] Error running agent: input plugins recorded 2 errors
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li></li>
</ul>
openQA Infrastructure - action #137984 (Resolved): salt "refresh" job full of errors but CI job p... (https://progress.opensuse.org/issues/137984, 2023-10-13T18:04:54Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1900948</a> shows a lot of errors, e.g.</p>
<pre><code>s390zl13.oqa.prg2.suse.org:
----------
mine.update:
True
saltutil.refresh_grains:
True
saltutil.refresh_pillar:
True
saltutil.sync_grains:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/modules/saltutil.py", line 79, in _get_top_file_envs
return __context__["saltutil._top_file_envs"]
File "/usr/lib/python3.6/site-packages/salt/loader/context.py", line 78, in __getitem__
return self.value()[item]
KeyError: 'saltutil._top_file_envs'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/salt/minion.py", line 2110, in _thread_multi_return
...
</code></pre>
<p>But in the end the CI job passes instead of failing.</p>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<p>I assume that, as long as <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/1019</a> is not fully effective, the problem can be reproduced by rerunning the CI job. The error message itself can be reproduced on osd with</p>
<pre><code>salt --no-color 's390zl12*' saltutil.sync_grains
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> Obvious errors visible in the log of the "refresh" CI job should fail the CI job</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em> Crosscheck whether the salt command itself returns a non-zero exit code when the problem reproduces -> on osd the command <code>salt --no-color 's390zl12*' saltutil.sync_grains; echo $?</code> yields exit code "1". So the problem is likely that the command the CI instructions execute over ssh does not properly propagate the exit code of the inner command execution</li>
<li>Ensure that the CI job honors the exit code or error condition accordingly</li>
<li>Make sure the exit code is still evaluated regardless of shown error messages</li>
</ul>
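<p>On the propagation point: a frequent culprit is a pipeline after the ssh/salt call, since a trailing <code>tee</code> or <code>grep</code> replaces the interesting exit code with its own. A sketch of the failure mode and the fix; the <code>sh -c 'exit 1' | tee</code> pipeline is a stand-in for whatever post-processing the CI script actually does:</p>

```shell
#!/bin/bash
# Without pipefail a pipeline's status is the LAST command's, so a failing
# salt call (stand-in: sh -c 'exit 1') piped through tee still looks fine:
sh -c 'exit 1' | tee /dev/null
echo "without pipefail: $?"    # prints "without pipefail: 0"

# With pipefail the pipeline fails if any stage fails, so the interesting
# exit code survives the post-processing:
set -o pipefail
sh -c 'exit 1' | tee /dev/null
echo "with pipefail: $?"       # prints "with pipefail: 1"
```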
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<ul>
<li>The problem seems to be related to the compound statement: <code>salt \* saltutil.sync_grains,saltutil.refresh_grains ,</code> yields exit code 0, while <code>salt \* saltutil.sync_grains</code> yields 1</li>
</ul>
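<p>If the compound form cannot easily be made to propagate failures, a workaround sketch is to drop the compound syntax and call the functions one at a time, failing fast. <code>run_all</code> is a hypothetical helper, written so the loop can be exercised with stand-in commands:</p>

```shell
#!/bin/sh
# run_all CMD...: run each command in turn; stop at the first failure and
# propagate its exit code, which the compound `salt ... f1,f2 ,` call hides.
run_all() {
    for c in "$@"; do
        $c || return $?
    done
}

# In the CI script this could replace the compound call, e.g. (not run here):
#   run_all "salt \* saltutil.sync_grains" "salt \* saltutil.refresh_grains"
```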
openQA Infrastructure - action #136325 (Resolved): salt deploy fails due to multiple offline work... (https://progress.opensuse.org/issues/136325, 2023-09-22T12:14:41Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1848768#L9651" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1848768#L9651</a></p>
<pre><code>ERROR: Minions returned with non-zero exit code
sapworker2.qe.nue2.suse.org:
Minion did not return. [Not connected]
sapworker3.qe.nue2.suse.org:
Minion did not return. [Not connected]
worker-arm2.oqa.prg2.suse.org:
Minion did not return. [Not connected]
worker-arm1.oqa.prg2.suse.org:
Minion did not return. [Not connected]
</code></pre>
<a name="Rollback-actions"></a>
<h2 >Rollback actions<a href="#Rollback-actions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add back to salt: sapworker2.qe.nue2.suse.org, sapworker3.qe.nue2.suse.org, worker-arm1.oqa.prg2.suse.org, worker-arm2.oqa.prg2.suse.org</li>
</ul>
<pre><code>for i in sapworker2.qe.nue2.suse.org sapworker3.qe.nue2.suse.org worker-arm1.oqa.prg2.suse.org worker-arm2.oqa.prg2.suse.org ; do sudo salt-key -y -a $i; done && sudo salt \* state.apply
</code></pre>
openQA Infrastructure - action #134906 (Resolved): osd-deployment failed due to openqaworker1 sho... (https://progress.opensuse.org/issues/134906, 2023-08-31T08:58:53Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1794346#L9197" class="external">https://gitlab.suse.de/openqa/osd-deployment/-/jobs/1794346#L9197</a> shows</p>
<pre><code>Minions returned with non-zero exit code
openqaworker1.qe.nue2.suse.org:
Minion did not return. [No response]
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> All OSD salt controlled machines are ensured to not be affected by unresponsive salt-minion <a href="https://bugzilla.opensuse.org/show_bug.cgi?id=1212816" class="external">https://bugzilla.opensuse.org/show_bug.cgi?id=1212816</a>, i.e. the salt-minion backport+package lock is applied to all salt controlled machines</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Research how to backport + package lock in salt recipes, e.g. start with <a href="https://docs.saltproject.io/en/latest/ref/modules/all/salt.modules.zypperpkg.html" class="external">https://docs.saltproject.io/en/latest/ref/modules/all/salt.modules.zypperpkg.html</a> or ask experts in chat (but be careful not to be drawn into a "just install SUSE Manager" discussion)</li>
<li>Add instructions to salt to ensure the salt-minion package is backported and package locked</li>
<li>As an alternative, consider a separate repo that has the backported/fixed version and is applied to all salt controlled machines (<em>not</em> devel:openQA, as this is a salt problem, not openQA machine specific)</li>
</ul>
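<p>For the lock step, an idempotency-minded sketch: the helper takes the <code>zypper locks</code> output as an argument so the decision logic is testable without zypper, and it only prints the command it would run (a salt state would wrap the real call). The table format matched here is an assumption based on zypper 1.14:</p>

```shell
#!/bin/sh
# ensure_lock PKG LOCKS_OUTPUT: print the zypper command needed to lock PKG,
# or nothing if the `zypper locks` table (passed in as $2) already lists it.
# Dry-run sketch only; real usage: ensure_lock salt-minion "$(zypper locks)"
ensure_lock() {
    pkg=$1 locks=$2
    if printf '%s\n' "$locks" | grep -q "| $pkg "; then
        :   # already locked, nothing to do (keeps repeated salt runs clean)
    else
        echo "zypper addlock $pkg"
    fi
}
```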
openQA Infrastructure - action #134900 (Resolved): salt states fail to apply due to "Pillar openq... (https://progress.opensuse.org/issues/134900, 2023-08-31T08:29:28Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1794135#L1178" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1794135#L1178</a></p>
<pre><code>----------
ID: apache2
Function: service.running
Result: False
Comment: One or more requisite failed: openqa.server./etc/apache2/ssl.crt/openqa.oqa.prg2.suse.org.crt, openqa.server./etc/apache2/ssl.key/openqa.oqa.prg2.suse.org.key
Started: 08:56:57.086043
Duration: 0.004 ms
Changes:
----------
</code></pre>
openQA Infrastructure - action #134861 (Rejected): https://stats.openqa-monitor.qa.suse.de/ repor... (https://progress.opensuse.org/issues/134861, 2023-08-30T15:00:43Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>See <a href="https://stats.openqa-monitor.qa.suse.de/" class="external">https://stats.openqa-monitor.qa.suse.de/</a></p>
openQA Infrastructure - action #134135 (New): openqa-monitor.qa.suse.de salt CI deploy telegraf c... (https://progress.opensuse.org/issues/134135, 2023-08-11T13:08:47Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1751110#L5155" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1751110#L5155</a></p>
<pre><code>openqa-monitor.qa.suse.de:
2023-08-11T13:00:13Z E! [inputs.x509_cert] could not find file: [/etc/dehydrated/certs/monitor.qe.nue2.suse.org/fullchain.pem]
2023-08-11T13:00:18Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<p>This is likely related to the move of the VM, along with its hypervisor, to the FC Basement in the domain .qe.nue2.suse.org.</p>
openQA Infrastructure - action #134048 (New): openqa-piworker does not restart openQA worker proc... (https://progress.opensuse.org/issues/134048, 2023-08-09T15:26:24Z, okurz, okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>I observed on <a href="https://openqa.suse.de/admin/workers" class="external">https://openqa.suse.de/admin/workers</a> that openqa-piworker had an old os-autoinst version. On the host we confirmed that the package is up-to-date but the openQA workers had not been restarted yet. For historical reasons the node was brought into salt as a "generic" machine, not a "worker", causing this and multiple other inconsistencies. We should make sure the machine is treated as a proper worker.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> openqa-piworker has salt role "worker" and salt high state cleanly applied</li>
<li><strong>AC2:</strong> RPi related openQA jobs on related nodes still work as expected</li>
<li><strong>AC3:</strong> No related alerts for old "generic" openqa-piworker or "worker" openqa-piworker in <a href="https://monitor.qa.suse.de" class="external">https://monitor.qa.suse.de</a></li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Add the role as documented in <a href="https://gitlab.suse.de/openqa/salt-states-openqa#openqa-salt-states" class="external">https://gitlab.suse.de/openqa/salt-states-openqa#openqa-salt-states</a> and apply a high state as test with e.g. <code>salt-call --local state.test</code> or from OSD <code>salt --state-output=changes 'openqa-piworker*' state.test</code> and then apply without test if ok</li>
<li>Ensure RPi related openQA jobs on related nodes still work as expected</li>
<li>Ensure no related alerts for old "generic" openqa-piworker or "worker" openqa-piworker in <a href="https://monitor.qa.suse.de" class="external">https://monitor.qa.suse.de</a></li>
</ul>
openQA Infrastructure - action #134042 (Resolved): auto-update on OSD does not install updates du... (https://progress.opensuse.org/issues/134042, 2023-08-09T14:46:55Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From OSD:</p>
<pre><code>$ sudo systemctl status auto-update
○ auto-update.service - Automatically patch system packages.
Loaded: loaded (/etc/systemd/system/auto-update.service; static)
Active: inactive (dead) since Wed 2023-08-09 02:38:25 CEST; 13h ago
TriggeredBy: ● auto-update.timer
Main PID: 19349 (code=exited, status=0/SUCCESS)
Aug 09 02:37:37 openqa sh[19351]: Building repository 'Update repository with updates from SUSE Linux Enterprise 15' cache [....done]
Aug 09 02:37:37 openqa sh[19351]: Loading repository data...
Aug 09 02:37:46 openqa sh[19351]: Reading installed packages...
Aug 09 02:38:23 openqa sh[19351]: Resolving package dependencies...
Aug 09 02:38:24 openqa sh[19351]: Problem: nothing provides 'libwebkit2gtk3 = 2.40.5' needed by the to be installed libwebkit2gtk3-lang-2.4>
Aug 09 02:38:24 openqa sh[19351]: Solution 1: deinstallation of libwebkit2gtk3-lang-2.38.6-150200.75.2.noarch
Aug 09 02:38:24 openqa sh[19351]: Solution 2: do not install patch:openSUSE-SLE-15.4-2023-3233-1.noarch
Aug 09 02:38:24 openqa sh[19351]: Solution 3: break libwebkit2gtk3-lang-2.40.5-150200.78.1.noarch by ignoring some of its dependencies
Aug 09 02:38:24 openqa sh[19351]: Choose from above solutions by number or cancel [1/2/3/c/d/?] (c): c
Aug 09 02:38:25 openqa systemd[1]: auto-update.service: Deactivated successfully.
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> auto-update.service fails when not all updates can be applied (so that our monitoring will alert on it)</li>
<li><strong>AC2:</strong> All current updates are applied cleanly on OSD</li>
<li><strong>AC3:</strong> All other salt controlled hosts have current updates applied</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Ensure that the command we use in auto-update.service fails the service</li>
<li>Make sure that patches+updates are applied for all salt controlled machines</li>
</ul>
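<p>One sketch of the first suggestion: run zypper non-interactively, so the "Choose from above solutions" prompt seen in the log aborts with a non-zero exit code instead of being auto-cancelled, and then translate the exit code, since zypper uses codes above 99 as informational. The classification below follows my reading of zypper(8); double-check the exact codes before relying on it:</p>

```shell
#!/bin/sh
# classify_zypper_rc RC: return 0 for exit codes that should count as success
# for auto-update.service, non-zero otherwise, so systemd marks the unit
# failed when e.g. a dependency problem aborts the patch run.
classify_zypper_rc() {
    case "$1" in
        0)       return 0 ;;  # everything applied
        100|101) return 0 ;;  # (security) patches still pending: informational
        102|103) return 0 ;;  # reboot needed / zypper itself was updated
        *)       return 1 ;;  # real errors, e.g. dependency problems
    esac
}

# Intended use in the service's ExecStart wrapper (not run here):
#   zypper --non-interactive patch; classify_zypper_rc $? || exit 1
```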
openQA Infrastructure - action #133928 (Resolved): salt-states-openqa | Failed pipeline for master (https://progress.opensuse.org/issues/133928, 2023-08-07T12:19:19Z, tinita, tina.mueller+trick-redmine@suse.com)
<p><a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1741006" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1741006</a><br>
ERROR: Job failed: exit code 1</p>
<p>I can't find any error in the log.</p>
openQA Infrastructure - action #133793 (Resolved): salt-pillars-openqa failing to apply within 2h... (https://progress.opensuse.org/issues/133793, 2023-08-04T08:12:09Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>See <a href="https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1734178" class="external">https://gitlab.suse.de/openqa/salt-pillars-openqa/-/jobs/1734178</a> running into the 2h gitlab CI timeout while applying a salt high state. There is a lot of unhelpful debug output, namely all the lines with "Result: Clean - Started:", and a mention of the hosts "backup.qa.suse.de" and "openqaworker1.qe.nue2.suse.org" being down, but it is not clear which minions in the end do not return.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> By default no lines with "Result: Clean - Started:": Put them in another logfile to be uploaded</li>
<li><strong>AC2:</strong> No repeated "++ true, ++ sleep 1, ++ echo -n ."</li>
<li><strong>AC3:</strong> We know which minions did not complete</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>For "Result: Clean - Started:", how about just grepping for those lines and moving them to a logfile? <a href="https://unix.stackexchange.com/questions/414042/how-to-split-an-output-to-two-files-with-grep" class="external">https://unix.stackexchange.com/questions/414042/how-to-split-an-output-to-two-files-with-grep</a> can help</li>
<li>okurz just used</li>
</ul>
<pre><code>sudo salt --no-color --state-output=changes 'backup-qam.qe.nue2.suse.org' state.apply queue=True | awk '/Result: Clean - Started/ {print > "/tmp/salt_profiling.log"; next} 1'
</code></pre>
<p>which provides nicely terse output and puts all the profiling information into /tmp/salt_profiling.log.</p>
<ul>
<li>Maybe don't apply "set -x" to the commands that produce the dot output</li>
</ul>
<a name="Out-of-scope"></a>
<h2 >Out of scope<a href="#Out-of-scope" class="wiki-anchor">¶</a></h2>
<ul>
<li>Timeout before the 2h gitlab CI timeout and write down which minions are still busy executing jobs -> <a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: salt-states-openqa gitlab CI pipeline aborted with error after 2h of execution size:M (Resolved)" href="https://progress.opensuse.org/issues/133457">#133457</a></li>
</ul>
openQA Infrastructure - action #132818 (Resolved): salt state for worker in CI test does not appl... (https://progress.opensuse.org/issues/132818, 2023-07-16T10:27:05Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>From <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1693089#L10505">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1693089#L10505</a></p>
<pre><code>----------
ID: /var/lib/openqa
Function: mount.mounted
Result: False
Comment: An exception occurred in this state: Traceback (most recent call last):
File "/usr/lib/python3.11/site-packages/salt/state.py", line 2401, in call
ret = self.states[cdata["full"]](
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 149, in __call__
return self.loader.run(run_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 1234, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 1249, in _run_as
return _func_or_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 1282, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/states/mount.py", line 774, in mounted
out = __salt__["mount.set_fstab"](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 149, in __call__
return self.loader.run(run_func, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 1234, in run
return self._last_context.run(self._run_as, _func_or_method, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/loader/lazy.py", line 1249, in _run_as
return _func_or_method(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/site-packages/salt/modules/mount.py", line 877, in set_fstab
raise CommandExecutionError('Bad config file "{}"'.format(config))
salt.exceptions.CommandExecutionError: Bad config file "/etc/fstab"
Started: 10:13:09.444309
Duration: 35.476 ms
Changes:
</code></pre>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No errors concerning file mounts in salt CI pipelines</li>
</ul>
<a name="Steps-to-reproduce"></a>
<h2 >Steps to reproduce<a href="#Steps-to-reproduce" class="wiki-anchor">¶</a></h2>
<ul>
<li>Can be reproduced within the CI jobs; it should be crosschecked how to reproduce it manually, but changes on production workers seem to apply just fine</li>
<li>Crosscheck in a container environment, the same as the CI uses</li>
<li>DuckDuckGo for the error message</li>
</ul>
openQA Infrastructure - action #132470 (Resolved): salt states fail to apply due to glibc error o... (https://progress.opensuse.org/issues/132470, 2023-07-09T11:52:44Z, okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>On storage.oqa.suse.de:</p>
<pre><code># zypper ref
zypper: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /usr/lib64/libzypp.so.1722)
</code></pre>
<p>which then also shows up in <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1679723" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/1679723</a></p>
openQA Infrastructure - coordination #132467 (New): [epic] Prevent redundant salt state.apply act... (https://progress.opensuse.org/issues/132467, 2023-07-09T10:04:56Z, okurz, okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>We manage more and more machines within our salt infrastructure <a href="https://gitlab.suse.de/openqa/salt-states-openqa" class="external">https://gitlab.suse.de/openqa/salt-states-openqa</a>, so it becomes more important to make sure that the high state is applied efficiently. Normally a recurring call of <code>salt \* state.apply</code> should not take long and should not make any changes on the systems, assuming that any previous call already applied all pending changes. So we should review the actions happening in recurring calls to <code>state.apply</code> and change all state rules accordingly so that they are really only executed when necessary.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> No redundant repeated actions visible when calling <code>salt \* state.apply</code></li>
<li><strong>AC2:</strong> All necessary actions are still applied on the systems including scripts in /opt/openqa-trigger-from-ibs-plugin</li>
</ul>
<a name="Acceptance-tests"></a>
<h2 >Acceptance tests<a href="#Acceptance-tests" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AT1-1:</strong> <em>Given</em> being logged in to OSD <em>When</em> calling <code>for i in {1..2}; do salt \* state.apply; done</code> <em>Then</em> the second call applies no changes to any systems</li>
</ul>
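<p>AT1-1 could be automated by parsing the summary of the second run. A sketch; the <code>Succeeded: N (changed=M)</code> summary line is what salt prints for highstate output, but treat the exact format as an assumption:</p>

```shell
#!/bin/sh
# has_changes SUMMARY: return 0 (true) if a salt state.apply summary reports
# at least one changed state, i.e. the run was not idempotent.
has_changes() {
    printf '%s\n' "$1" | grep -Eq 'Succeeded: *[0-9]+ *\(changed=[1-9]'
}

# Intended use for the second of the two state.apply calls (not run here):
#   out=$(salt \* state.apply); has_changes "$out" && echo "not idempotent" >&2
```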
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Review all actions visible in the attached logfile redundant_salt_state_apply_calls_stdout_stripped.log and try to find better conditions, state combinations, fixes, etc. so that changes are not repeatedly applied to the systems</li>
</ul>