openSUSE Project Management Tool: Issueshttps://progress.opensuse.org/https://progress.opensuse.org/themes/openSUSE/favicon/favicon.ico?15829177842024-03-20T18:18:05ZopenSUSE Project Management Tool
Redmine openQA Infrastructure - action #157615 (Feedback): [alert] osd-deployment failed in post-deploy ,...https://progress.opensuse.org/issues/1576152024-03-20T18:18:05Zjbaier_czjbaier@suse.cz
<p>See <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/jobs/2411217</a></p>
<pre><code>schort-server.qe.nue2.suse.org:
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
telegraf errors
monitor.qe.nue2.suse.org:
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
telegraf errors
++ grep ' E! ' salt_post_deploy_checks.log
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [agent] Error killing process: os: process already finished
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state masked --exclude ""':
2024-03-20T16:23:32Z E! [inputs.exec] Error in plugin: exec: command timed out for command '/etc/telegraf/scripts/systemd_list_service_by_state_for_telegraf.sh --state failed --exclude ""':
2024-03-20T16:23:32Z E! [telegraf] Error running agent: input plugins recorded 2 errors
2024-03-20T16:23:31Z E! [inputs.x509_cert] Error in plugin: cannot get SSL cert 'https://monitor.qa.suse.de:443': dial tcp: lookup monitor.qa.suse.de: i/o timeout
2024-03-20T16:23:35Z E! [telegraf] Error running agent: input plugins recorded 1 errors
</code></pre>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ol>
<li>Understand why and where <code>systemd_list_service_by_state_for_telegraf.sh</code> times out. It could be the general telegraf-timeout in the pipeline, in the execution of the script itself (from telegraf.conf) or another place. Adjust the timeout to match expected runtime or fix the script to complete faster -> schort-server only has 1 VM core, consider configuring the hypervisor to use at least 2 cores</li>
<li>"Error killing process: os: process already finished" might just be a consequence of the above</li>
<li>"Error in plugin: cannot get SSL cert '<a href="https://monitor.qa.suse.de:443':" class="external">https://monitor.qa.suse.de:443':</a> dial tcp: lookup monitor.qa.suse.de: i/o timeout" possibly to be covered with some retrying? Investigate what the real error message means, ask <a href="https://www.ecosia.org/chat" class="external">https://www.ecosia.org/chat</a> (or if that does not work invest in coal-powered <a href="https://www.cat-gpt.com/chat" class="external">https://www.cat-gpt.com/chat</a> ) or something</li>
<li>If we cannot solve these problems, consider excluding them from CI execution to avoid false-positives. Consider the impact of doing this first however!</li>
</ol>
openQA Tests - action #125090 (Blocked): [qe-core] Capture audio output of rpi realhw SUTshttps://progress.opensuse.org/issues/1250902023-02-27T17:44:14Zdheidlerdheidler@suse.com
<p>The idea is to connect the sound output of each realhw SUT to a USB soundcard microphone port connected to the worker.<br>
This requires implementation of eg. <code>start_audiocapture()</code> in the generalhw backend.</p>