openSUSE Project Management Tool: Issues (https://progress.opensuse.org/), updated 2023-04-25T10:34:09Z
openQA Infrastructure - action #128273 (Resolved): [alert] openqaworker-arm-1+2+ failed to recove... (https://progress.opensuse.org/issues/128273, 2023-04-25T10:34:09Z, by okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p>We received multiple emails on 2023-04-23 around 15:00Z related to the attempted automatic recovery of openqaworker-arm-1+2+. It is unclear whether an SD ticket was automatically created for that.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The root problem was addressed</li>
<li><strong>AC2:</strong> The reason for the multi-level recovery attempt is understood</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li><em>DONE</em>: Ensure that all three machines openqaworker-arm-1+2+3 are up and running again. Result: all three are up and running, so there is no problem there</li>
<li>Check the chronological order of the execution steps, e.g. from <a href="https://gitlab.suse.de/openqa/grafana-webhook-actions/-/pipelines/660173" class="external">https://gitlab.suse.de/openqa/grafana-webhook-actions/-/pipelines/660173</a> for arm-3 and the related jobs for arm-1+2</li>
<li>Understand the source of the error and address it; something may need to be fixed there</li>
</ul>
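<p>To check the chronological order of the recovery steps across the arm workers, one could collect the relevant events from the pipeline job logs and sort them per host. The following is a minimal sketch under the assumption that the events are available as <code>(timestamp, host, step)</code> tuples; the functions <code>timeline</code> and <code>repeated_attempts</code> and the tuple layout are hypothetical and not part of grafana-webhook-actions.</p>
<pre><code class="python">"""Hypothetical sketch: order recovery events chronologically per host.

The (iso_timestamp, host, step) tuple layout is an assumption for
illustration; it is not the output format of grafana-webhook-actions.
"""
from collections import defaultdict
from datetime import datetime


def timeline(events):
    """Return {host: [(time, step), ...]} with each host's steps sorted by time."""
    per_host = defaultdict(list)
    for stamp, host, step in events:
        # Use one timestamp style for all events, e.g. "...+00:00" (UTC),
        # because naive and timezone-aware datetimes cannot be compared.
        per_host[host].append((datetime.fromisoformat(stamp), step))
    for steps in per_host.values():
        steps.sort()
    return dict(per_host)


def repeated_attempts(events, step="recovery"):
    """Return the sorted names of hosts for which `step` occurs more than once."""
    counts = defaultdict(int)
    for _stamp, host, name in events:
        if name == step:
            counts[host] += 1
    return sorted(host for host, n in counts.items() if n > 1)
</code></pre>
<p>A host returned by <code>repeated_attempts</code> would be a candidate for closer inspection when trying to understand the multi-level recovery attempt from AC2.</p>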
QA - action #125444 (Resolved): Improve collaboration with Eng-Infra - SD ticket template size:M (https://progress.opensuse.org/issues/125444, 2023-03-06T12:19:18Z, by okurz, okurz@suse.com)
<a name="Motivation"></a>
<h2 >Motivation<a href="#Motivation" class="wiki-anchor">¶</a></h2>
<p>As we rely heavily on Eng-Infra and need to coordinate our work, we should define a ticket template for SUSE SD Eng-Infra tickets to improve our communication, covering impact, steps to reproduce, and acceptance criteria.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> One of our usual wiki locations defines a ticket template that we can copy and paste when we create an SD ticket</li>
<li><strong>AC2:</strong> Everyone in the team is aware of the ticket template to be used</li>
</ul>
<a name="Suggestions"></a>
<h2 >Suggestions<a href="#Suggestions" class="wiki-anchor">¶</a></h2>
<ul>
<li>Define a template for SUSE SD Eng-Infra tickets to improve our communication, covering impact, steps to reproduce, and acceptance criteria
<ul>
<li>Include a back-reference to the ticket template so that improvements to the template can be suggested</li>
<li>Suggest commenting in the progress ticket, which can be shared with more people by default, helps communication, lets us edit texts, and shows who is assigned</li>
</ul></li>
</ul>
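<p>A minimal sketch of such a template, rendered by a hypothetical helper <code>render_sd_ticket</code> and assuming a plain-text ticket body. The sections mirror the suggestions above; the exact headings and wording are assumptions, not an agreed team template.</p>
<pre><code class="python">"""Hypothetical sketch of an SD ticket body for SUSE SD Eng-Infra.

The sections follow the suggestions in this ticket (impact, steps to
reproduce, acceptance criteria, back-reference); the function name and
parameters are illustrative assumptions.
"""


def render_sd_ticket(impact, steps, criteria, template_url, progress_url=None):
    """Return the ticket body as plain text."""
    lines = ["Impact:", impact, "", "Steps to reproduce:"]
    lines += [f"{i}. {step}" for i, step in enumerate(steps, start=1)]
    lines += ["", "Acceptance criteria:"]
    lines += [f"AC{i}: {ac}" for i, ac in enumerate(criteria, start=1)]
    if progress_url:
        # Point to the progress ticket, which more people can read and comment on.
        lines += ["", f"Please comment in {progress_url}"]
    # Back-reference so that readers can suggest improvements to the template.
    lines += ["", f"Created from the ticket template at {template_url}"]
    return "\n".join(lines)
</code></pre>
<p>Keeping the template in one place and linking back to it from every ticket means that suggested improvements only have to be applied once.</p>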
openQA Infrastructure - action #92176 (Resolved): [alert] openqaworker-arm-3 offline and CI pipel... (https://progress.opensuse.org/issues/92176, 2021-05-05T14:08:32Z, by okurz, okurz@suse.com)
<a name="Observation"></a>
<h2 >Observation<a href="#Observation" class="wiki-anchor">¶</a></h2>
<p><a href="https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&amp;editPanel=7&amp;tab=alert" class="external">https://stats.openqa-monitor.qa.suse.de/d/1bNU0StZz/automatic-actions?orgId=1&amp;editPanel=7&amp;tab=alert</a> shows that the machine openqaworker-arm-3 is offline, and the CI job <a href="https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/415264" class="external">https://gitlab.suse.de/openqa/grafana-webhook-actions/-/jobs/415264</a> is green, but its log shows:</p>
<pre><code>Attempting to reboot openqaworker-arm-3
Error: Unable to establish IPMI v2 / RMCP+ session
/usr/sbin/sendmail: No such file or directory
. . . message not sent.
</code></pre>
<p>So there are two problems: the email could not be sent, and this failure did not fail the pipeline.</p>
<a name="Acceptance-criteria"></a>
<h2 >Acceptance criteria<a href="#Acceptance-criteria" class="wiki-anchor">¶</a></h2>
<ul>
<li><strong>AC1:</strong> The pipeline fails if sending the email does not work</li>
<li><strong>AC2:</strong> Email sending works again, for now at least for the case observed above</li>
</ul>
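<p>For AC1, the key point is that the job must exit with a non-zero status when handing over the mail fails, because GitLab CI marks a job as failed when a script command exits non-zero. A minimal sketch, assuming the notification is sent through a sendmail-compatible binary; <code>send_mail</code>, <code>notify_or_fail</code> and the default path are hypothetical names chosen for illustration, not taken from the grafana-webhook-actions repository.</p>
<pre><code class="python">"""Hypothetical sketch: make a notification failure fail the CI job.

Assumes mail is sent through a sendmail-compatible binary; names and
the default path are illustrative assumptions.
"""
import subprocess
import sys


def send_mail(message, sendmail="/usr/sbin/sendmail"):
    """Pipe `message` (headers included) to `sendmail -t`; return True on success."""
    try:
        # "-t" makes sendmail read the recipients from the message headers.
        result = subprocess.run([sendmail, "-t"], input=message.encode(), check=False)
    except OSError as err:
        # Covers the observed case "/usr/sbin/sendmail: No such file or directory".
        print(f"message not sent: {err}", file=sys.stderr)
        return False
    return result.returncode == 0


def notify_or_fail(message, sendmail="/usr/sbin/sendmail"):
    """Return the exit code: 0 if the mail was handed over, 1 otherwise."""
    return 0 if send_mail(message, sendmail) else 1


if __name__ == "__main__":
    # A non-zero exit status makes the GitLab CI job fail instead of staying green.
    sys.exit(notify_or_fail(sys.stdin.read()))
</code></pre>
<p>Run e.g. as <code>python3 notify.py &lt; message.txt</code> in a CI job (file names hypothetical), a missing <code>/usr/sbin/sendmail</code> would then turn the job red instead of only printing "message not sent."</p>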