openSUSE Project Management Tool | https://progress.opensuse.org/ | 2021-06-08T13:39:37Z
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=414140 | 2021-06-08T13:39:37Z | mkittler (marius.kittler@suse.com)
<ul></ul><blockquote>
<p>Prioritized as "Urgent" as the alert was ignored and not handled by multiple persons for days and we are apparently suffering from alarm fatigue</p>
</blockquote>
<p>Where was the alert visible? I am subscribed to <a href="mailto:osd-admins@suse.de">osd-admins@suse.de</a> but didn't receive an email (apart from the <code>Re:</code> you've just sent to the list).</p>
<blockquote>
<p>Ensure you have access to <a href="https://gitlab.suse.de/OPS-Service/monitoring/" class="external">https://gitlab.suse.de/OPS-Service/monitoring/</a> , ask in EngInfra ticket otherwise</p>
</blockquote>
<p>When accessing the page I get a 404, which I assume is actually a 403, so I'll ask infra for access. (In the meantime someone else can of course pick up the ticket.)</p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=414146 | 2021-06-08T13:44:28Z | okurz (okurz@suse.com)
<ul></ul><p>mkittler wrote:</p>
<blockquote>
<blockquote>
<p>Prioritized as "Urgent" as the alert was ignored and not handled by multiple persons for days and we are apparently suffering from alarm fatigue</p>
</blockquote>
<p>Where was the alert visible? I am subscribed to <a href="mailto:osd-admins@suse.de">osd-admins@suse.de</a> but didn't receive an email (apart from the <code>Re:</code> you've just sent to the list).</p>
</blockquote>
<p>Ah, good point. The alert came from Nagios and is visible in the email and also at the referenced URL <a href="https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fassets" class="external">https://thruk.suse.de/thruk/cgi-bin/extinfo.cgi?type=2&host=openqa.suse.de&service=fs_%2Fassets</a><br>
I thought you had worked with Nagios alerts in the past?<br>
Just today I added to <a href="https://progress.opensuse.org/projects/qa/wiki/Wiki#Onboarding-for-new-joiners" class="external">https://progress.opensuse.org/projects/qa/wiki/Wiki#Onboarding-for-new-joiners</a>: "Ensure you have access to <a href="https://gitlab.suse.de/OPS-Service/monitoring" class="external">https://gitlab.suse.de/OPS-Service/monitoring</a> (create EngInfra ticket otherwise) and add yourself in <a href="https://gitlab.suse.de/OPS-Service/monitoring/-/tree/master/icinga/shared/contacts" class="external">https://gitlab.suse.de/OPS-Service/monitoring/-/tree/master/icinga/shared/contacts</a> to receive monitoring information".</p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=414335 | 2021-06-08T20:46:37Z | okurz (okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>Feedback</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul><p><a href="https://monitor.qa.suse.de/d/WebuiDb/webui-summary?editPanel=74&orgId=1&from=1600646082068&to=1623184779476" class="external">https://monitor.qa.suse.de/d/WebuiDb/webui-summary?editPanel=74&orgId=1&from=1600646082068&to=1623184779476</a> shows that we have not exceeded 90% for long, but the alert is configured for 94%. I think we should go for 90% again, the same as for other filesystems. In <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L60" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L60</a> we are configured to keep 20% free, i.e. 80% usage. The Nagios thresholds should then sit above the Grafana alerting limit, e.g. 92% warning, 94% critical.</p>
<p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/502" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/502</a><br>
and<br>
<a href="https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/16" class="external">https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/16</a></p>
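<p>To illustrate the intended layering of the limits proposed above (80% usage target, 90% Grafana alert, 92%/94% Nagios warning/critical), here is a minimal Python sketch. It is not taken from salt-states-openqa or the monitoring repository: the names <code>Thresholds</code> and <code>classify</code> are hypothetical, and treating each limit as "usage at or above the value" is an assumption about how the checks compare.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Proposed usage limits in percent (hypothetical container, not real config)."""
    usage_target: float = 80.0     # "keep 20% free"
    grafana_alert: float = 90.0    # Grafana alerting limit
    nagios_warning: float = 92.0   # Nagios WARNING, above the Grafana limit
    nagios_critical: float = 94.0  # Nagios CRITICAL

    def is_layered(self) -> bool:
        """True if each limit sits strictly above the previous one."""
        return (self.usage_target < self.grafana_alert
                < self.nagios_warning < self.nagios_critical)


def classify(usage_percent: float, t: Thresholds = Thresholds()) -> str:
    """Return the highest level reached by the given filesystem usage.

    Assumption: a level is reached when usage is at or above its limit.
    """
    if usage_percent >= t.nagios_critical:
        return "nagios-critical"
    if usage_percent >= t.nagios_warning:
        return "nagios-warning"
    if usage_percent >= t.grafana_alert:
        return "grafana-alert"
    if usage_percent > t.usage_target:
        return "above-usage-target"
    return "ok"
```

<p>With this ordering Grafana would fire first, and Nagios would only report once usage has climbed further, which is the sequence the proposal aims for.</p>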
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=414775 | 2021-06-09T20:54:15Z | okurz (okurz@suse.com)
<ul><li><strong>Due date</strong> set to <i>2021-07-07</i></li><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Blocked</i></li></ul><p><a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/502" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/502</a> merged, blocked on <a href="https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/16" class="external">https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/16</a></p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=416116 | 2021-06-14T18:19:19Z | okurz (okurz@suse.com)
<ul></ul><p>Saw more alerts. My MR was still ignored. Created ticket as reminder: <a href="https://infra.nue.suse.com/SelfService/Display.html?id=189974" class="external">https://infra.nue.suse.com/SelfService/Display.html?id=189974</a></p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=419158 | 2021-06-22T20:37:11Z | okurz (okurz@suse.com)
<ul></ul><p>MR was merged but the change is not effective. Maybe I need to explicitly mention "assets" in a separate line: <a href="https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/18" class="external">https://gitlab.suse.de/OPS-Service/monitoring/-/merge_requests/18</a></p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=419536 | 2021-06-23T09:59:39Z | okurz (okurz@suse.com)
<ul><li><strong>Status</strong> changed from <i>Blocked</i> to <i>Resolved</i></li></ul><p>This worked. The alert thresholds are now fine in both Nagios and Grafana.</p>
openQA Infrastructure - action #93650: alert: PROBLEM Service Alert: openqa.suse.de/fs_/assets is WARNING | https://progress.opensuse.org/issues/93650?journal_id=419545 | 2021-06-23T10:05:56Z | okurz (okurz@suse.com)
<ul><li><strong>Copied to</strong> <i><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" href="/issues/94576">action #94576</a>: alert: PROBLEM Service Alert: openqa.suse.de/fs_/results is WARNING</i> added</li></ul>