openSUSE Project Management Tool (https://progress.opensuse.org/)
openQA Project - action #87898: Add grafana alert for "broken workers" as reported by openQA
okurz (okurz@suse.com), 2021-01-19T13:07:25Z, https://progress.opensuse.org/issues/87898?journal_id=379183
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>okurz</i></li></ul>
okurz (okurz@suse.com), 2021-01-20T09:20:53Z, https://progress.opensuse.org/issues/87898?journal_id=379359
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Workable</i></li><li><strong>Assignee</strong> deleted (<del><i>okurz</i></del>)</li></ul><p>I started on this but could not find the corresponding entries in InfluxDB. I also forgot how to test this properly. As we have too many tickets "In Progress", I will set this back to "Workable".</p>
okurz (okurz@suse.com), 2021-02-04T20:08:34Z, https://progress.opensuse.org/issues/87898?journal_id=381677
<ul><li><strong>Parent task</strong> changed from <i>#78390</i> to <i>#80142</i></li></ul>
mkittler (marius.kittler@suse.com), 2021-02-05T16:54:33Z, https://progress.opensuse.org/issues/87898?journal_id=381785
<ul><li><strong>Status</strong> changed from <i>Workable</i> to <i>In Progress</i></li><li><strong>Assignee</strong> set to <i>mkittler</i></li></ul><p>SR: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/442" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/442</a></p>
<blockquote>
<p>I started on this but could not find the corresponding entries in InfluxDB.</p>
</blockquote>
<p>No entries were showing up due to permission errors. Even with <code>--debug</code> this was not visible at all and I could only figure it out by guessing. (So <code>grant select on table workers to telegraf;</code> fixed the problem.)</p>
okurz (okurz@suse.com), 2021-02-05T17:35:14Z, https://progress.opensuse.org/issues/87898?journal_id=381786
<ul></ul><p>mkittler wrote:<br>
So <code>grant select on table workers to telegraf;</code> fixed the problem.</p>
<p>OK, but please include that in salt as well; see <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L166" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/blob/master/openqa/server.sls#L166</a> and the following lines. Also, please add an alert to the panel.</p>
openqa_review (openqa-review@suse.de), 2021-02-06T04:08:36Z, https://progress.opensuse.org/issues/87898?journal_id=381795
<ul><li><strong>Due date</strong> set to <i>2021-02-20</i></li></ul><p>Setting due date based on mean cycle time of SUSE QE Tools</p>
mkittler (marius.kittler@suse.com), 2021-02-08T10:08:28Z, https://progress.opensuse.org/issues/87898?journal_id=381845
<ul></ul><p>SRs for further improvements: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/444" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/444</a> <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/443" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/443</a></p>
mkittler (marius.kittler@suse.com), 2021-02-09T11:01:38Z, https://progress.opensuse.org/issues/87898?journal_id=382035
<ul></ul><p>SR for alert: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/447" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/447</a></p>
okurz (okurz@suse.com), 2021-02-10T07:57:28Z, https://progress.opensuse.org/issues/87898?journal_id=382122
<ul></ul><p>All three MRs are merged and effective. Today I found that the osd deployment failed in the "1m after" and "10m after" alert checks but not in the "1h after" one. Can you please look into that and ensure that a deployment does not trigger the "broken" alert?</p>
mkittler (marius.kittler@suse.com), 2021-02-10T10:11:35Z, https://progress.opensuse.org/issues/87898?journal_id=382154
<ul></ul><p>MR to fix that: <a href="https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/451" class="external">https://gitlab.suse.de/openqa/salt-states-openqa/-/merge_requests/451</a> (commit message contains more details)</p>
mkittler (marius.kittler@suse.com), 2021-02-11T09:32:17Z, https://progress.opensuse.org/issues/87898?journal_id=382208
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Feedback</i></li></ul><p>Let's wait until the next deployment to see whether it worked.</p>
livdywan (liv.dywan@suse.com), 2021-02-22T12:47:31Z, https://progress.opensuse.org/issues/87898?journal_id=384589
<ul><li><strong>Due date</strong> changed from <i>2021-02-20</i> to <i>2021-02-26</i></li></ul><p>No <a href="https://openqa.suse.de/admin/workers" class="external">broken workers in the web UI</a> or alerts on <code>osd-admins@suse.de</code> that I can see. Bumping the <em>due date</em> so we can check again later this week. Alternatively, consider breaking a worker on purpose?</p>
mkittler (marius.kittler@suse.com), 2021-02-24T16:37:49Z, https://progress.opensuse.org/issues/87898?journal_id=385435
<ul><li><strong>Status</strong> changed from <i>Feedback</i> to <i>Resolved</i></li></ul><p>The alert hasn't fired during the deployment today although we had a few broken workers for a few minutes (&lt; 15 minutes).</p>
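<p>The observation above suggests that the alert only fires once workers stay broken for longer than some grace period. A minimal Python sketch of such a sustained-condition check follows. It is an illustration only: the 15-minute threshold is inferred from the remark above, and the function name and sample data are hypothetical, not taken from the actual Grafana alert definition in salt-states-openqa.</p>

```python
from datetime import datetime, timedelta

# Assumption: 15 minutes mirrors the "< 15 minutes" remark above; the real
# threshold lives in the Grafana alert definition, which is not shown here.
GRACE_PERIOD = timedelta(minutes=15)


def alert_fires(samples, grace_period=GRACE_PERIOD):
    """Return True if broken workers were reported continuously for at
    least grace_period.

    samples: iterable of (timestamp, broken_worker_count), sorted by time.
    """
    broken_since = None
    for timestamp, broken_count in samples:
        if broken_count > 0:
            # Remember when the current streak of broken workers started.
            if broken_since is None:
                broken_since = timestamp
            if timestamp - broken_since >= grace_period:
                return True
        else:
            # Any healthy sample resets the streak.
            broken_since = None
    return False


# Hypothetical data: one sample per minute.
start = datetime(2021, 2, 24, 12, 0)
# Workers broken only from minute 2 to minute 8, as during a deployment.
deployment = [(start + timedelta(minutes=m), 3 if 2 <= m <= 8 else 0)
              for m in range(30)]
# One worker broken for the whole 20-minute window.
outage = [(start + timedelta(minutes=m), 1) for m in range(20)]

print(alert_fires(deployment))
print(alert_fires(outage))
```

<p>With this sample data, the short deployment window stays below the grace period and does not trigger the check, while the longer outage does.</p>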
okurz (okurz@suse.com), 2021-05-17T09:13:38Z, https://progress.opensuse.org/issues/87898?journal_id=407812
<ul><li><strong>Due date</strong> deleted (<del><i>2021-02-26</i></del>)</li></ul>