openSUSE Project Management Tool
openQA Project - action #100503: Identify all "finalize_job_results" failures and handle them (report ticket or fix)
livdywan (liv.dywan@suse.com), 2021-10-07T09:12:49Z, https://progress.opensuse.org/issues/100503?journal_id=453042
<p>We discussed it briefly. It is not clear what the goal here is. If we want to catch failures in hook scripts, we need tickets about failures. And if we're not seeing alerts we need to look into that. We're also not sure why this is "High" when e.g. <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a>, which is about minion failures that show as green, is not. Also, it's unclear what user jobs refers to here.</p>
okurz (okurz@suse.com), 2021-10-10T21:14:06Z, https://progress.opensuse.org/issues/100503?journal_id=453686
<p>cdywan wrote:</p>
<blockquote>
<p>We discussed it briefly. It is not clear what the goal here is.</p>
</blockquote>
<p>Hm, the goal should have been clear from AC1: "All recent "finalize_job_result" failures are investigated and handled accordingly (ticket reported or fixed)". But let me try to rephrase it in case it's not clear: there should be no failing minion jobs of type "finalize_job_results" on osd and o3 whose cause of failure is not already known and handled in other tickets.</p>
<blockquote>
<p>If we want to catch failures in hook scripts, we need tickets about failures.</p>
</blockquote>
<p>Well, if we currently don't have any such failures, then this ticket is not that urgent.</p>
<blockquote>
<p>And if we're not seeing alerts we need to look into that.</p>
</blockquote>
<p>Yes, but I am not aware of any specific problems of the kind "jobs fail but no alerts".</p>
<blockquote>
<p>We're also not sure why this is "High" when e.g. <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a>, which is about minion failures that show as green, is not.</p>
</blockquote>
<p>It's mostly high because I thought you and tina are already working on related tasks. Not sure which ticket you handle those hook job failures in, maybe still the o3 Leap 15.3 upgrade ticket?</p>
<p>Regarding <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a>, there you asked specifically for alerts on o3, which I see as mostly blocked by <a class="issue tracker-4 status-12 priority-3 priority-lowest" title="action: Add/fix openqa_logwarn for o3 and osd sending to o3-admins@suse.de and osd-admins@suse.de respect... (Workable)" href="https://progress.opensuse.org/issues/57239">#57239</a>; hence <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a> is not in the backlog.</p>
<blockquote>
<p>Also, it's unclear what user jobs refers to here.</p>
</blockquote>
<p>What "user jobs"?</p>
tinita (tina.mueller+trick-redmine@suse.com), 2021-10-11T12:13:55Z, https://progress.opensuse.org/issues/100503?journal_id=454112
<p>okurz wrote:</p>
<blockquote>
<p>It's mostly high because I thought you and tina are already working on related tasks. Not sure which ticket you handle those hook job failures in, maybe still the o3 Leap 15.3 upgrade ticket?</p>
</blockquote>
<p><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Investigation jobs triggered but missing comment on original job size:M (Resolved)" href="https://progress.opensuse.org/issues/99519">#99519</a></p>
<blockquote>
<p>Regarding <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a>, there you asked specifically for alerts on o3, which I see as mostly blocked by <a class="issue tracker-4 status-12 priority-3 priority-lowest" title="action: Add/fix openqa_logwarn for o3 and osd sending to o3-admins@suse.de and osd-admins@suse.de respect... (Workable)" href="https://progress.opensuse.org/issues/57239">#57239</a>; hence <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a> is not in the backlog.</p>
</blockquote>
<p>Why is <a class="issue tracker-4 status-12 priority-3 priority-lowest" title="action: Add/fix openqa_logwarn for o3 and osd sending to o3-admins@suse.de and osd-admins@suse.de respect... (Workable)" href="https://progress.opensuse.org/issues/57239">#57239</a> blocking <a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a>?</p>
<p><a class="issue tracker-4 status-3 priority-5 priority-high3 closed" title="action: Investigation jobs triggered but missing comment on original job size:M (Resolved)" href="https://progress.opensuse.org/issues/99519">#99519</a> is about missing comments. I investigated the problem, and one finding was that we don't catch failures in hook scripts as actual failures.</p>
<p><a class="issue tracker-4 status-3 priority-4 priority-default closed" title="action: Minion jobs for job hooks failed silently on o3 size:M (Resolved)" href="https://progress.opensuse.org/issues/99741">#99741</a> is a followup to that and the other failure regarding <code>/bin/sh</code>.</p>
<blockquote>
<blockquote>
<p>Also, it's unclear what user jobs refers to here.</p>
</blockquote>
<p>What "user jobs"?</p>
</blockquote>
<p>The ticket description mentions a "user-provided hook script".</p>
livdywan (liv.dywan@suse.com), 2021-10-12T08:57:18Z, https://progress.opensuse.org/issues/100503?journal_id=454466
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/454466/diff?detail_id=430565">diff</a>)</li></ul>
livdywan (liv.dywan@suse.com), 2021-10-12T08:58:51Z, https://progress.opensuse.org/issues/100503?journal_id=454469
<ul><li><strong>Description</strong> updated (<a title="View differences" href="/journals/454469/diff?detail_id=430568">diff</a>)</li></ul>
livdywan (liv.dywan@suse.com), 2021-10-12T09:02:36Z, https://progress.opensuse.org/issues/100503?journal_id=454472
<ul><li><strong>Status</strong> changed from <i>New</i> to <i>Resolved</i></li><li><strong>Assignee</strong> set to <i>livdywan</i></li></ul><p>There are currently two such jobs, both failing because of <a class="issue tracker-6 status-1 priority-4 priority-default child parent" title="coordination: [epic] Better handle minion tasks failing with "Job terminated unexpectedly" (New)" href="https://progress.opensuse.org/issues/99831">#99831</a>, and no others. If we do see failures after all, we'll get alerts and handle them, but we don't want to ignore these in general.</p>