action #164733: qem-dashboard (and hence qem-bot) see a job as failed even though it's marked as softfailed since > 30 days in openQA size:M - QA (public) - openSUSE Project Management Tool

action #164733

closed

coordination #99303: [saga][epic] Improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

coordination #155671: [epic] Better handling of SLE maintenance test review

qem-dashboard (and hence qem-bot) see a job as failed even though it's marked as softfailed since > 30 days in openQA size:M

Added by okurz 9 months ago. Updated 5 months ago.

Status: Resolved
Priority: Normal
Assignee: jbaier_cz
Target version: openQA Project (public) - Ready
Start date: 2024-07-31
Due date:
% Done:
Estimated time:
Tags: qem-bot, qem-dashboard

Description

Observation

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2892821#L54 shows

2024-07-31 07:03:25 INFO     Found failed, not-ignored job https://openqa.suse.de/t14779563 for incident 34532

even though https://openqa.suse.de/tests/14779563 shows that the result was already overridden via a force_result label in https://openqa.suse.de/tests/14779563#comment-1538601 on 2024-07-02. The most recent "sync incidents" job https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2892922 does not mention 14779563.

Expected result

E1: http://dashboard.qam.suse.de/blocked?incident=34532&group_names=SP5 or the equivalent URL from the database should show no failed job
E2: http://dashboard.qam.suse.de/incident/34532 should show no failed job
E3: The latest "approve incidents" pipeline from https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules should not mention https://openqa.suse.de/tests/14779563 as failed
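
E3 could be checked mechanically by scanning the bot job log for the message format quoted in the Observation. The following is a minimal sketch, assuming the log is available as plain text; the function name `blocking_jobs` is hypothetical and not part of qem-bot.

```python
import re
from typing import List, Tuple

# Message format taken from the log line quoted in the Observation.
BLOCKING_LINE = re.compile(
    r"Found failed, not-ignored job \S*/t(\d+) for incident (\d+)"
)


def blocking_jobs(log_text: str) -> List[Tuple[int, int]]:
    """Return (incident, openQA job id) pairs the bot reported as blocking."""
    return [
        (int(match.group(2)), int(match.group(1)))
        for match in BLOCKING_LINE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "2024-07-31 07:03:25 INFO     Found failed, not-ignored job "
        "https://openqa.suse.de/t14779563 for incident 34532"
    )
    print(blocking_jobs(sample))
```

E3 would hold for this ticket if the pair for incident 34532 and job 14779563 no longer appears in the output for the latest "approve incidents" log.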

Suggestions

- Ask test reviewers about examples
  - Look for jobs that are softfailed via a force_result label and at the same time still shown as failed on the dashboard.
- Check how "sync incidents" handles already finished results. Maybe finished results are only revisited when an AMQP event for a new comment is received; if that event was missed, the existing result would never be revisited.
- To reproduce, a qem-dashboard database dump is available on qam2.suse.de within the "postgresql" machine in /root/dashboard_db-2024-07-31T09:43:21+02:00.sql.xz
- Set up dashboard/bot/openQA locally and simulate how the dashboard/bot behave if an already finished job changes its result
- Look at the qem-bot code to check for any obvious problems with handling softfailed jobs
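
The search for examples suggested above amounts to comparing what the dashboard has stored with what openQA currently reports. A minimal sketch of that comparison follows, assuming both sides have already been exported as plain mappings from job id to result string. The function name `find_stale_failures` and the set of non-blocking results are assumptions for illustration, not taken from qem-bot or qem-dashboard.

```python
from typing import Dict, List

# Results treated as non-blocking in this sketch (assumption).
NON_BLOCKING = {"passed", "softfailed"}


def find_stale_failures(
    dashboard_results: Dict[int, str], openqa_results: Dict[int, str]
) -> List[int]:
    """Return job ids stored as failed on the dashboard although
    openQA currently reports a non-blocking result for them."""
    stale = []
    for job_id in sorted(dashboard_results):
        stored = dashboard_results[job_id]
        current = openqa_results.get(job_id)  # None if openQA no longer has the job
        if stored == "failed" and current in NON_BLOCKING:
            stale.append(job_id)
    return stale


if __name__ == "__main__":
    dashboard = {14779563: "failed", 100: "failed", 101: "passed"}
    openqa = {14779563: "softfailed", 100: "failed", 101: "passed"}
    print(find_stale_failures(dashboard, openqa))
```

Any job id returned this way would be a candidate for the missed-update hypothesis, since the dashboard never picked up the changed result.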

Related issues 1 (0 open — 1 closed)

Updated by okurz 9 months ago

Description updated (diff)

Updated by okurz 9 months ago

Related to action #157204: Sync openQA job removal events to qem-dashboard listening to AMQP events size:M added

Updated by okurz 9 months ago

Parent task set to #155671

Updated by okurz 7 months ago

Description updated (diff)

Updated by livdywan 6 months ago

Description updated (diff)

Updated by livdywan 6 months ago

Subject changed from qem-dashboard (and hence qem-bot) see a job as failed even though it's marked as softfailed since > 30 days in openQA to qem-dashboard (and hence qem-bot) see a job as failed even though it's marked as softfailed since > 30 days in openQA size:M
Description updated (diff)
Status changed from New to Workable

Updated by mkittler 6 months ago

Target version changed from Tools - Next to Ready

Updated by jbaier_cz 6 months ago

Assignee set to jbaier_cz

Updated by jbaier_cz 6 months ago

Let's create a data point here: at this moment (from the latest bot approval pipeline https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/3322469) updates are blocked by 12 failing openQA tests. None of them is soft-failed or ignored with a comment, so everything behaves as expected so far.

#10

Updated by jbaier_cz 5 months ago

Another data point: right now we have some soft-failed tests (via automatic force_result) like https://openqa.suse.de/tests/15939391; the dashboard correctly shows no red results, and the bot pipeline shows no trace of an approval halted due to a test failure either.

#11

Updated by jbaier_cz 5 months ago

Status changed from Workable to Resolved

Again, no soft-failed job is blocking a release. I also managed to verify once more that the "acceptable_for" feature is working as intended:

2024-11-25 15:05:51 INFO     Ignoring failed job https://openqa.suse.de/t15996419 for incident 36467 due to openQA comment
...
2024-11-25 15:05:52 INFO     Incidents to approve:
2024-11-25 15:05:52 INFO     * SUSE:Maintenance:36467:353830
2024-11-25 15:05:52 INFO     Accepting review for SUSE:Maintenance:36467:353830
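
The "Ignoring failed job ... due to openQA comment" message corresponds to a reviewer comment on the openQA job that marks the failure as acceptable for one specific incident. A minimal sketch of such a check is shown here, assuming the marker has the form `@review:acceptable_for:incident_<id>:<reason>`. That format and the helper name `is_acceptable_for` are assumptions for illustration and are not confirmed by this ticket.

```python
import re
from typing import Iterable


def is_acceptable_for(incident: int, comments: Iterable[str]) -> bool:
    """Return True if any comment marks a failure as acceptable for the incident.

    The marker format is an assumption of this sketch.
    """
    pattern = re.compile(
        r"@review:acceptable_for:incident_%d:(.+)" % incident, re.DOTALL
    )
    return any(pattern.search(text) is not None for text in comments)


if __name__ == "__main__":
    comments = ["@review:acceptable_for:incident_36467:known product issue"]
    print(is_acceptable_for(36467, comments))
```

Because the marker names the incident, a comment written for one incident does not silence the same failing job for another.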

Considering the age of this ticket, I believe we have already improved the workflow in the meantime, so everything is working as expected. Hence I am marking this as resolved unless someone points out new examples with the current code base.
