action #164733
coordination #99303 (closed): [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release
coordination #155671: [epic] Better handling of SLE maintenance test review
qem-dashboard (and hence qem-bot) see a job as failed even though it's marked as softfailed since > 30 days in openQA size:M
Start date: 2024-07-31
Due date:
% Done: 0%
Estimated time:
Tags:
Description
Observation
https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2892821#L54 shows
2024-07-31 07:03:25 INFO Found failed, not-ignored job https://openqa.suse.de/t14779563 for incident 34532
even though https://openqa.suse.de/tests/14779563 shows that the job was already "force_result'd" on 2024-07-02 via https://openqa.suse.de/tests/14779563#comment-1538601. The most recent "sync incidents" job https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/2892922 does not mention 14779563.
Expected result
- E1: http://dashboard.qam.suse.de/blocked?incident=34532&group_names=SP5 or the equivalent URL from the database should show no failed job
- E2: http://dashboard.qam.suse.de/incident/34532 should show no failed job
- E3: The latest "approve incidents" pipeline from https://gitlab.suse.de/qa-maintenance/bot-ng/-/pipeline_schedules should not mention https://openqa.suse.de/tests/14779563 as failed
Suggestions
- Ask test reviewers about examples
- Look for jobs that are softfailed via a force_result label and at the same time still failed on the dashboard.
- Check how "sync incidents" handles already finished results. Maybe finished results are only revisited when an AMQP event for a new comment is received, and that event could have been missed, so the existing result is never revisited?
- To reproduce, a qem-dashboard database dump is available on qam2.suse.de within the "postgresql" machine in /root/dashboard_db-2024-07-31T09:43:21+02:00.sql.xz
- Set up dashboard/bot/openQA locally and simulate how the dashboard/bot behaves if an already finished job changes its result
- Look at the qem-bot code to check for any obvious problems with handling softfailed jobs
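As a starting point for looking for jobs that are softfailed via a force_result label but still failed on the dashboard, the comparison could be sketched roughly as below. This is only an illustration and not the qem-bot or qem-dashboard code: `JobView`, `forced_result` and `stale_failures` are hypothetical names, the records would have to be filled from the database dump and from openQA by whoever picks this up, and the `label:force_result:<result>` comment format is assumed and should be checked against the openQA documentation.

```python
import re
from dataclasses import dataclass

# Assumed comment label format; verify against the openQA documentation.
FORCE_RESULT_RE = re.compile(r"label:force_result:(\w+)")


@dataclass(frozen=True)
class JobView:
    """Hypothetical, simplified view of one openQA job as seen by both systems."""
    job_id: int
    dashboard_status: str  # what the dashboard has stored, e.g. "failed"
    openqa_result: str     # what openQA currently reports, e.g. "softfailed"
    comments: tuple = ()   # comment texts of the openQA job


def forced_result(comments):
    """Return the result named by the last force_result label, or None."""
    forced = None
    for text in comments:
        match = FORCE_RESULT_RE.search(text)
        if match:
            forced = match.group(1)
    return forced


def stale_failures(jobs):
    """Return jobs still failed on the dashboard although openQA was overridden."""
    stale = []
    for job in jobs:
        if job.dashboard_status != "failed":
            continue
        overridden = forced_result(job.comments) == job.openqa_result
        if job.openqa_result in ("passed", "softfailed") and overridden:
            stale.append(job)
    return stale


if __name__ == "__main__":
    # Made-up sample data; only the first job ID is taken from this ticket.
    jobs = [
        JobView(14779563, "failed", "softfailed",
                ("label:force_result:softfailed:known issue",)),
        JobView(1, "failed", "failed"),
        JobView(2, "passed", "passed"),
    ]
    print([job.job_id for job in stale_failures(jobs)])
```

Any job reported by such a check would be a candidate example for the "sync incidents" investigation, since it means the dashboard never picked up the result change.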