action #126551
closedcoordination #126167: [epic][qem-bot] Inconsistent job counts in qem-dashboard size:M
[qem-bot] Flag missing openQA jobs with qem-dashboard API size:M
Description
Motivation
Followup to #126548. Once an API endpoint exists in the qem-dashboard for flagging openQA jobs as missing, qem-bot should start using it.
Acceptance criteria
- AC1: qem-bot flags missing openQA jobs as such in the qem-dashboard
Suggestions
- Missing openQA jobs already show up in the qem-bot pipeline logs. Extend that to notify the qem-dashboard using the new API endpoint
- Find examples in gitlab CI pipeline runs where openQA jobs are missing to have a starting point for a test and verification
- Verify the result with a quick SQL query on the dashboard's database for flagged missing jobs
Updated by kraih almost 2 years ago
- Related to action #126548: [qem-dashboard] Add an API endpoint to flag openQA jobs as missing in openQA size:M added
Updated by kraih almost 2 years ago
- Status changed from Blocked to New
- Target version changed from future to Ready
Unblocked.
Updated by kraih almost 2 years ago
The dashboard endpoint to use is PATCH /api/jobs/<job_id>: https://github.com/openSUSE/qem-dashboard/blob/main/API.md#openqa-jobs
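A minimal sketch of what such a call could look like from Python, built with the standard library only and not sent anywhere. The request body ({"obsolete": true}), the "Token" authorization scheme, the host name and the helper name build_obsolete_request are assumptions for illustration; API.md is the authoritative description.

```python
import json
import urllib.request


def build_obsolete_request(dashboard_url, job_id, token):
    """Build (but do not send) a PATCH request for one openQA job.

    Assumptions: the JSON body {"obsolete": true} and the "Token" auth
    scheme are illustrative; check API.md for the real contract.
    """
    body = json.dumps({"obsolete": True}).encode("utf-8")
    return urllib.request.Request(
        url=f"{dashboard_url.rstrip('/')}/api/jobs/{job_id}",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",
        },
    )


# Hypothetical host and token; nothing is sent over the network here.
req = build_obsolete_request("https://dashboard.example", 11397618, "s3cret")
print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen would perform the actual call.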
Updated by okurz over 1 year ago
- Subject changed from [qem-bot] Flag missing openQA jobs with qem-dashboard API to [qem-bot] Flag missing openQA jobs with qem-dashboard API size:M
- Description updated (diff)
- Status changed from New to Workable
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
Updated by livdywan over 1 year ago
Discussed in the Unlock:
- Reproduce locally in the same ubuntu:latest as run in GHA or use github codespaces to bisect
- Change the mocked use of osd to something else to be sure that it definitely never resolves
- Tina volunteered to try and locally break it
Updated by mkittler over 1 year ago
We've established that this is not a problem caused by my changes but rather incompatible versions of certain Python modules being used in the CI. I'll try to switch to using Tumbleweed for CI runs.
Updated by mkittler over 1 year ago
This PR will fix the CI: https://github.com/openSUSE/qem-bot/pull/120
Then I can rebase my other PR on it and finally get it merged.
Updated by okurz over 1 year ago
Updated by livdywan over 1 year ago
- Due date changed from 2023-06-16 to 2023-06-23
mkittler wrote:
This PR will fix the CI: https://github.com/openSUSE/qem-bot/pull/120
Then I can rebase my other PR on it and finally get it merged.
Merged. Let's wait another week to see if it actually works.
Updated by mkittler over 1 year ago
So far there are no jobs flagged as obsolete/missing:
ssh root@dashboard.qam.suse.de
machinectl shell postgresql
sudo -u postgres psql dashboard_db
dashboard_db=# select count(id) from openqa_jobs where obsolete = true;
count
-------
0
(1 row)
Updated by mkittler over 1 year ago
Now some openQA jobs have been flagged:
dashboard_db=# select concat('https://openqa.suse.de/tests/', job_id) from openqa_jobs where obsolete = true;
concat
---------------------------------------
https://openqa.suse.de/tests/11392753
https://openqa.suse.de/tests/11377163
https://openqa.suse.de/tests/11377159
https://openqa.suse.de/tests/11360616
https://openqa.suse.de/tests/11374924
https://openqa.suse.de/tests/11395877
https://openqa.suse.de/tests/11377130
https://openqa.suse.de/tests/11381027
https://openqa.suse.de/tests/11392665
https://openqa.suse.de/tests/11395867
https://openqa.suse.de/tests/11396541
https://openqa.suse.de/tests/11377151
https://openqa.suse.de/tests/11392660
https://openqa.suse.de/tests/11392666
https://openqa.suse.de/tests/11395868
(15 rows)
So I guess it generally works. It is just strange that this list also contains jobs that definitely do exist, e.g. https://openqa.suse.de/tests/11395868. The openQA comments API (which the bot uses; a 404 reply leads to flagging) also returns a 200 response for this job (via https://openqa.suse.de/api/v1/jobs/11395868/comments). Any ideas why this could be the case?
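For reference, the flagging rule described here (a 404 from the comments lookup means the job is treated as missing) can be sketched as below. This is an illustration, not the bot's actual code: find_missing_jobs and get_status are hypothetical names, and the status lookup is injected so the sketch runs without network access.

```python
from typing import Callable, Iterable, List


def find_missing_jobs(
    job_ids: Iterable[int], get_status: Callable[[int], int]
) -> List[int]:
    """Return the job IDs whose comments lookup answered with HTTP 404."""
    return [job_id for job_id in job_ids if get_status(job_id) == 404]


# Stand-in for GET /api/v1/jobs/<job_id>/comments.
# 11395868 answers 200 as reported in this comment; job 1 is a made-up
# example of a job that openQA does not know.
fake_statuses = {11395868: 200, 1: 404}

missing = find_missing_jobs(fake_statuses, fake_statuses.__getitem__)
print(missing)
```

Under this rule a job that answers 200 is never returned, so it would not be flagged by this check alone.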
Updated by mkittler over 1 year ago
I could also spot relevant lines in the bot's logs, e.g. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1644866:
2023-06-20 11:04:43 INFO SUSE:Maintenance:29169:300779 has at least one failed job in aggregate tests
2023-06-20 11:04:43 INFO Job setting 2016724 not found for incident 29270
2023-06-20 11:04:43 INFO Job 11397618 not found in openQA, marking as obsolete on dashboard
2023-06-20 11:04:43 INFO Found failed, not-ignored job https://openqa.suse.de/t11397618 for incident 29270
2023-06-20 11:04:43 INFO SUSE:Maintenance:29270:301195 has at least one failed job in incident tests
2023-06-20 11:04:43 INFO Job setting 2016725 not found for incident 29280
…
2023-06-20 11:04:59 INFO SUSE:Maintenance:29403:301154 has at least one failed job in aggregate tests
2023-06-20 11:05:00 INFO Job 11397617 not found in openQA, marking as obsolete on dashboard
2023-06-20 11:05:00 INFO Found failed, not-ignored job https://openqa.suse.de/t11397617 for incident 29407
Those jobs really don't exist. They do not appear in the list from my previous comment, but I've just executed the query again and now they are there as well:
dashboard_db=# select concat('https://openqa.suse.de/tests/', job_id) from openqa_jobs where obsolete = true order by job_id;
concat
---------------------------------------
https://openqa.suse.de/tests/11360616
https://openqa.suse.de/tests/11374924
https://openqa.suse.de/tests/11377130
https://openqa.suse.de/tests/11377151
https://openqa.suse.de/tests/11377159
https://openqa.suse.de/tests/11377163
https://openqa.suse.de/tests/11381027
https://openqa.suse.de/tests/11392665
https://openqa.suse.de/tests/11392666
https://openqa.suse.de/tests/11395906
https://openqa.suse.de/tests/11395914
https://openqa.suse.de/tests/11396541
https://openqa.suse.de/tests/11396786
https://openqa.suse.de/tests/11397615
https://openqa.suse.de/tests/11397616
https://openqa.suse.de/tests/11397617
https://openqa.suse.de/tests/11397618
https://openqa.suse.de/tests/11397623
https://openqa.suse.de/tests/11397626
https://openqa.suse.de/tests/11397627
(20 rows)
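To compare the two query results and the log excerpt without eyeballing them, a small sketch like the following could be used. The job IDs and log lines are copied from this ticket; the variable names and the regular expression are only illustrative.

```python
import re

# Job IDs from the first query (15 rows) and the second query (20 rows).
first_snapshot = {
    11392753, 11377163, 11377159, 11360616, 11374924,
    11395877, 11377130, 11381027, 11392665, 11395867,
    11396541, 11377151, 11392660, 11392666, 11395868,
}
second_snapshot = {
    11360616, 11374924, 11377130, 11377151, 11377159,
    11377163, 11381027, 11392665, 11392666, 11395906,
    11395914, 11396541, 11396786, 11397615, 11397616,
    11397617, 11397618, 11397623, 11397626, 11397627,
}

# Two of the bot log lines quoted above.
log_lines = [
    "2023-06-20 11:04:43 INFO Job 11397618 not found in openQA, marking as obsolete on dashboard",
    "2023-06-20 11:05:00 INFO Job 11397617 not found in openQA, marking as obsolete on dashboard",
]
pattern = re.compile(r"Job (\d+) not found in openQA, marking as obsolete on dashboard")
logged = {int(m.group(1)) for line in log_lines if (m := pattern.search(line))}

no_longer_flagged = sorted(first_snapshot - second_snapshot)
newly_flagged = sorted(second_snapshot - first_snapshot)
logged_and_flagged = sorted(logged & second_snapshot)

print("no longer flagged:", no_longer_flagged)
print("newly flagged:", newly_flagged)
print("logged and flagged:", logged_and_flagged)
```

With the data above, both logged jobs are in the second result, and 11395868 is among the jobs that were flagged in the first result but not in the second.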
Updated by okurz over 1 year ago
- Status changed from Feedback to In Progress
As discussed in the unblock, please reach out in #eng-testing or #discuss-maintenance and, unless you receive horrible backlash, consider the work done.
Updated by mkittler over 1 year ago
- Status changed from In Progress to Feedback
I sent a message yesterday (to #discuss-maintenance) but haven't gotten a response yet.
Updated by okurz over 1 year ago
- Due date deleted (2023-06-23)
- Status changed from Feedback to Resolved
I consider no response as "no horrible backlash" :)