action #126551

closed

coordination #126167: [epic][qem-bot] Inconsistent job counts in qem-dashboard size:M

[qem-bot] Flag missing openQA jobs with qem-dashboard API size:M

Added by kraih 11 months ago. Updated 8 months ago.

Status: Resolved
Priority: Normal
Assignee:
Target version:
Start date: 2023-03-23
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Follow-up to #126548. Once an API endpoint for flagging openQA jobs as missing exists in the qem-dashboard, qem-bot should start using it.

Acceptance criteria

  • AC1: qem-bot flags missing openQA jobs as such in the qem-dashboard

Suggestions

  • Missing openQA jobs already show up in the qem-bot pipeline logs. Extend that to notify the qem-dashboard using the new API endpoint.
  • Find examples in GitLab CI pipeline runs where openQA jobs are missing, as a starting point for a test and for verification
  • Verify the result with a quick SQL query on the dashboard's database for flagged missing jobs

Related issues: 1 (0 open, 1 closed)

Related to QA - action #126548: [qem-dashboard] Add an API endpoint to flag openQA jobs as missing in openQA size:M (Status: Resolved; Assignee: kraih; Start date: 2023-03-23; Due date: 2023-04-11)

Actions #1

Updated by kraih 11 months ago

  • Related to action #126548: [qem-dashboard] Add an API endpoint to flag openQA jobs as missing in openQA size:M added
Actions #2

Updated by kraih 11 months ago

Blocked by #126548.

Actions #3

Updated by kraih 11 months ago

  • Status changed from Blocked to New
  • Target version changed from future to Ready

Unblocked.

Actions #4

Updated by kraih 11 months ago

The dashboard endpoint to use is PATCH /api/jobs/<job_id>: https://github.com/openSUSE/qem-dashboard/blob/main/API.md#openqa-jobs
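A minimal sketch of how a client could build that request with only the Python standard library. Only the method and path (PATCH /api/jobs/<job_id>) come from this ticket; the JSON body {"obsolete": true} is an assumption based on the obsolete column and log wording seen later in this ticket, and the "Authorization: Token ..." header and the https://dashboard.qam.suse.de base URL are also assumptions. Check API.md for the real payload and auth scheme. The request is only built, not sent, so the sketch runs offline.

```python
import json
import urllib.request


def build_obsolete_request(dashboard_url: str, job_id: int, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH request flagging one openQA job on the dashboard.

    Assumed, not confirmed by this ticket: the {"obsolete": true} body and the
    token-based Authorization header format.
    """
    url = f"{dashboard_url.rstrip('/')}/api/jobs/{job_id}"
    body = json.dumps({"obsolete": True}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Token {token}",  # assumed auth scheme
        },
    )


# Usage: "hypothetical-token" is a placeholder, not a real credential
req = build_obsolete_request("https://dashboard.qam.suse.de", 11397618, "hypothetical-token")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to unit-test the URL and payload without touching the live dashboard.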

Actions #5

Updated by okurz 11 months ago

  • Target version changed from Ready to future
Actions #6

Updated by okurz 10 months ago

  • Target version changed from future to Ready
Actions #7

Updated by okurz 10 months ago

  • Subject changed from [qem-bot] Flag missing openQA jobs with qem-dashboard API to [qem-bot] Flag missing openQA jobs with qem-dashboard API size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #8

Updated by mkittler 9 months ago

  • Assignee set to mkittler
Actions #9

Updated by mkittler 9 months ago

  • Status changed from Workable to In Progress
Actions #10

Updated by mkittler 9 months ago

  • Status changed from In Progress to Feedback
Actions #11

Updated by okurz 9 months ago

  • Due date set to 2023-06-16
Actions #12

Updated by livdywan 9 months ago

Discussed in the Unblock:

  • Reproduce locally in the same ubuntu:latest image used in GHA (GitHub Actions), or use GitHub Codespaces to bisect
  • Replace the mocked osd host with something else to be sure that it definitely never resolves
  • Tina volunteered to try to break it locally
Actions #13

Updated by mkittler 9 months ago

We've established that this is not a problem caused by my changes but rather incompatible versions of certain Python modules being used in the CI. I'll try to switch to using Tumbleweed for CI runs.

Actions #14

Updated by mkittler 9 months ago

This PR will fix the CI: https://github.com/openSUSE/qem-bot/pull/120

Then I can rebase my other PR on it and finally get it merged.

Actions #16

Updated by livdywan 9 months ago

  • Due date changed from 2023-06-16 to 2023-06-23

mkittler wrote:

This PR will fix the CI: https://github.com/openSUSE/qem-bot/pull/120

Then I can rebase my other PR on it and finally get it merged.

Merged. Let's wait another week to see if it actually works.

Actions #17

Updated by mkittler 8 months ago

So far there are no jobs flagged as obsolete/missing:

ssh root@dashboard.qam.suse.de
machinectl shell postgresql
sudo -u postgres psql dashboard_db
dashboard_db=# select count(id) from openqa_jobs where obsolete = true;
 count 
-------
     0
(1 row)
Actions #18

Updated by mkittler 8 months ago

Now some openQA jobs have been flagged:

dashboard_db=# select concat('https://openqa.suse.de/tests/', job_id) from openqa_jobs where obsolete = true;
                concat                 
---------------------------------------
 https://openqa.suse.de/tests/11392753
 https://openqa.suse.de/tests/11377163
 https://openqa.suse.de/tests/11377159
 https://openqa.suse.de/tests/11360616
 https://openqa.suse.de/tests/11374924
 https://openqa.suse.de/tests/11395877
 https://openqa.suse.de/tests/11377130
 https://openqa.suse.de/tests/11381027
 https://openqa.suse.de/tests/11392665
 https://openqa.suse.de/tests/11395867
 https://openqa.suse.de/tests/11396541
 https://openqa.suse.de/tests/11377151
 https://openqa.suse.de/tests/11392660
 https://openqa.suse.de/tests/11392666
 https://openqa.suse.de/tests/11395868
(15 rows)

So I guess it generally works. It is just strange that this list also contains jobs that definitely do exist, e.g. https://openqa.suse.de/tests/11395868. The openQA comments API (which the bot uses; a 404 reply would lead to flagging) also returns a 200 response for this job (via https://openqa.suse.de/api/v1/jobs/11395868/comments). Any ideas why this could be the case?
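The rule described here (only a 404 from the job's comments endpoint leads to flagging) can be sketched as below. This is an illustrative sketch, not qem-bot's actual code; the function names are hypothetical. The status lookup is injectable so the rule can be exercised offline with fake responses.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List

OPENQA_URL = "https://openqa.suse.de"


def comments_url(job_id: int) -> str:
    # The comments endpoint quoted above, used as an existence probe for a job
    return f"{OPENQA_URL}/api/v1/jobs/{job_id}/comments"


def http_status(job_id: int) -> int:
    """Return the HTTP status of GET comments_url(job_id); network errors (URLError) propagate."""
    try:
        with urllib.request.urlopen(comments_url(job_id), timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def find_missing_jobs(
    job_ids: Iterable[int], get_status: Callable[[int], int] = http_status
) -> List[int]:
    """Return only the job IDs whose comments endpoint answered 404.

    Any other status (200, 5xx, ...) is not treated as "missing", so a
    temporarily unavailable openQA instance does not get jobs flagged.
    """
    return [job_id for job_id in job_ids if get_status(job_id) == 404]


# Offline usage with fake statuses: 11395868 returned 200 as reported above,
# the other two IDs are hypothetical
fake_statuses = {11395868: 200, 1000001: 404, 1000002: 503}
print(find_missing_jobs(fake_statuses, fake_statuses.get))
```

With this rule a job that answers 200, like 11395868, can never be selected, which is why a flagged-but-existing job points at something other than the 404 check itself.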

Actions #19

Updated by mkittler 8 months ago

I could also spot relevant lines in the bot's logs, e.g. https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/1644866:

2023-06-20 11:04:43 INFO     SUSE:Maintenance:29169:300779 has at least one failed job in aggregate tests
2023-06-20 11:04:43 INFO     Job setting 2016724 not found for incident 29270
2023-06-20 11:04:43 INFO     Job 11397618 not found in openQA, marking as obsolete on dashboard
2023-06-20 11:04:43 INFO     Found failed, not-ignored job https://openqa.suse.de/t11397618 for incident 29270
2023-06-20 11:04:43 INFO     SUSE:Maintenance:29270:301195 has at least one failed job in incident tests
2023-06-20 11:04:43 INFO     Job setting 2016725 not found for incident 29280
…
2023-06-20 11:04:59 INFO     SUSE:Maintenance:29403:301154 has at least one failed job in aggregate tests
2023-06-20 11:05:00 INFO     Job 11397617 not found in openQA, marking as obsolete on dashboard
2023-06-20 11:05:00 INFO     Found failed, not-ignored job https://openqa.suse.de/t11397617 for incident 29407
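To cross-check a pipeline log against the dashboard database, the flagged job IDs can be pulled out of the log text. A minimal sketch, assuming the message wording stays exactly as in the log excerpt above; flagged_job_ids is a hypothetical helper, not part of qem-bot.

```python
import re
from typing import List

# Matches the bot's log message as shown in the excerpt above; other
# "not found" messages (e.g. "Job setting ... not found") are not matched.
FLAGGED_RE = re.compile(r"Job (\d+) not found in openQA, marking as obsolete on dashboard")


def flagged_job_ids(log_text: str) -> List[int]:
    """Return the openQA job IDs the bot reported as flagged, in log order."""
    return [int(m.group(1)) for m in FLAGGED_RE.finditer(log_text)]


sample = """\
2023-06-20 11:04:43 INFO     Job setting 2016724 not found for incident 29270
2023-06-20 11:04:43 INFO     Job 11397618 not found in openQA, marking as obsolete on dashboard
2023-06-20 11:05:00 INFO     Job 11397617 not found in openQA, marking as obsolete on dashboard
"""
print(flagged_job_ids(sample))  # [11397618, 11397617]
```

The resulting IDs can then be compared with the job_id column returned by the SQL query on dashboard_db.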

Those jobs really don't exist. They did not appear in the list in my previous comment, but I've just executed the query again and now they are there as well:

dashboard_db=# select concat('https://openqa.suse.de/tests/', job_id) from openqa_jobs where obsolete = true order by job_id;
                concat                 
---------------------------------------
 https://openqa.suse.de/tests/11360616
 https://openqa.suse.de/tests/11374924
 https://openqa.suse.de/tests/11377130
 https://openqa.suse.de/tests/11377151
 https://openqa.suse.de/tests/11377159
 https://openqa.suse.de/tests/11377163
 https://openqa.suse.de/tests/11381027
 https://openqa.suse.de/tests/11392665
 https://openqa.suse.de/tests/11392666
 https://openqa.suse.de/tests/11395906
 https://openqa.suse.de/tests/11395914
 https://openqa.suse.de/tests/11396541
 https://openqa.suse.de/tests/11396786
 https://openqa.suse.de/tests/11397615
 https://openqa.suse.de/tests/11397616
 https://openqa.suse.de/tests/11397617
 https://openqa.suse.de/tests/11397618
 https://openqa.suse.de/tests/11397623
 https://openqa.suse.de/tests/11397626
 https://openqa.suse.de/tests/11397627
(20 rows)
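Comparing two query results by eye is error-prone, so a set difference helps see exactly what changed between runs. A minimal sketch; the job IDs are copied verbatim from the two query results in this ticket, and diff_flagged is a hypothetical helper.

```python
from typing import Dict, Set

# Job IDs copied from the two query results above
first_result: Set[int] = {
    11392753, 11377163, 11377159, 11360616, 11374924,
    11395877, 11377130, 11381027, 11392665, 11395867,
    11396541, 11377151, 11392660, 11392666, 11395868,
}
second_result: Set[int] = {
    11360616, 11374924, 11377130, 11377151, 11377159,
    11377163, 11381027, 11392665, 11392666, 11395906,
    11395914, 11396541, 11396786, 11397615, 11397616,
    11397617, 11397618, 11397623, 11397626, 11397627,
}


def diff_flagged(before: Set[int], after: Set[int]) -> Dict[str, Set[int]]:
    """Split the change between two snapshots into newly flagged and no longer flagged IDs."""
    return {"added": after - before, "removed": before - after}


changes = diff_flagged(first_result, second_result)
print("newly flagged:", sorted(changes["added"]))
print("no longer flagged:", sorted(changes["removed"]))
```

Running it shows that ten IDs were added and five IDs from the first listing, including 11395868, no longer appear in the second one.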
Actions #20

Updated by okurz 8 months ago

  • Status changed from Feedback to In Progress

As discussed in the Unblock, please reach out in #eng-testing or #discuss-maintenance and, unless you receive horrible backlash, consider the work done.

Actions #21

Updated by mkittler 8 months ago

  • Status changed from In Progress to Feedback

I sent a message yesterday (to #discuss-maintenance) but haven't gotten a response yet.

Actions #22

Updated by okurz 8 months ago

  • Due date deleted (2023-06-23)
  • Status changed from Feedback to Resolved

I consider no response as "no horrible backlash" :)
