Project

General

Profile

Actions

action #161267

open

coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

coordination #155671: [epic] Better handling of SLE maintenance test review

Represent the effect of "@review:acceptable_for" labels on qem-dashboard pages size:M

Added by okurz 10 months ago. Updated 7 days ago.

Status:
Feedback
Priority:
Low
Assignee:
Start date:
2024-05-30
Due date:
2025-04-30 (Due in 27 days)
% Done:

0%

Estimated time:

Description

Motivation

https://github.com/openSUSE/qem-dashboard/ on pages like http://dashboard.qam.suse.de/blocked shows which updates are blocked. However, finding out which decision qem-bot made and why certain incidents are still blocked can take multiple clicks and digging through qem-bot log files, in particular when @review:acceptable_for:incident_X labels on openQA jobs are used, which still show up as plain red boxes on the dashboard. To make the life of reviewers easier, the dashboard should provide more details, ideally on all levels accordingly.

Additional explanation by mgrifalconi:

I fear you can currently see this only in the bot logs, https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs, for "approve incident" jobs.
You should see something like
"Ignoring failed job %s for incident %s due to openQA comment"
https://github.com/openSUSE/qem-bot/blob/463d5de225767791d0b6815435a0b341ce319c67/openqabot/approver.py#L261
BUT only if no other genuine failure is "found" by the bot first. In that case, the bot gives up at the first non-ignored failure and does not look at other acceptable ones.
Or you can see that the incident eventually leaves the queue and gets approved :smile:
It would be nice to reflect that on the dashboard though, since at first glance it looks like something was not reviewed when in fact it was.
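For context, the label matching described above can be sketched roughly like this. This is a simplified, regex-based sketch; the real parsing lives in openqabot/approver.py (linked above), and the function name here is made up for illustration:

```python
import re

# Hypothetical sketch: extract the incident numbers a reviewer marked as
# acceptable via "@review:acceptable_for:incident_<N>" in an openQA comment.
# Not the actual qem-bot pattern; see openqabot/approver.py for the real one.
LABEL = re.compile(r"@review:acceptable_for:incident_(\d+)")

def acceptable_incidents(comment_text: str) -> set[int]:
    """Return the set of incident numbers this comment marks acceptable."""
    return {int(n) for n in LABEL.findall(comment_text)}
```

For example, a comment like "@review:acceptable_for:incident_37848:bsc#1239303" would yield {37848}.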

Ideas

I am thinking of two things to do:

  1. Provide the last relevant log lines for each incident on the corresponding incident page of the dashboard
  2. Use another pattern and color for the "acceptable_for" result, e.g. a yellow-green dashed pattern in the background of each result box

Acceptance criteria

Suggestions


Files

20250313_14h40m37s_grim.png (38.7 KB) 20250313_14h40m37s_grim.png jbaier_cz, 2025-03-13 13:42

Related issues 1 (0 open, 1 closed)

Related to openQA Infrastructure (public) - action #179503: »bot-ng | Failed pipeline for master« due to gpg issues - Resolved - okurz

Actions
Actions #1

Updated by okurz 10 months ago

  • Parent task set to #155671
Actions #2

Updated by pcervinka 8 months ago

@okurz I understand that the tools team is busy, but would it be possible to increase the priority from low to something higher? This dashboard visibility issue regularly causes confusion about maintenance update status.

Actions #3

Updated by okurz 8 months ago

  • Priority changed from Low to Normal
Actions #4

Updated by pvorel 8 months ago

Hint from Oliver (https://suse.slack.com/archives/C02DQJKULE4/p1722327940699429?thread_ts=1722235479.858789&cid=C02DQJKULE4):

https://github.com/openSUSE/qem-bot/blob/463d5de225767791d0b6815435a0b341ce319c67/openqabot/approver.py#L129 is where the "acceptable_for" comment in qem-bot is parsed and evaluated. The challenge is that this is called during the "approval" command which is not syncing anything to the dashboard (yet).

Actions #5

Updated by okurz 7 months ago

  • Target version changed from future to Tools - Next
Actions #6

Updated by okurz 7 months ago

  • Subject changed from Represent the effect of "@review:acceptable_for" labels on qem-dashboard pages to Represent the effect of "@review:acceptable_for" labels on qem-dashboard pages size:M
  • Description updated (diff)
  • Status changed from New to Workable
Actions #7

Updated by okurz 4 months ago

  • Target version changed from Tools - Next to Ready
Actions #8

Updated by okurz 4 months ago

  • Target version changed from Ready to Tools - Next
Actions #9

Updated by okurz 30 days ago

  • Target version changed from Tools - Next to Ready
Actions #10

Updated by mkittler 23 days ago

  • Status changed from Workable to In Progress
  • Assignee set to mkittler
Actions #11

Updated by mkittler 23 days ago · Edited

I guess we could sync whether each job is acceptable for what incidents like this: https://github.com/openSUSE/qem-bot/pull/190

This has the advantage of doing this kind of updating in the "sync-incidents" command, which seems to follow how this bot is designed.

However, it has the disadvantage that the dashboard needs to compute whether an incident is approvable or not based on this information we would update for individual openQA jobs. This would also require a probably non-trivial extension of the dashboard database schema to store this information. I'm also not sure whether it is a good idea to query (potentially a lot of) openQA comments in this place.

Maybe it would be better if we extended the "approve" command instead. It would not just approve or skip the approval but also communicate the approval/skip to the dashboard. This would require a bit of refactoring so that we keep track of non-approvable incidents and update the dashboard accordingly (instead of just filtering non-approvable incidents out of the set with some debug logging). The dashboard already has an API to update incidents. So we just need to add another column to the dashboard database schema, e.g. review_comment, which would store why the incident was or was not approved. This could be displayed as per AC1.
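The idea of reporting the approval decision per incident could look roughly like this. The endpoint path and the review_comment field are assumptions for illustration, not the actual qem-dashboard API:

```python
import json

# Sketch only: build the request the bot could send to the dashboard's
# incident-update API after the "approve" run. The URL scheme and the
# "review_comment" field are hypothetical, not the real qem-dashboard API.
def build_incident_update(incident: int, approved: bool, reason: str) -> tuple[str, bytes]:
    """Return (url, body) for an update of one incident's review state."""
    url = f"https://dashboard.example/api/incident/{incident}"  # placeholder host
    body = json.dumps({"approved": approved, "review_comment": reason}).encode()
    return url, body
```

The reason string could be the same message the bot already logs, e.g. "Ignoring failed job ... due to openQA comment", so the dashboard shows why an incident is still blocked without digging through pipeline logs.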

I still have to reverse-engineer how the "boxes" for each incident on http://dashboard.qam.suse.de/blocked are computed. It seems one box represents one or more openQA jobs, though. So just knowing why an incident was approved or not is not sufficient to show whether an @review:acceptable_for is relevant for a "box" as per AC2. For this we would need to track this information per openQA job, as I tried with https://github.com/openSUSE/qem-bot/pull/190.

So to fulfill both ACs we might need both:

  1. The bot tells the dashboard for each incident why it was approved or not (during the approval command) as some kind of review comment. This can be displayed on incident details pages (e.g. http://dashboard.qam.suse.de/incident/37356) like the approved and embargoed states.
  2. The bot tells the dashboard for each and every openQA job whether @review:acceptable_for is present (during the sync or the approval command depending on what is more efficient). The dashboard stores this openQA-job-to-incident relation in its database and takes it into account when rendering "boxes" on http://dashboard.qam.suse.de/blocked.
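The per-job relation from point 2 could be modeled roughly as below. The record and function names are assumptions for illustration, not the actual job_remarks schema from qem-bot PR #190:

```python
from dataclasses import dataclass

# Sketch of the openQA-job-to-incident relation the dashboard would store;
# names are hypothetical, not the real schema.
@dataclass(frozen=True)
class JobRemark:
    openqa_job_id: int
    incident: int
    acceptable: bool  # True if @review:acceptable_for:incident_<incident> is present

def effectively_failed(job_failed: bool, remarks: list[JobRemark], incident: int) -> bool:
    """A failed job still blocks an incident unless a remark marks it acceptable for that incident."""
    return job_failed and not any(
        r.incident == incident and r.acceptable for r in remarks)
```

With such a relation in place, rendering the "boxes" on /blocked becomes a lookup per job and incident rather than a trip through the bot logs.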

I have already started with 2. but perhaps 1. is easier and a better starting point.

Maybe it would also make sense to cross-check with @jbaier_cz and @okurz whether this goes into the right direction.

Actions #12

Updated by openqa_review 23 days ago

  • Due date set to 2025-03-26

Setting due date based on mean cycle time of SUSE QE Tools

Actions #13

Updated by jbaier_cz 21 days ago

I hope I understood the initial motivation correctly. Take a look at an example: at this moment 37819:libarchive failed in the "Kernel Maintenance" and "Security Maintenance" job groups because there are 2 failed openQA tests there (the number should be clear from the boxes, 1 test in each job group). The bot will of course not approve that incident, because there are failed jobs. Now, let's imagine that the job from Security is not related and a reviewer puts the @review:acceptable_for:37819 comment there. In the next run, the bot will still not approve the incident (because there is still 1 other failed job); but the reviewer will still see 2 failed tests on the dashboard and will need to investigate both of them to find out that one of them is already commented.

So I guess we can maybe look at this issue as some kind of soft-fail for the openQA job? The idea from the ticket is to make the red box yellow-green if the failed test has the comment (the tricky part might be: what if there are more failed tests in one box and only some of them are "acceptable"; probably a yellow-red box?).

I would guess that your approach no. 2 should be more than enough. It might be done in the approval phase (which has some drawbacks already mentioned in the description, and the approval does not continue to look for other comments once it finds the first non-commented failure) or in the sync phase (where you might need to reuse the same code, as it effectively needs to stay in the approval as well, and with the drawback that the data in the dashboard will not be "the same" as during the approval, so it does not reflect the effect with 100% accuracy).

I do not think you need to deal with no. 1, as you can easily find out from the logs why something was approved; the only real issue is the "blocked" why (i.e. why something was not yet approved during the last approve run).
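The box coloring rule discussed above can be sketched as follows. One "box" may aggregate several failed jobs, so the color depends on whether all, some, or none of them carry an acceptable_for comment; the color names are assumptions from this discussion, not an existing dashboard convention:

```python
# Illustrative sketch of the proposed box coloring on /blocked.
def box_color(failed_jobs: int, acceptable_jobs: int) -> str:
    """Pick a box color from the count of failed jobs and how many of those
    were marked acceptable via @review:acceptable_for comments."""
    if failed_jobs == 0:
        return "green"
    if acceptable_jobs >= failed_jobs:
        return "yellow-green"  # every failure was acknowledged by a reviewer
    if acceptable_jobs > 0:
        return "yellow-red"    # mixed: some failures still need review
    return "red"
```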

Actions #14

Updated by MDoucha 21 days ago

jbaier_cz wrote in #note-13:

I hope I understood the initial motivation correctly. Take a look at an example: at this moment 37819:libarchive failed in the "Kernel Maintenance" and "Security Maintenance" job groups because there are 2 failed openQA tests there (the number should be clear from the boxes, 1 test in each job group). The bot will of course not approve that incident, because there are failed jobs. Now, let's imagine that the job from Security is not related and a reviewer puts the @review:acceptable_for:37819 comment there. In the next run, the bot will still not approve the incident (because there is still 1 other failed job); but the reviewer will still see 2 failed tests on the dashboard and will need to investigate both of them to find out that one of them is already commented.

So I guess we can maybe look at this issue as some kind of soft-fail for the openQA job? The idea from the ticket is to make the red box yellow-green if the failed test has the comment (the tricky part might be: what if there are more failed tests in one box and only some of them are "acceptable"; probably a yellow-red box?).

For me, the only requirement is that jobs with @review:acceptable_for:... will be counted as passed for the given incident. Highlighting label use with fancy colors is not important.

Actions #15

Updated by okurz 14 days ago

  • Priority changed from Normal to Low
Actions #16

Updated by mkittler 13 days ago

  • Status changed from In Progress to Feedback
Actions #17

Updated by okurz 8 days ago

  • Due date deleted (2025-03-26)
Actions #18

Updated by okurz 8 days ago

  • Due date set to 2025-03-26

wait, the due date should actually stay. But I understand that your work was mostly delayed by other work and that this is a low-priority ticket. Both PRs are merged and should be deployed, so time to check the effect?

Actions #19

Updated by livdywan 8 days ago

So do we have an example? As per AC2 I would expect to see this pretty clearly on http://dashboard.qam.suse.de/blocked but maybe there is no relevant case?

@MDoucha Would you have an example for us?

Actions #20

Updated by okurz 8 days ago

I called SQL select job_id,test,build,c.t_created,text from comments c join jobs j on j.id = c.job_id where text ~ 'acceptable_for' and build !~ ':' order by t_created desc limit 3; and found only

  job_id  |        test         |   build    |      t_created      |                       text
----------+---------------------+------------+---------------------+---------------------------------------------------
 17048481 | docker_tests        | 20250313-1 | 2025-03-14 12:07:43 | @review:acceptable_for:incident_37848:bsc#1239303
 17048482 | docker-stable_tests | 20250313-1 | 2025-03-14 12:07:37 | @review:acceptable_for:incident_37848:bsc#1239303
 17048481 | docker_tests        | 20250313-1 | 2025-03-14 12:07:18 | @review:acceptable_for:incident_37835:bsc#1239303
(3 rows)

so no recent use of acceptable_for within aggregate tests. There are some uses in incident tests where it's not that useful but maybe can still be checked:

openqa=> select job_id,test,build,c.t_created,text from comments c join jobs j on j.id = c.job_id where text ~ 'acceptable_for' order by t_created desc limit 3;
  job_id  |            test            |       build       |      t_created      |                                          text
----------+----------------------------+-------------------+---------------------+---------------------------------------------------------------------------------------------------------
 17145817 | mau-extratests-virt-hyperv | :37973:dtb-armv7l | 2025-03-26 01:45:01 | @review:acceptable_for:incident_37973:automation issue, job pass in https://openqa.suse.de/tests/17106789
 17137273 | qam-klp_xfstests_btrfs     | :37973:dtb-armv7l | 2025-03-24 15:46:01 | @review:acceptable_for:incident_37973:known_issues
 17106981 | ltp_syscalls_m32           | :37973:dtb-armv7l | 2025-03-24 15:45:31 | @review:acceptable_for:incident_37973:bsc#1229648
(3 rows)
Actions #21

Updated by okurz 8 days ago

  • Related to action #179503: »bot-ng | Failed pipeline for master« due to gpg issues added
Actions #22

Updated by mkittler 8 days ago · Edited

The bot-ng changes cannot be deployed/used right now.

So select * from job_remarks limit 1; (see https://gitlab.suse.de/qa-maintenance/bot-ng/-/blob/master/Readme.md#cleanup-of-unwanted-test-results) still shows no results, despite over 300 results for select count(c.job_id) from comments as c join jobs j on j.id = c.job_id where text ~ 'acceptable_for'; and despite "Ignoring failed job" being logged on approve jobs.

Actions #23

Updated by okurz 8 days ago

https://gitlab.suse.de/qa-maintenance/bot-ng/-/jobs/4043079#L180 now ran with your latest qem-bot changes but no "job_remarks" in the db yet. What should we do next?

Actions #24

Updated by MDoucha 8 days ago

livdywan wrote in #note-19:

So do we have an example? As per AC2 I would expect to see this pretty clearly on http://dashboard.qam.suse.de/blocked but maybe there is no relevant case?

@MDoucha Would you have an example for us?

There are no pending kernel updates at the moment so we'll have to wait for the next update round in ~2 weeks.

Actions #25

Updated by okurz 7 days ago

  • Due date changed from 2025-03-26 to 2025-04-30

Delayed by #179503 by some days, and we should wait for further uses of "acceptable_for", so I am bumping the due date accordingly.

Actions #26

Updated by mkittler 7 days ago · Edited

I checked a few recent approve jobs but there are still none with "Ignoring failed job" and there are also no remarks in the database. I'd just wait until something comes up.

EDIT: I will use sapworker2-sp.qe.nue2.suse.org which is powered off anyway to create many worker instances.

Actions

Also available in: Atom PDF