action #122311

coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver

Added by okurz 3 months ago. Updated about 1 month ago.

Status: Feedback
Priority: Normal
Assignee:
Target version:
Start date: 2022-12-21
Due date:
% Done: 0%
Estimated time:

Description

Motivation

See #97118#note-10. qem-dashboard seems to have sometimes inconsistent results. Maybe instead of relying on the content in the dashboard database we should change places like https://github.com/openSUSE/qem-bot/blob/a3701ce5b9874f3552cf6bd2c98ae5a52963ab49/openqabot/approver.py#L101 to look at the most recent results in openQA directly.


Related issues

Related to QA - action #123286: Bot and dashboard reference to wrong data and block update approval size:M (Resolved, 2022-12-21)

Copied from QA - action #122308: Handle invalid openQA job references in qem-dashboard size:M (Resolved, 2022-12-21)

History

#1 Updated by okurz 3 months ago

  • Copied from action #122308: Handle invalid openQA job references in qem-dashboard size:M added

#2 Updated by mgrifalconi 3 months ago

  • Assignee set to mgrifalconi

#3 Updated by mgrifalconi 2 months ago

  • Status changed from New to In Progress

A status update after trying to learn how the bot and dashboard work and playing around with the code a bit.

This is what I would consider a dangerous change/refactor, since this tool handles the approval of updates shipped to our customers and we currently don't have a staging environment.

Considering that, I propose to develop a module with the new feature, controlled by three settings:

  • dry run and log: test out the new way of handling things, compare performance, and make sure the same update requests are approved/rejected
  • emergency shutoff: quickly disable the new code without needing PR approvals/fixes, minimizing the impact on production
  • enabled: switch over to the new system; eventually this would become the default, the switch would go away, and the old method would be removed from the code
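The three settings above can be sketched as a small feature flag. This is only an illustration of the rollout strategy described in the bullets; the enum name, setting values, and `decide` helper are hypothetical and not qem-bot's actual configuration interface.

```python
from enum import Enum


class LiveApproverMode(Enum):
    """Hypothetical rollout flag for the new live-data approver path."""
    OFF = "off"          # emergency shutoff: old dashboard-based path only
    DRY_RUN = "dry_run"  # run both paths, log disagreements, act on the old one
    ENABLED = "enabled"  # act on the new live-openQA path

def decide(mode, old_verdict, new_verdict, log=print):
    """Return the approval verdict to act on, per the proposed settings."""
    if mode is LiveApproverMode.ENABLED:
        return new_verdict
    if mode is LiveApproverMode.DRY_RUN and old_verdict != new_verdict:
        # Disagreements are the interesting signal during the dry-run phase.
        log(f"live-data approver disagrees: old={old_verdict} new={new_verdict}")
    return old_verdict
```

During the dry-run phase, both code paths run but only the old verdict takes effect, so any divergence shows up in the logs before it can affect production approvals.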

I uploaded the WIP code at https://github.com/michaelgrifalconi/qem-bot/commit/dc2ca5e9b03ad5f06ace62092512be00fd99b7fe, but found an issue in the meantime.

It appears difficult to change the approver logic to look only at live data.

This is the current flow for release request approvals:
(only looking at aggregate now)

In short, it's not a straightforward change, since it's not just a matter of reusing the same functions that feed data to the dashboard and consuming that data directly.

I think it would be worth separating the component that collects/processes data and executes actions (the bot) from the one that visualizes that data (the dashboard). I agree with querying data in a different way for better visualization, but I personally don't like the bot relying strongly on the dashboard, with some data processing happening in the bot and some in the dashboard.

We could argue about the performance reasons for having the bot use cached dashboard data or not (or have one option with a failover to the other), but I would prefer that the dashboard not become another business-critical component if we can avoid it, and to keep that burden/risk only on the bot script.

#4 Updated by mgrifalconi 2 months ago

  • Copied to action #123286: Bot and dashboard reference to wrong data and block update approval size:M added

#5 Updated by mgrifalconi 2 months ago

  • Copied to deleted (action #123286: Bot and dashboard reference to wrong data and block update approval size:M)

#6 Updated by mgrifalconi 2 months ago

  • Related to action #123286: Bot and dashboard reference to wrong data and block update approval size:M added

#7 Updated by mgrifalconi about 1 month ago

  • Status changed from In Progress to Feedback

With the help of the discussion here, https://suse.slack.com/archives/C02CANHLANP/p1675927786104149?thread_ts=1674639850.741499&cid=C02CANHLANP,
I got a better understanding of the bot's architecture. It seems tricky to make the bot use only live data for approvals, since there are many steps where data is fetched, stored in the dashboard, and downloaded from the dashboard again.
There is also no interest from the dashboard/bot maintainers in switching to such logic.

What we can still do is rely on dashboard data up until just before taking the approve/not-approve decision, and no earlier. This would make the logic even more complicated IMHO, because we first look at dashboard data (download live data, upload it to the dashboard, read it back from the dashboard, all of this multiple times for different kinds of data), and then at the very end double-check against openQA data.
I am not a huge fan of this approach, but it could still help in situations like the linked issue, and I see no other solution, since a refactor to use only live data is not possible for the reasons mentioned before.
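A minimal sketch of this mixed approach: keep the existing dashboard-driven pipeline unchanged, and only at the final decision point veto an approval if a fresh look at the referenced openQA jobs no longer agrees. The function name and parameters here are illustrative, not qem-bot's actual API; the result strings "passed"/"softfailed" are the openQA values that count as green.

```python
def final_approval(dashboard_verdict, live_results):
    """Mixed approach: trust the dashboard pipeline up to the decision
    point, then re-check live openQA results just before approving.

    dashboard_verdict: bool, what the existing dashboard-based logic decided
    live_results: iterable of openQA result strings for the referenced jobs,
                  fetched directly from openQA at decision time
    """
    if not dashboard_verdict:
        return False  # dashboard already says no; nothing to double-check
    # Veto the approval if any job is no longer green when re-checked live.
    return all(r in ("passed", "softfailed") for r in live_results)
```

The double-check can only turn an approval into a rejection, never the other way around, which keeps the added risk of the new code path low.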

Before continuing with that, I would like some feedback from the tools team, to make sure this mixed approach is something we can try and is worth investing some time in.
I'm also available for a call if needed, just ping me :)
