action #122311: Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver - QA (public) - openSUSE Project Management Tool


coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

Use live openQA test results instead of inconsistent qem-dashboard database in qem-bot approver

Added by okurz about 2 years ago. Updated almost 2 years ago.

Status: Feedback
Priority: Normal
Assignee: mgrifalconi
Target version: future
Start date: 2022-12-21

Description

Motivation

See #97118#note-10. qem-dashboard sometimes seems to have inconsistent results. Instead of relying on the content of the dashboard database, maybe we should change places like https://github.com/openSUSE/qem-bot/blob/a3701ce5b9874f3552cf6bd2c98ae5a52963ab49/openqabot/approver.py#L101 to look at the most recent results in openQA directly.

Related issues: 2 (0 open, 2 closed)

Related to QA (public) - action #123286: Bot and dashboard reference to wrong data and block update approval size:M (Resolved, kraih, 2022-12-21)
Copied from QA (public) - action #122308: Handle invalid openQA job references in qem-dashboard size:M (Resolved, jbaier_cz, 2022-12-21)

History

Updated by okurz about 2 years ago

Copied from action #122308: Handle invalid openQA job references in qem-dashboard size:M added


Updated by mgrifalconi about 2 years ago

Assignee set to mgrifalconi


Updated by mgrifalconi about 2 years ago

Status changed from New to In Progress

A status update after learning how the bot and the dashboard work and experimenting with the code a bit.

I would consider this a dangerous change/refactor, since this tool handles the approval of updates shipped to our customers and we currently don't have a staging environment.

With that in mind, I propose developing a module for the new feature with three settings:

- dry run and log: try out the new way of handling things, compare performance, and make sure the same update requests are approved/rejected
- emergency shutoff: quickly disables the new code without needing PR approvals/fixes, minimizing the impact on production
- enabled: switch over to the new system; eventually this would become the default, the switch would go away, and the old method would be removed from the code
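The three settings could be modelled as a single mode switch. The following is a minimal Python sketch, not code from the WIP commit: `LiveCheckMode`, `decide`, `mode_from_env`, and the `QEM_BOT_LIVE_CHECK` environment variable are hypothetical names, and `old_check`/`new_check` stand in for the existing dashboard-based check and the new live-data check.

```python
import enum
import logging
import os
import time
from typing import Callable, Mapping

log = logging.getLogger("approver")

Check = Callable[[str], bool]  # takes an update id, returns "approve?"


class LiveCheckMode(enum.Enum):
    DISABLED = "disabled"  # emergency shutoff: the new code path never runs
    DRY_RUN = "dry-run"    # run both paths, log differences, act on the old one
    ENABLED = "enabled"    # the new code path decides


def mode_from_env(env: Mapping[str, str] = os.environ) -> LiveCheckMode:
    """Read the mode from a hypothetical variable; missing/invalid fails safe to DISABLED."""
    try:
        return LiveCheckMode(env.get("QEM_BOT_LIVE_CHECK", "disabled"))
    except ValueError:
        return LiveCheckMode.DISABLED


def decide(update_id: str, mode: LiveCheckMode, old_check: Check, new_check: Check) -> bool:
    if mode is LiveCheckMode.DISABLED:
        return old_check(update_id)
    if mode is LiveCheckMode.ENABLED:
        return new_check(update_id)
    # DRY_RUN: time both checks, but always act on the old result
    start = time.perf_counter()
    old = old_check(update_id)
    old_elapsed = time.perf_counter() - start
    start = time.perf_counter()
    try:
        new = new_check(update_id)
    except Exception:
        # A crash in the new code must not affect the production decision
        log.exception("update %s: new check failed (dry run)", update_id)
        return old
    new_elapsed = time.perf_counter() - start
    if new != old:
        log.warning("update %s: old=%s new=%s (old %.2fs, new %.2fs)",
                    update_id, old, new, old_elapsed, new_elapsed)
    else:
        log.info("update %s: checks agree (old %.2fs, new %.2fs)",
                 update_id, old_elapsed, new_elapsed)
    return old
```

With this shape, setting the variable back to `disabled` in the deployment environment would act as the emergency shutoff without a code change or PR.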

I uploaded the WIP code at https://github.com/michaelgrifalconi/qem-bot/commit/dc2ca5e9b03ad5f06ace62092512be00fd99b7fe, but in the meantime I found an issue.

It appears difficult to change the approver logic to look only at live data.

This is the current flow for release request approvals (looking only at aggregates for now):

_approvable calls get_incidents_approver to query the dashboard about the current updates (incident and release request numbers)
- easy to replicate here
_approvable then calls get_aggregate_settings from qem to query the dashboard for data that does not come directly from openQA/SMELT
- not easy, since the raw data received from openQA differs from the data in the dashboard: the dashboard processes that data as soon as it arrives, see https://github.com/openSUSE/qem-dashboard/blob/ebfeada7f6198ffc109ff8eb34a90ad8f49bd572/lib/Dashboard/Model/Incidents.pm#L252-L287
Based on that data, get_incident_result queries the dashboard once again to get the test results
- not yet looked into
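To make the dependency between these steps explicit, here is a heavily simplified Python model of the flow. It is not qem-bot's actual code: `DashboardSource`, `StaticDashboard`, the job-ID based "settings", and the set of accepted results are assumptions for illustration; only the three step names come from the flow above. What it tries to show is that swapping in a live openQA source would mean implementing `aggregate_settings` in the bot, i.e. redoing the processing the dashboard currently performs when data arrives.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Incident:
    inc: int  # incident number
    rr: int   # release request number


class DashboardSource(Protocol):
    """Hypothetical interface for the three dashboard lookups in the flow."""

    def incidents_to_approve(self) -> List[Incident]: ...

    def aggregate_settings(self, inc: Incident) -> List[int]: ...

    def incident_results(self, job_ids: List[int]) -> Dict[int, str]: ...


OK_RESULTS = {"passed", "softfailed"}  # assumption: results that count as green


def approvable(source: DashboardSource) -> List[Incident]:
    approved = []
    for inc in source.incidents_to_approve():       # like get_incidents_approver
        job_ids = source.aggregate_settings(inc)     # like get_aggregate_settings
        results = source.incident_results(job_ids)  # like get_incident_result
        # No results at all is treated as "not approvable"
        if results and all(r in OK_RESULTS for r in results.values()):
            approved.append(inc)
    return approved


class StaticDashboard:
    """In-memory stand-in for qem-dashboard, for illustration only."""

    def __init__(self, incidents: List[Incident],
                 settings: Dict[int, List[int]], results: Dict[int, str]):
        self.incidents, self.settings, self.results = incidents, settings, results

    def incidents_to_approve(self) -> List[Incident]:
        return self.incidents

    def aggregate_settings(self, inc: Incident) -> List[int]:
        return self.settings.get(inc.inc, [])

    def incident_results(self, job_ids: List[int]) -> Dict[int, str]:
        # A job the dashboard knows no result for is reported as "none"
        return {j: self.results.get(j, "none") for j in job_ids}
```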

In short, it's not a straightforward change, since it's not just a matter of reusing the functions that feed data to the dashboard and using that data directly.

I think it would be worth separating what collects/processes data and executes actions (the bot) from what visualizes that data (the dashboard). I agree with querying data in different ways to visualize it better, but I personally don't like the bot relying heavily on the dashboard, with some data processing done in the bot and some in the dashboard.

We could argue about the performance reasons for having the bot use cached dashboard data or not (or having one option with a failover to the other), but I would prefer that the dashboard not become another business-critical component if we can avoid it, and keep that burden/risk on the bot script alone.


Updated by mgrifalconi almost 2 years ago

Copied to action #123286: Bot and dashboard reference to wrong data and block update approval size:M added


Updated by mgrifalconi almost 2 years ago

Copied to deleted (action #123286: Bot and dashboard reference to wrong data and block update approval size:M)


Updated by mgrifalconi almost 2 years ago

Related to action #123286: Bot and dashboard reference to wrong data and block update approval size:M added


Updated by mgrifalconi almost 2 years ago

Status changed from In Progress to Feedback

With the help of the discussion at https://suse.slack.com/archives/C02CANHLANP/p1675927786104149?thread_ts=1674639850.741499&cid=C02CANHLANP, I got a better understanding of the bot's architecture. It seems tricky to make the bot use only live data for approvals, since there are many steps where data is fetched, stored in the dashboard, and downloaded from the dashboard again.
There is also no interest from the dashboard/bot maintainers in switching to such logic.

What we can still do is rely on dashboard data up to the approve/not-approve decision and only then check live openQA data. IMHO this would make the logic even more complicated, because we would first look at dashboard data (download live data, upload it to the dashboard, read it back from the dashboard, all of this multiple times for different kinds of data) and then, at the very end, double-check against openQA data.
I am not a huge fan of this approach, but it could still help in situations like the linked issue, and I see no other solution, since a refactor to use only live data is not possible for the reasons mentioned above.
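A rough sketch of such a final double-check, assuming openQA's `GET /api/v1/jobs/<id>` returns a `job` object with `state`, `result`, and `clone_id` fields. `final_check`, `latest_job`, `openqa_fetcher`, and the set of accepted results are hypothetical; the check can only veto a positive dashboard decision, never approve on its own. Following `clone_id` is meant to catch the case where the dashboard still references a job that has since been restarted.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterable

Job = Dict[str, Any]
Fetch = Callable[[int], Job]

OK_RESULTS = {"passed", "softfailed"}  # assumption: results that allow approval


def openqa_fetcher(base_url: str) -> Fetch:
    """Build a fetch function for openQA's job API (assumed shape: {"job": {...}})."""

    def fetch(job_id: int) -> Job:
        with urllib.request.urlopen(f"{base_url}/api/v1/jobs/{job_id}", timeout=30) as resp:
            return json.load(resp)["job"]

    return fetch


def latest_job(job_id: int, fetch: Fetch, max_hops: int = 20) -> Job:
    """Follow clone_id links so a restarted job is judged by its newest clone."""
    job = fetch(job_id)
    hops = 0
    while job.get("clone_id"):
        hops += 1
        if hops > max_hops:
            raise RuntimeError(f"clone chain of job {job_id} is longer than {max_hops}")
        job = fetch(job["clone_id"])
    return job


def final_check(dashboard_ok: bool, job_ids: Iterable[int], fetch: Fetch) -> bool:
    """Veto a positive dashboard decision if any live openQA job is not done and green."""
    if not dashboard_ok:
        return False  # the dashboard already blocks approval; nothing to double-check
    for job_id in job_ids:
        job = latest_job(job_id, fetch)
        if job.get("state") != "done" or job.get("result") not in OK_RESULTS:
            return False
    return True
```

In the bot this would run once per update, right before approving, with the job IDs the dashboard based its decision on and a fetcher built from the openQA instance URL. Injecting `fetch` keeps the logic testable without network access.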

Before continuing, I would like some feedback from the tools team to make sure this mixed approach is something we can try and is worth investing time in.
I am also available for a call if needed, just ping me :)
