action #119746: [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S - openQA Project (public) - openSUSE Project Management Tool

closed

Added by okurz over 2 years ago. Updated over 1 year ago.

Status:

Resolved

Priority:

Normal

Assignee:

kraih

Category:

Feature requests

Target version:

Ready

Start date:

2022-11-02

Description

Motivation

Also see #118639. The idea is to be able to find openQA jobs that need review because they block approval of maintenance updates. To find jobs needing review we can already use the "todo" checkbox on /tests, but we also need to filter out jobs in "development" or outside any job group, and to match on the job group name, either positively, e.g. match "Core Maintenance", or negatively, e.g. match anything but "Kernel Maintenance".

Acceptance criteria

AC1: Proof of concept spike solution exists showing how one can filter "by review squad" over the UI

Suggestions

Proof of concept spike solution to filter jobs on /tests based on job group and job name. For maximum flexibility, regexes should be supported, including negative matches to exclude job names or job groups.
As desired, ask the usual reviewers how they would look for "their job".

Related issues 3 (0 open — 3 closed)

Updated by okurz over 2 years ago

Related to action #117655: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does size:M added

Updated by okurz over 2 years ago

Status changed from New to Blocked
Assignee set to okurz

#117655 first

Updated by okurz over 2 years ago

Status changed from Blocked to New
Assignee deleted (~~okurz~~)
% Done changed from 30 to 0

Updated by okurz over 2 years ago

Target version changed from Ready to future

Updated by mgrifalconi over 2 years ago

Hello, we were thinking of something similar but on dashboard.qam.suse.de, since it feels like the point of integration between openQA and SMELT. But any place is good; what matters is to fulfill the goal :)

As a reviewer (for my squad or openQA review task) I would like to see:

all failures related to a squad
ordered by how many release requests they are blocking (to help prioritize)
for how many runs (or days) they have been failing
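
One way to picture the requested view is the sketch below. It assumes a hypothetical `Failure` record that already carries the squad, the number of blocked release requests and the number of failing days; none of these fields come from an existing API.

```go
package main

import (
	"fmt"
	"sort"
)

// Failure is a hypothetical record for one failing job.
type Failure struct {
	Job         string
	Squad       string
	BlockedRRs  int // release requests blocked by this failure
	FailingDays int // how long the job has been failing
}

// rankForSquad keeps the failures of one squad and orders them by the
// number of blocked release requests, then by failing days, both descending.
func rankForSquad(failures []Failure, squad string) []Failure {
	var out []Failure
	for _, f := range failures {
		if f.Squad == squad {
			out = append(out, f)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].BlockedRRs != out[j].BlockedRRs {
			return out[i].BlockedRRs > out[j].BlockedRRs
		}
		return out[i].FailingDays > out[j].FailingDays
	})
	return out
}

func main() {
	// Sample data, invented for illustration only.
	failures := []Failure{
		{Job: "job-a", Squad: "Core", BlockedRRs: 1, FailingDays: 5},
		{Job: "job-b", Squad: "Core", BlockedRRs: 3, FailingDays: 1},
		{Job: "job-c", Squad: "Kernel", BlockedRRs: 9, FailingDays: 9},
	}
	for _, f := range rankForSquad(failures, "Core") {
		fmt.Printf("%s blocks %d RRs, failing for %d days\n", f.Job, f.BlockedRRs, f.FailingDays)
	}
}
```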

From that view, we can also start collecting long-term metrics, maybe with a CI job in GitLab that feeds into Grafana.
Ideas for metrics:

failures/day
days before a failure is fixed
number of release requests blocked
morning vs evening failures (how much manual work was done to improve situation)

What do you think? How can we contribute?

Updated by szarate over 2 years ago

Parent task changed from #114929 to #118639

Changing the parent task to reflect the reality better

Updated by okurz almost 2 years ago

Target version changed from future to Ready

I discussed this with szarate, PO of QE-Core, today and I feel it's the right time that we try to look into this.

Updated by okurz almost 2 years ago

Subject changed from [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" to [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S
Description updated (diff)
Status changed from New to Workable

#10

Updated by kraih over 1 year ago

Assignee set to kraih

Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed. How exactly can we connect a review squad to an openQA job in the data? We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?

#11

Updated by szarate over 1 year ago

Adding Michael as a watcher.

Right now, we don't have a mapping per se. There's an attempt by Michael that tries to tackle the issue. However, I think nothing keeps us from using the metadata project to map job groups to teams (which could backfire), or from reusing qam_jobgroups, either by recompiling once the file changes or by reading the file directly from the repository (we'd have to add a new property to the YAML), so that no job group is left without an owner.

#12

Updated by okurz over 1 year ago

kraih wrote in #note-10:

Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed.

This is a clear openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.

How exactly can we connect a review squad to an openQA job in the data?

In a first approach just what the first ticket suggestion says: "filter jobs on /tests based on job group and job name", with a regex

We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?

Don't consider the SUSE QAM metadata for this. That would be something different.

What szarate said could be something like "filter by regex match on job group description". Then anybody could put any custom field there and match on it
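
A rough sketch of the "custom field in the job group description" idea, assuming a made-up `squad: <name>` line convention. The field name, the function and the sample description are hypothetical; nothing in openQA defines them.

```go
package main

import (
	"fmt"
	"regexp"
)

// squadField matches a line such as "squad: Core" anywhere in a description.
// The "squad:" convention is an assumption made for this sketch.
var squadField = regexp.MustCompile(`(?mi)^[ \t]*squad:[ \t]*(\S.*?)[ \t]*$`)

// squadFromDescription extracts the custom field value, if present.
func squadFromDescription(description string) (string, bool) {
	m := squadField.FindStringSubmatch(description)
	if m == nil {
		return "", false
	}
	return m[1], true
}

func main() {
	// Sample description, invented for illustration only.
	desc := "Maintenance tests for core packages.\nsquad: Core\n"
	if squad, ok := squadFromDescription(desc); ok {
		fmt.Println("owner:", squad)
	}
}
```

A filter on /tests would not need to know about squads at all in this variant. It would only match a user-supplied regex against the description text, and the field convention stays a matter of team agreement.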

#13

Updated by kraih over 1 year ago

okurz wrote in #note-12:

This is a clear openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.

Yes, I'm trying to understand what the users actually need here, and this is starting to look like two separate features (for follow-up tickets). For the openQA side we don't really want to introduce a "squad" concept, which is very specific to the SUSE organization, just to be able to search for unreviewed jobs. This could theoretically already be done with the existing data, as you said, with the TODO option and some regex matching of job group names. That's what I'll prototype for this timeboxed ticket.

The second feature is potentially for the dashboard, since it is about displaying the status of active incidents. The users would also like to have an overview of which squads are currently blocking which incidents. Squads can once again be identified from job group names here, but we do not yet store enough openQA job metadata to know which jobs have been reviewed. The qem-bot retrieves openQA job comments and parses them to get that information, but does not forward it to the dashboard yet. Theoretically that would be all we need to make per squad blocked incident pages for the dashboard.

The squad/job group name mapping data from existing tooling:

var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}
var squads = map[string][]string{
	"SAP/HA":   {"SAP", " HA"},
	"Kernel":   {"Kernel", " HPC"},
	"QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
	"Yast":     {"YaST"},
	"Security": {"Security"},
	"Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}
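
A small sketch of how this mapping could be applied to a job group name. It assumes plain substring matching and that the first squad in `squadNames` order wins; the existing tooling may resolve matches differently. The sample group names are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapping data copied from the snippet above.
var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}
var squads = map[string][]string{
	"SAP/HA":   {"SAP", " HA"},
	"Kernel":   {"Kernel", " HPC"},
	"QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
	"Yast":     {"YaST"},
	"Security": {"Security"},
	"Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}

// squadFor returns the first squad, in squadNames order, that has a keyword
// contained in the job group name. First-match-wins is an assumption here.
func squadFor(groupName string) (string, bool) {
	for _, squad := range squadNames {
		for _, keyword := range squads[squad] {
			if strings.Contains(groupName, keyword) {
				return squad, true
			}
		}
	}
	return "", false
}

func main() {
	// Hypothetical job group names, for illustration only.
	groups := []string{
		"Maintenance: SLE 15 SP4 Kernel Incidents",
		"Maintenance: Containers 15 SP4 Updates",
		"Staging",
	}
	for _, g := range groups {
		if squad, ok := squadFor(g); ok {
			fmt.Printf("%q -> %s\n", g, squad)
		} else {
			fmt.Printf("%q -> no squad\n", g)
		}
	}
}
```

Iterating over the ordered `squadNames` slice rather than the map keeps the result deterministic when a group name contains keywords of several squads, such as both "Kernel" and "SLE 15".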

#14

Updated by mgrifalconi over 1 year ago

we do not yet store enough openQA job metadata to know which jobs have been reviewed.

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

fixing the test issue and getting the job green
softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
asking to reject the RR

So I would not want to hide "reviewed" failures from squads, since they bother us (and maintenance coordination) as much as an "unreviewed" failure.

#15

Updated by kraih over 1 year ago

mgrifalconi wrote in #note-14:

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

fixing the test issue and getting the job green

softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression

asking to reject the RR

That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.

#16

Updated by szarate over 1 year ago

kraih wrote in #note-15:

mgrifalconi wrote in #note-14:

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

fixing the test issue and getting the job green

softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression

asking to reject the RR

That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.

I think that the squad concept can be mapped to review groups, teams, review teams or user groups... after all, a squad is more of an arbitrary concept that comes from Agile at Scale than from how systems are built.

Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)

#17

Updated by kraih over 1 year ago

Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)

That works too, and would be fairly trivial to implement if we reuse id.opensuse.org for identity management again. Query parameters for existing filters would probably still be a good idea to have though.

#18