action #119746

closed

[spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S

Added by okurz over 1 year ago. Updated 3 months ago.

Status: Resolved
Priority: Normal
Assignee:
Category: Feature requests
Target version:
Start date: 2022-11-02
Due date:
% Done: 0%
Estimated time:

Description

Motivation

Also see #118639. The idea is to be able to find openQA jobs that need review because they block the approval of maintenance updates. To find jobs needing review we can already use the "todo" checkbox on /tests, but we also need to filter out jobs in "development" or outside any job group, and to match on the job group name, either positively (e.g. match "Core Maintenance") or negatively (e.g. exclude "Kernel Maintenance").

Acceptance criteria

  • AC1: Proof of concept spike solution exists showing how one can filter "by review squad" over the UI

Suggestions

  • Proof of concept spike solution to filter jobs on /tests based on job group and job name. To be most flexible, regex should be supported, including negative matches to exclude job names or job groups
  • If desired, ask the usual reviewers how they would look for "their job"

Related issues 3 (0 open, 3 closed)

Related to openQA Project - action #117655: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does size:M (Resolved, kraih, 2022-10-06)
Related to openQA Tests - action #153646: [qe-core] Rename incident job groups owned by qe-core (Resolved, mgrifalconi, 2024-01-16)
Copied to openQA Project - action #134933: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M (Resolved, kraih, 2022-11-02)

Actions #2

Updated by okurz over 1 year ago

  • Related to action #117655: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does size:M added
Actions #3

Updated by okurz over 1 year ago

  • Status changed from New to Blocked
  • Assignee set to okurz

#117655 first

Actions #4

Updated by okurz over 1 year ago

  • Status changed from Blocked to New
  • Assignee deleted (okurz)
  • % Done changed from 30 to 0
Actions #5

Updated by okurz over 1 year ago

  • Target version changed from Ready to future
Actions #6

Updated by mgrifalconi over 1 year ago

Hello, we were thinking of something similar, but on dashboard.qam.suse.de, since it feels like the point of integration between openQA and SMELT. But any place is good; what matters is to fulfill the goal :)

As a reviewer (for my squad or openQA review task) I would like to see:

  • all failures related to a squad
  • ordered by how many release requests they are blocking (to help prioritize)
  • for how many runs (or days) they have been failing
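
The ordering described in this list could be sketched as follows. The `Failure` type and its fields are hypothetical and do not correspond to an existing openQA or dashboard data structure; the sketch only shows the sort order (most blocked release requests first, then longest-failing first).

```go
package main

import (
	"fmt"
	"sort"
)

// Failure is a hypothetical summary of one failing scenario.
type Failure struct {
	Job             string
	Squad           string
	BlockedRequests int // release requests currently blocked by this failure
	FailingRuns     int // consecutive runs the scenario has been failing
}

// prioritize returns the failures of one squad, ordered by the number of
// blocked release requests and then by how long they have been failing.
func prioritize(failures []Failure, squad string) []Failure {
	var out []Failure
	for _, f := range failures {
		if f.Squad == squad {
			out = append(out, f)
		}
	}
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].BlockedRequests != out[j].BlockedRequests {
			return out[i].BlockedRequests > out[j].BlockedRequests
		}
		return out[i].FailingRuns > out[j].FailingRuns
	})
	return out
}

func main() {
	// Hypothetical sample data, for illustration only.
	failures := []Failure{
		{Job: "job-a", Squad: "Core", BlockedRequests: 2, FailingRuns: 5},
		{Job: "job-b", Squad: "Core", BlockedRequests: 7, FailingRuns: 1},
		{Job: "job-c", Squad: "Kernel", BlockedRequests: 9, FailingRuns: 3},
		{Job: "job-d", Squad: "Core", BlockedRequests: 2, FailingRuns: 8},
	}
	for _, f := range prioritize(failures, "Core") {
		fmt.Println(f.Job, f.BlockedRequests, f.FailingRuns)
	}
}
```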

From that view, we can also start collecting long-term metrics, maybe with a CI in GitLab, and feed them into Grafana.
Ideas for metrics:

  • failures/day
  • days before a failure is fixed
  • number of release requests blocked
  • morning vs evening failures (how much manual work was done to improve situation)

What do you think? How can we contribute?

Actions #7

Updated by szarate about 1 year ago

  • Parent task changed from #114929 to #118639

Changing the parent task to reflect the reality better

Actions #8

Updated by okurz 10 months ago

  • Target version changed from future to Ready

I discussed this with szarate, PO of QE-Core, today and I feel it's the right time that we try to look into this.

Actions #9

Updated by okurz 9 months ago

  • Subject changed from [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" to [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S
  • Description updated (diff)
  • Status changed from New to Workable
Actions #10

Updated by kraih 8 months ago

  • Assignee set to kraih

Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed. How exactly can we connect a review squad to an openQA job in the data? We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?

Actions #11

Updated by szarate 8 months ago

Adding Michael as a watcher.

Right now, we don't have a mapping per se. There's an attempt by Michael to tackle the issue. However, I think nothing keeps us from using the metadata project to map job groups to teams (which could backfire), or from reusing qam_jobgroups, either by recompiling once the file changes or by reading the file directly from the repository (we'd have to add a new property to the YAML), so that no job group is left without an owner.

Actions #12

Updated by okurz 8 months ago

kraih wrote in #note-10:

Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed.

This is a clear openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.

How exactly can we connect a review squad to an openQA job in the data?

In a first approach just what the first ticket suggestion says: "filter jobs on /tests based on job group and job name", with a regex

We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?

Don't consider the SUSE QAM metadata for this. That would be something different.

What szarate said could be something like "filter by regex match on job group description". Then anybody could put any custom field there and match on it

Actions #13

Updated by kraih 8 months ago

okurz wrote in #note-12:

This is a clear openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.

Yes, I'm trying to understand what the users actually need here, and this is starting to look like two separate features (for follow-up tickets). For the openQA side we don't really want to introduce a "squad" concept, which is very SUSE org specific, just to be able to search for unreviewed jobs. This could theoretically already be done with the existing data, as you said, with the TODO option and some regex matching of job group names. That's what I'll prototype for this timeboxed ticket.

The second feature is potentially for the dashboard, since it is about displaying the status of active incidents. The users would also like to have an overview of which squads are currently blocking which incidents. Squads can once again be identified from job group names here, but we do not yet store enough openQA job metadata to know which jobs have been reviewed. The qem-bot retrieves openQA job comments and parses them to get that information, but does not forward it to the dashboard yet. Theoretically that would be all we need to make per squad blocked incident pages for the dashboard.

The squad/job group name mapping data from existing tooling:

var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}
var squads = map[string][]string{
    "SAP/HA":   {"SAP", " HA"},
    "Kernel":   {"Kernel", " HPC"},
    "QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
    "Yast":     {"YaST"},
    "Security": {"Security"},
    "Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}
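
A minimal sketch of how such a mapping could be applied to a job group name. The matching rules of the existing tooling are not shown in this ticket, so substring matching and "first squad in `squadNames` order wins" are assumptions, and `squadFor` is a hypothetical helper. The leading spaces in " HA" and " HPC" suggest substring matching, and the order matters because a needle like "SLE 15" would otherwise also claim Kernel or Security groups.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapping data copied from the snippet above.
var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}
var squads = map[string][]string{
	"SAP/HA":   {"SAP", " HA"},
	"Kernel":   {"Kernel", " HPC"},
	"QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
	"Yast":     {"YaST"},
	"Security": {"Security"},
	"Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}

// squadFor returns the first squad (in squadNames order) with a needle
// contained in the job group name, or "" if no squad matches.
// Assumption: substring matching, first match wins.
func squadFor(group string) string {
	for _, squad := range squadNames {
		for _, needle := range squads[squad] {
			if strings.Contains(group, needle) {
				return squad
			}
		}
	}
	return ""
}

func main() {
	// Hypothetical job group names, for illustration only.
	for _, g := range []string{
		"Maintenance: SLE 15 SP4 Kernel Incidents",
		"Maintenance: SLE 15 SP4 Core Incidents",
		"Development",
	} {
		fmt.Printf("%q -> %q\n", g, squadFor(g))
	}
}
```

Iterating over the `squadNames` slice instead of the map keeps the result deterministic, since Go map iteration order is unspecified.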
Actions #14

Updated by mgrifalconi 8 months ago

we do not yet store enough openQA job metadata to know which jobs have been reviewed.

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

  • fixing the test issue and getting the job green
  • softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
  • asking to reject the RR

So I would not want to hide "reviewed" failures from squads, since they bother us (and maintenance coordination) as much as an "unreviewed" failure.

Actions #15

Updated by kraih 8 months ago

mgrifalconi wrote in #note-14:

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

  • fixing the test issue and getting the job green
  • softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
  • asking to reject the RR

That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.

Actions #16

Updated by szarate 8 months ago

kraih wrote in #note-15:

mgrifalconi wrote in #note-14:

Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.

As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:

  • fixing the test issue and getting the job green
  • softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
  • asking to reject the RR

That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.

I think that the squad concept can be mapped to review groups, teams, review teams or user groups... after all, a squad is more of an arbitrary concept that comes from Agile at Scale than from how systems are built.

Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)

Actions #17

Updated by kraih 8 months ago

Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)

That works too, and would be fairly trivial to implement if we reuse id.opensuse.org for identity management again. Query parameters for existing filters would probably still be a good idea to have though.

Actions #18

Updated by kraih 8 months ago

  • Status changed from Workable to In Progress

Making a proof of concept for filtering by job group names.

Actions #19

Updated by openqa_review 8 months ago

  • Due date set to 2023-09-08

Setting due date based on mean cycle time of SUSE QE Tools

Actions #20

Updated by kraih 8 months ago

  • Status changed from In Progress to Feedback

Draft PR with possible spike solution: https://github.com/os-autoinst/openQA/pull/5291

Actions #21

Updated by kraih 8 months ago

  • Copied to action #134933: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M added
Actions #22

Updated by kraih 8 months ago

  • Status changed from Feedback to Resolved
Actions #23

Updated by kraih 8 months ago

Followup ticket: #134933

Actions #24

Updated by kraih 8 months ago

  • Status changed from Resolved to In Progress

Back into progress since the PR was for /tests/overview while the feature was supposed to be limited to /tests.

Actions #25

Updated by kraih 8 months ago

  • Status changed from In Progress to Resolved

The followup ticket will be about both endpoints.

Actions #26

Updated by okurz 3 months ago

  • Related to action #153646: [qe-core] Rename incident job groups owned by qe-core added
Actions #27

Updated by okurz 3 months ago

  • Due date deleted (2023-09-08)