action #119746
closed[spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S
Description
Motivation
Also see #118639. The idea is to be able to find openQA jobs that need review because they block approval of maintenance updates. To find jobs needing review we can already use the "todo" checkbox on /tests, but we also need to filter out jobs in "development" or outside any job group, and to match on the job group name, either positively (e.g. matching "Core Maintenance") or negatively (e.g. not matching "Kernel Maintenance").
Acceptance criteria
- AC1: Proof of concept spike solution exists showing how one can filter "by review squad" over the UI
Suggestions
- Proof of concept spike solution to filter jobs on /tests based on job group and job name. For maximum flexibility, regex should be supported, including negative matches to exclude job names or job groups
- If desired, ask the usual reviewers how they would look for "their job"
Updated by okurz almost 2 years ago
- Related to action #117655: Provide API to get job results for a particular incident, similar to what dashboard/qem-bot does size:M added
Updated by okurz almost 2 years ago
- Status changed from New to Blocked
- Assignee set to okurz
#117655 first
Updated by okurz almost 2 years ago
- Status changed from Blocked to New
- Assignee deleted (okurz)
- % Done changed from 30 to 0
Updated by mgrifalconi almost 2 years ago
Hello, we were thinking of something similar, but on dashboard.qam.suse.de, since it feels like the point of integration between openQA and SMELT. But any place is good; what matters is to fulfill the goal :)
As a reviewer (for my squad or openQA review task) I would like to see:
- all failures related to a squad
- ordered by how many release requests they are blocking (to help prioritize)
- for how many runs (or days) they have been failing
From that view, we can also start collecting long term metrics, maybe with a CI in gitlab and feed into Grafana
Ideas of metrics:
- failures/day
- days before a failure is fixed
- number of release requests blocked
- morning vs evening failures (how much manual work was done to improve situation)
What do you think? How can we contribute?
Updated by szarate almost 2 years ago
- Parent task changed from #114929 to #118639
Changing the parent task to reflect the reality better
Updated by okurz over 1 year ago
- Target version changed from future to Ready
I discussed this with szarate, PO of QE-Core, today and I feel it's the right time that we try to look into this.
Updated by okurz over 1 year ago
- Subject changed from [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" to [spike][timeboxed:20h] Filter openQA todo-jobs on /tests belonging to one "review squad" size:S
- Description updated (diff)
- Status changed from New to Workable
Updated by kraih about 1 year ago
- Assignee set to kraih
Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed. How exactly can we connect a review squad to an openQA job in the data? We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?
Updated by szarate about 1 year ago
Adding Michael as a watcher.
Right now, we don't have a mapping per se. There's an attempt by Michael that tries to tackle the issue. However, I think nothing keeps us from using the metadata project to map job groups to teams (which could backfire), or from reusing the qam_jobgroups, either by recompiling once the file changes or by reading the file directly from the repository (we'd have to add a new property to the YAML), so that no job group is left without an owner.
Updated by okurz about 1 year ago
kraih wrote in #note-10:
Having a hard time deciding if this should be an openQA or dashboard feature, so some more clarification is needed.
This is clearly an openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.
How exactly can we connect a review squad to an openQA job in the data?
In a first approach just what the first ticket suggestion says: "filter jobs on /tests based on job group and job name", with a regex
We have information like job group names and incidents in the metadata, but how do we match that to specific squads? Is that information present in the data at all yet?
Don't consider the SUSE QAM metadata for this. That would be something different.
What szarate said could be something like "filter by regex match on job group description". Then anybody could put any custom field there and match on it
Updated by kraih about 1 year ago
okurz wrote in #note-12:
This is clearly an openQA feature as it is described. But this is also why it's a timeboxed task. For example, if we end up with a good understanding of why it's not possible or why it does not make sense to have something like this in openQA, then we can still consider a feature for the qem-dashboard.
Yes, I'm trying to understand what the users actually need here, and this is starting to look like two separate features (for follow-up tickets). For the openQA side we don't really want to introduce a "squad" concept, which is very SUSE org specific, just to be able to search for unreviewed jobs. This could theoretically already be done with the existing data, as you said, with the TODO option and some regex matching of job group names. That's what I'll prototype for this timeboxed ticket.
The second feature is potentially for the dashboard, since it is about displaying the status of active incidents. The users would also like to have an overview of which squads are currently blocking which incidents. Squads can once again be identified from job group names here, but we do not yet store enough openQA job metadata to know which jobs have been reviewed. The qem-bot retrieves openQA job comments and parses them to get that information, but does not forward it to the dashboard yet. Theoretically that would be all we need to make per squad blocked incident pages for the dashboard.
The squad/job group name mapping data from existing tooling:
var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}
var squads = map[string][]string{
	"SAP/HA":   {"SAP", " HA"},
	"Kernel":   {"Kernel", " HPC"},
	"QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
	"Yast":     {"YaST"},
	"Security": {"Security"},
	"Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}
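For illustration, a sketch of how such a mapping could be used to resolve a job group name to a squad. The `squadFor` helper and the sample group names are hypothetical. The lookup rule is an assumption, since the ticket does not say how the existing tooling resolves overlaps: plain substring matching, with the first squad in `squadNames` order winning.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapping data copied from the snippet above.
var squadNames = []string{"SAP/HA", "Kernel", "QAC", "Yast", "Security", "Core"}

var squads = map[string][]string{
	"SAP/HA":   {"SAP", " HA"},
	"Kernel":   {"Kernel", " HPC"},
	"QAC":      {"Public", "Containers", "JeOS", "Micro", "Wicked", "SLEM"},
	"Yast":     {"YaST"},
	"Security": {"Security"},
	"Core":     {"Core", "TERADATA", "SLE 15", "SLE 12"},
}

// squadFor returns the first squad (in squadNames order) that has one of
// its substrings in the job group name, or "" if no squad matches.
// Assumption: first match wins; the real tooling may behave differently.
func squadFor(group string) string {
	for _, squad := range squadNames {
		for _, needle := range squads[squad] {
			if strings.Contains(group, needle) {
				return squad
			}
		}
	}
	return ""
}

func main() {
	// Made-up job group names for illustration only.
	groups := []string{
		"Maintenance: SLE 15 SP4 Kernel Incidents",
		"Maintenance: SLE 15 SP4 Core Incidents",
		"Development",
	}
	for _, g := range groups {
		fmt.Printf("%q -> %q\n", g, squadFor(g))
	}
}
```

Iterating over the ordered `squadNames` slice instead of the map keeps the result deterministic. With this rule, broad entries such as "SLE 15" under "Core" only apply when no earlier squad matched.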
Updated by mgrifalconi about 1 year ago
we do not yet store enough openQA job metadata to know which jobs have been reviewed.
Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.
As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:
- fixing the test issue and getting the job green
- softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
- asking to reject the RR
So I would not want to hide "reviewed" failures from squads, since they bother us (and maintenance coordination) as much as an "unreviewed" failure.
Updated by kraih about 1 year ago
mgrifalconi wrote in #note-14:
Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.
As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:
- fixing the test issue and getting the job green
- softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
- asking to reject the RR
That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.
Updated by szarate about 1 year ago
kraih wrote in #note-15:
mgrifalconi wrote in #note-14:
Please mind that we (openQA reviewers) do not care if there is a comment, a poo# or a bsc# attached to a test failure. That is not good enough and still needs a high priority action to handle the failure.
As long as there is a failure the linked RR will be blocked, so we ask squads to take a decision between:
- fixing the test issue and getting the job green
- softfailing the test while investigating/fixing and knowing that the test is not broken due to a regression
- asking to reject the RR
That is good to know. Then theoretically we should already have enough data and it's just a matter of deciding how to present it on the dashboard with squad-specific pages. A good start would probably be to introduce the "squad" concept on the "Blocked" page and to allow filtering by squad. And perhaps to expose squad data in the dashboard API so the mapping data doesn't have to be maintained in multiple places.
I think that the squad concept can be mapped to review groups, teams, review teams or user groups... after all, "squad" is more of an arbitrary concept that comes from Agile at Scale than something that reflects how systems are built.
Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)
Updated by kraih about 1 year ago
Here I mean that job groups could be assigned to a group, and a user could belong to a group, however bringing the stuff into the dashboard itself has its own advantages... like making #127208 obsolete :)
That works too, and would be fairly trivial to implement if we reuse id.opensuse.org for identity management again. Query parameters for existing filters would probably still be a good idea to have though.
Updated by kraih about 1 year ago
- Status changed from Workable to In Progress
Making a proof of concept for filtering by job group names.
Updated by openqa_review about 1 year ago
- Due date set to 2023-09-08
Setting due date based on mean cycle time of SUSE QE Tools
Updated by kraih about 1 year ago
- Status changed from In Progress to Feedback
Draft PR with possible spike solution: https://github.com/os-autoinst/openQA/pull/5291
Updated by kraih about 1 year ago
- Copied to action #134933: Filter openQA todo-jobs on /tests belonging to one "review squad" size:M added
Updated by kraih about 1 year ago
- Status changed from Resolved to In Progress
Back into progress since the PR was for /tests/overview while the feature was supposed to be limited to /tests.
Updated by kraih about 1 year ago
- Status changed from In Progress to Resolved
The followup ticket will be about both endpoints.
Updated by okurz 10 months ago
- Related to action #153646: [qe-core] Rename incident job groups owned by qe-core added