action #92921

QA - coordination #99303: [saga][epic] Future improvements for SUSE Maintenance QA workflows with fully automated testing, approval and release

openQA Project - coordination #99306: [epic] Future improvements: Make reviewing openQA results per squad easier

[tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below size:M

Added by okurz about 1 year ago. Updated 7 months ago.

Status: In Progress
Priority: Normal
Assignee:
Category: Enhancement to existing tests
Target version:
Start date: 2021-05-21
Due date:
% Done: 0%
Estimated time:
Difficulty:

Description

Motivation

The teams of the former SUSE department QA SLE have set up individual job groups within product codestreams on openQA corresponding to their team's work scope. For Maintenance, so far there are parent job groups like "Maintenance: Single Incidents" and "Maintenance: Test Repo" with subgroups like "Maintenance: SLE 15 SP3 Incidents", so they are not split by team scope. Reviewing Maintenance tests would be easier if we could apply the same job group structure to Maintenance tests as we already have for product validation tests.

Acceptance criteria

  • AC1: Feasibility of changing the job group structure of maintenance tests with expected consequences is documented

Suggestions

  • Ask around, e.g. on openqa@suse.de, maintenance@suse.de, https://chat.suse.de/channel/testing within the teams what problems could be expected when applying that change on openqa.suse.de
  • On a clean openQA instance import an OSD database dump, create a job group structure for maintenance tests similar to the "SLE15" job group and update all "maintenance" tests to show up in the new structure, e.g. with SQL updates
  • Discuss or investigate the impact of the changed structure on the behaviour of gitlab.suse.de/maintenance/smelt and https://gitlab.suse.de/qa-maintenance/openQABot/
  • Optional: Implement the actual change on OSD in case we found it to be feasible :) In that case maybe start with just part of the structure for a first step?
  • Provide the result on a staging instance

Further details

If we manage to apply the same structures for Maintenance tests as for product validation we can provide test overview links specific for each team.

menu.png (33.1 KB) mgrifalconi, 2021-11-24 15:18
TODO-view.png (125 KB) mgrifalconi, 2021-11-24 15:29

Related issues

Related to QA - coordination #95857: [epic] QAM incident tests: Fix "next & previous", latest results and label carry-over (Workable, 2021-07-22)

Copied to openQA Project - action #109650: [tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below size:M (New, 2021-05-21)

History

#1 Updated by okurz about 1 year ago

  • Subject changed from can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below to Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below
  • Description updated (diff)
  • Status changed from New to Workable
  • Priority changed from Normal to High

#2 Updated by okurz about 1 year ago

  • Parent task changed from #91646 to #91914

#3 Updated by mkittler about 1 year ago

I would suggest creating a completely new parent job group and sub job groups so we know what structure is wanted. Then testers can change the job scheduling to use these new groups from now on.

Of course we can also move jobs from their old groups to the new groups using SQL queries. This shouldn't be hard to do if one knows the criteria for mapping the jobs to their new groups. We could of course do this on a staging instance first to avoid mistakes in production and to give the users a preview.


I guess it would already help if test writers would propose a job group structure like this with the relevant jobs which would go into the particular groups:

  • Maintenance v2
    • Team 1: jobs matching settings FOO=bar, …
    • Team 2: jobs matching settings FOO=baz, …
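
A minimal sketch of the SQL-based move described above, run against an in-memory SQLite database so it can be tried without touching a real instance. The table and column names (`job_groups`, `jobs.group_id`, `job_settings` with `key` and `value`) are assumptions made for this sketch, and the `FOO=bar` and `FOO=baz` criteria are just the placeholders from the list above. The real schema and mapping rules would have to be checked on a staging instance first.

```python
import sqlite3

# Hypothetical, simplified schema for illustration; not copied from openQA.
SCHEMA = """
CREATE TABLE job_groups (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE jobs (id INTEGER PRIMARY KEY, group_id INTEGER);
CREATE TABLE job_settings (job_id INTEGER, key TEXT, value TEXT);
"""

# Placeholder mapping: new group name -> (setting key, setting value).
MAPPING = {"Team 1": ("FOO", "bar"), "Team 2": ("FOO", "baz")}


def move_jobs(conn, mapping):
    """Create one group per entry and move the jobs whose setting matches."""
    moved = {}
    for group_name, (key, value) in mapping.items():
        cur = conn.execute(
            "INSERT INTO job_groups (name) VALUES (?)", (group_name,)
        )
        new_group_id = cur.lastrowid
        cur = conn.execute(
            "UPDATE jobs SET group_id = ? WHERE id IN "
            "(SELECT job_id FROM job_settings WHERE key = ? AND value = ?)",
            (new_group_id, key, value),
        )
        moved[group_name] = cur.rowcount  # number of jobs reassigned
    conn.commit()
    return moved


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO job_groups (id, name) "
    "VALUES (1, 'Maintenance: SLE 15 SP3 Incidents')"
)
conn.executemany("INSERT INTO jobs (id, group_id) VALUES (?, 1)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO job_settings VALUES (?, ?, ?)",
    [(1, "FOO", "bar"), (2, "FOO", "baz"), (3, "FOO", "bar")],
)
moved = move_jobs(conn, MAPPING)
print(moved)
```

Keeping the mapping in one data structure makes it easy to review the criteria with test writers before any update is run.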

#4 Updated by cdywan about 1 year ago

okurz wrote:

  • On a clean openQA instance import an OSD database dump, create a job group structure for maintenance tests similar to the "SLE15" job group and update all "maintenance" tests to show up in the new structure, e.g. with SQL updates

I would suggest using an instance that's exposed via the VPN and can be checked out and discussed with others, if that's not already implied (to my mind "clean" usually suggests a test instance on an individual developer machine).

#5 Updated by mkittler about 1 year ago

I'd use one of the staging instances for that (as they shouldn't be publicly reachable).

#6 Updated by okurz about 1 year ago

mkittler wrote:

[…] I guess it would already help if test writers would propose a job group structure like this with the relevant jobs which would go into the particular groups:

This is why I suggested "create a job group structure for maintenance tests similar to the "SLE15" job group and update all "maintenance" tests to show up in the new structure, e.g. with SQL updates". But it can be helpful to grab any "test writer" and work with them to try out where the tests that they are interested in should reside.

#7 Updated by okurz about 1 year ago

  • Subject changed from Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below to [spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below

#8 Updated by osukup about 1 year ago

vpelcak, any idea what the target structure of the groups should look like?

#9 Updated by okurz about 1 year ago

  • Priority changed from High to Low

As confirmed by vpelcak in a DM (https://chat.suse.de/direct/2FFxjQXvCCPbj5kbFYvRdSgu8mNq2StdnK?msg=qKsBM6NKNqghTkuEF), we can focus on "openqa-review" related tasks, with the expectation that this can act as the "better overview over their failing testcases", so we reduce the priority here.

#10 Updated by okurz 12 months ago

  • Status changed from Workable to New

Moving all tickets without size confirmation by the team back to "New". The team should move the tickets back after estimating and agreeing on a consistent size.

#11 Updated by okurz 12 months ago

  • Project changed from openQA Project to openQA Tests
  • Subject changed from [spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below to [tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below
  • Category changed from Feature requests to Enhancement to existing tests

#12 Updated by okurz 12 months ago

  • Parent task changed from #91914 to #91467

#13 Updated by okurz 12 months ago

  • Priority changed from Low to Normal

After recent progress on openqa-review tasks, we can try again here with a slightly higher priority as the next step.

#14 Updated by okurz 12 months ago

I brought up the topic within the weekly SUSE QE sync meeting 2021-07-21 and no reasons could be found why we have individual job groups per service pack. Some job groups like https://openqa.suse.de/parent_group_overview/23#grouped_by_build seem complicated with a job group per service pack: for every service pack a new job group with new job templates and a mostly copy-pasted schedule is created. Maybe at that time (years ago) the display of results did not meet expectations, but test overview queries like https://openqa.suse.de/tests/overview?result=failed&arch=&flavor=&machine=&test=&modules=&groupid=366&groupid=308&groupid=232&groupid=165&groupid=280&groupid=218&groupid=108&groupid=54# have properly resolved the versions for years, so that limitation no longer applies. I will ask more people.

EDIT: sent email to openqa@suse.de and qa-team@suse.de
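
The multi-group test overview query mentioned above can be assembled programmatically. This is a small sketch using only the Python standard library. The `groupid` and `result` parameters and the group ids are taken from the URL quoted above, while the function name `overview_url` is made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def overview_url(base, group_ids, **filters):
    """Build a test overview URL that covers several job groups at once."""
    # Repeating the "groupid" parameter selects more than one job group.
    params = list(filters.items()) + [("groupid", gid) for gid in group_ids]
    return f"{base}/tests/overview?{urlencode(params)}"


url = overview_url(
    "https://openqa.suse.de",
    [366, 308, 232, 165, 280, 218, 108, 54],
    result="failed",
)
print(url)

# Round trip: read the selected group ids back out of the URL.
group_ids_in_url = parse_qs(urlsplit(url).query)["groupid"]
print(group_ids_in_url)
```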

#15 Updated by MDoucha 12 months ago

okurz wrote:

I brought up the topic within the weekly SUSE QE sync meeting 2021-07-21 and no reasons could be found why we have individual job groups per service pack.

The reason is simple: investigation of failures. You can't filter by service pack in the "Next & previous results" tab. In some groups, there will be 100+ unrelated incidents between the current version of the package which you're investigating and the last released version that you want to compare the results to. If the per-servicepack job groups get merged into one, looking up previous job results will become nearly impossible in some groups.

#16 Updated by jkohoutek 12 months ago

It's because MUs are usually not targeted to all versions of the product, but to the specific ones only and this varies from only one of them to all.

#17 Updated by okurz 12 months ago

MDoucha wrote:

okurz wrote:

I brought up the topic within the weekly SUSE QE sync meeting 2021-07-21 and no reasons could be found why we have individual job groups per service pack.

The reason is simple: investigation of failures. You can't filter by service pack in the "Next & previous results" tab. In some groups, there will be 100+ unrelated incidents between the current version of the package which you're investigating and the last released version that you want to compare the results to. If the per-servicepack job groups get merged into one, looking up previous job results will become nearly impossible in some groups.

Jobs in "next & previous" are resolved to be within the same scenario, i.e. same product, version, flavor, arch, testsuite, machine. AFAIK the job group has no impact and also jobs with differing versions do not show up as the same scenario. See in http://lord.arch/tests/2794#next_previous that there is only a single job. http://lord.arch/tests/2794 and http://lord.arch/tests/2795 only differ in version in build. Maybe you mean something different?
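
The scenario matching described above can be illustrated with a short sketch. It only models the statement that jobs belong together when product, version, flavor, arch, testsuite and machine are equal. The dictionary layout, the function names and all sample values are invented for this example and do not come from the openQA code.

```python
# Fields that, per the description above, define a scenario.
SCENARIO_FIELDS = ("product", "version", "flavor", "arch", "testsuite", "machine")


def scenario_key(job):
    """Return the tuple that identifies the scenario of a job."""
    return tuple(job[field] for field in SCENARIO_FIELDS)


def next_and_previous(jobs, job):
    """Return the ids of all jobs in the same scenario as `job`."""
    key = scenario_key(job)
    return sorted(j["id"] for j in jobs if scenario_key(j) == key)


# Invented sample data: three jobs that differ only in version and job group.
common = {"product": "sle", "flavor": "Updates", "arch": "x86_64",
          "testsuite": "example-test", "machine": "64bit"}
jobs = [
    {"id": 1, "version": "15-SP2", "group": "Maintenance: SLE 15 SP2 Incidents", **common},
    {"id": 2, "version": "15-SP3", "group": "Maintenance: SLE 15 SP3 Incidents", **common},
    {"id": 3, "version": "15-SP3", "group": "Functional", **common},
]

# The job group is not part of the key, so jobs 2 and 3 are matched,
# while job 1 is excluded because its version differs.
print(next_and_previous(jobs, jobs[1]))
```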

jkohoutek wrote:

It's because MUs are usually not targeted to all versions of the product, but to the specific ones only and this varies from only one of them to all.

Yes, that is clear. So we obviously need separate tests for each version of a product. But that should not mean that we need a separate job group per version of the product in openQA tests, right?

#18 Updated by okurz 12 months ago

osukup wrote:

vpelcak, any idea what the target structure of the groups should look like?

Similar to SLE 15, which looks like this:

  • SLE 15 (parent group)
    • Functional -> QE-Core team
    • YaST -> YaST team
    • Migration: Regression -> Migration team
    • HA -> SHAP(?) team

where each job group shows results from e.g. SP2-Build187.1 as well as SP3-Build11.1

Instead of

  • Maintenance: Single Incidents
    • Maintenance: SLE 15 SP1 Incidents
    • Maintenance: SLE 15 SP2 Incidents
    • Maintenance: SLE 15 SP2 HA Incidents
    • Maintenance: SLE 15 SP3 Incidents
  • Maintenance: Test Repo
    • Maintenance: SLE 15 SP1 Updates
    • Maintenance: SLE 15 SP2 Updates
    • Maintenance: SLE 15 SP3 Updates

have something like

  • Maintenance: Single Incidents
    • Functional
    • YaST
    • HA
  • Maintenance: Test Repo
    • Functional
    • YaST
    • HA

where each job group can again show results for multiple versions with their builds in parallel, e.g. 15-SP1-Build20210720-2 as well as 15-SP2-Build20210720-2

#19 Updated by asmorodskyi 12 months ago

+1 to this, I think it would be much better.
I wonder if in the same iteration we should also drop "Maintenance" and "Updates" from group names? Or will you treat this as a separate topic?

#20 Updated by jkohoutek 12 months ago

asmorodskyi wrote:

+1 to this, I think it would be much better.
I wonder if in the same iteration we should also drop "Maintenance" and "Updates" from group names? Or will you treat this as a separate topic?

Why would we want this?

#21 Updated by asmorodskyi 12 months ago

jkohoutek wrote:

asmorodskyi wrote:

+1 to this, I think it would be much better.
I wonder if in the same iteration we should also drop "Maintenance" and "Updates" from group names? Or will you treat this as a separate topic?

Why would we want this?

I don't see why we need it, actually. Isn't it obvious that SLE 12 SP4 is in maintenance? These very long job group names clutter a lot of places (test reports, openQA UI, YAML config files) without giving anything back. Yes, I know that currently there is some logic in openQABot which relies on this, but it can easily be changed to avoid it.

#22 Updated by okurz 12 months ago

  • Related to coordination #95857: [epic] QAM incident tests: Fix "next & previous", latest results and label carry-over added

#23 Updated by cdywan 10 months ago

  • Subject changed from [tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below to [tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below size:M
  • Status changed from New to Workable

#24 Updated by cdywan 10 months ago

  • Description updated (diff)

#25 Updated by okurz 9 months ago

  • Target version changed from Ready to future
  • Parent task changed from #91467 to #99306

In the meantime we have completed other features which I see as good alternatives for now, e.g. better test overview filtering. With this, and given the other outstanding issues in our backlog, I am moving this ticket out of the current backlog.

#27 Updated by mgrifalconi 9 months ago

  • Assignee set to mgrifalconi

#28 Updated by okurz 8 months ago

Discussed during the weekly QE sync 2021-11-10. We clarified some points, e.g. that the intention is not to multiply the number of job groups by squad times product times version. Instead, each job group contains multiple products and versions, and the job groups are used to separate by squad.

#29 Updated by okurz 7 months ago

The current state was presented during the weekly QE sync 2021-11-24 and mgrifalconi received a lot of positive feedback from group representatives. http://d432.qam.suse.de/ shows the current state with test results moved into a different job group structure. https://gitlab.suse.de/-/snippets/1606 has some notes. With this, pages like http://10.161.229.176/tests/overview?groupid=404 can be used to show a job-group-specific, and hence squad-specific, view of all related test results regardless of product version and/or build id.

mgrifalconi as discussed, please provide screenshot, current state, plans for the next steps, etc. Thanks

#30 Updated by mgrifalconi 7 months ago

Thanks everyone for the nice feedback.

Let's start with the goals/requirements I can think of, from different points of view:

  • As a "squad member", I would like to clearly see, in one place, all tests that my squad maintains. This allows me to monitor them and react in case of failure, without waiting for someone to contact my squad about that.
  • As the "openQA reviewer", I would like to unequivocally identify which squad is responsible for a certain failure, to avoid the risk of some ping-pong between different people arguing about what is in their scope.
  • As an "openQA review stakeholder (i.e. Maint.Coordinator)", I would like to always have a clear picture of the current situation on openQA, see what is blocking today's updates and (in case of urgency) directly get in touch with the person responsible for that test for clarifications.

More general cleanup we address:

  • There is no need for a job group for each SLE version/flavor. It will be possible to query for them if that particular need arises.

Here is a screenshot of the result, a (maintenance update) job group for each QE Squad.
At the time of writing, only "Kernel", "Container", "SAP/HA" and "Core" were defined but we can work on multiple iterations.

Right now the POC was about moving existing jobs on a database dump of osd.
To move future jobs, we will need to shuffle around job group definitions.
These things are completely independent as well.

Possible plan of action:

  • Create a new job group that includes all Maint.Updates SLE versions and call it QE-Core
  • Create a new job group with a squad name and move the relevant stuff out of QE-Core
  • Repeat step 2 until done

OR

  • Create a new job group with a squad name (not QE-Core) and move out relevant stuff from the existing multiple groups.
  • Repeat step 1 until only QE-Core stuff remains and create its group with that

At the same time we might want to move old test results using some SQL but it is unrelated.

Decisions to take before starting

  • Job group in the root directory or inside its own directory? (A directory gives the advantage of having the overview button on the homepage)
  • Job Group per component or per squad (SAP/HA <<>> SAP and HA) (Container and JeOS and Public Cloud)

One more screenshot:

#31 Updated by okurz 7 months ago

Possible plan of action:

  • Create a new job group that includes all Maint.Updates SLE versions and call it QE-Core
  • Create a new job group with a squad name and move the relevant stuff out of QE-Core
  • Repeat step 2 until done

OR

  • Create a new job group with a squad name (not QE-Core) and move out relevant stuff from the existing multiple groups.
  • Repeat step 1 until only QE-Core stuff remains and create its group with that

I would say that the latter approach has already been done with the "Containers" parent group so you could just continue there :) Also, there are already multiple job groups like "Maintenance: SLE 15 SP3 HA Incidents" and "Maintenance: SLE 15 SP3 HA Incidents", so one could go ahead with combining all these version-specific job groups into one but keep the current component/squad structuring in a first step. So e.g. combine all "Maintenance: SLE $version Kernel Incidents" into one new "Maintenance: SLE Kernel Incidents", same for HA, SAP, etc.
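
As a rough sketch of this first step, the version can be stripped from existing group names to see which groups would be combined. The naming pattern is inferred only from the group names quoted in this ticket, so it is an assumption that would need checking against the real list of job groups. The helper names `combined_name` and `plan_merges` are hypothetical.

```python
import re
from collections import defaultdict

# Matches names like "Maintenance: SLE 15 SP3 HA Incidents". The pattern is
# an assumption based on the examples in this ticket.
NAME_RE = re.compile(
    r"^Maintenance: SLE (?P<version>\d+(?: SP\d+)?)"
    r"(?: (?P<component>.+?))? (?P<kind>Incidents|Updates)$"
)


def combined_name(name):
    """Drop the version from a group name, keeping component and kind."""
    match = NAME_RE.match(name)
    if match is None:
        return None  # name does not follow the assumed pattern
    parts = ["Maintenance: SLE", match["component"], match["kind"]]
    return " ".join(part for part in parts if part)


def plan_merges(names):
    """Map each combined group name to the old groups it would replace."""
    merges = defaultdict(list)
    for name in names:
        target = combined_name(name)
        if target is not None:
            merges[target].append(name)
    return dict(merges)


old_groups = [
    "Maintenance: SLE 15 SP2 HA Incidents",
    "Maintenance: SLE 15 SP3 HA Incidents",
    "Maintenance: SLE 15 SP3 Incidents",
    "Maintenance: SLE 15 SP1 Updates",
]
plan = plan_merges(old_groups)
for target, sources in plan.items():
    print(target, "<-", sources)
```

Names that do not fit the pattern are skipped instead of guessed, so they can be reviewed by hand.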

At the same time we might want to move old test results using some SQL but it is unrelated.

True. Also I would simply not bother and keep the old test results untouched as long as they exist as they are automatically removed eventually anyway.

Decisions to take before starting

  • Job group in the root directory or inside its own directory? (A directory gives the advantage of having the overview button on the homepage)

I guess with "own directory" you mean "parent group"? Then the answer is parent group because as you stated you still want to see the grouping of all "maintenance" tests and such. The important part is that with squad-specific child job groups test overview links can be created that show "all tests that one squad is interested in regardless of version".

  • Job Group per component or per squad (SAP/HA <<>> SAP and HA) (Container and JeOS and Public Cloud)

See my above proposal. But I consider it most important that the structure is essentially the same as for product validation.

#32 Updated by mgrifalconi 7 months ago

  • Status changed from Workable to In Progress

#33 Updated by jctmichel 7 months ago

mgrifalconi wrote:

SAP/HA is done: https://openqa.suse.de/tests/overview?groupid=405

Looks very good, but where exactly is this link accessible from in the openqa menu? I can see a SAP/HA Maintenance Updates job group, but it only displays the results for 15-SP3. The tests that are shown for 15 GA and 15-SP1 have been cancelled. What was the reason for this?

More importantly, it might be necessary to move the mau-sles-sys-param-check@64bit-2gbram test into the SLE Updates job group.

#34 Updated by okurz 7 months ago

jctmichel wrote:

mgrifalconi wrote:

SAP/HA is done: https://openqa.suse.de/tests/overview?groupid=405

Looks very good, but where exactly is this link accessible from in the openqa menu? I can see a SAP/HA Maintenance Updates job group, but it only displays the results for 15-SP3. The tests that are shown for 15 GA and 15-SP1 have been cancelled. What was the reason for this?

From any page on https://openqa.suse.de, the "Job Groups" drop-down menu at the top leads to the parent group SAP/HA and, within it, the single job group SAP/HA Maintenance Updates.
This page shows builds for all the versions. It's just not obvious because only the build number is shown. We have an open feature request #53264 to also show the version when job groups have information for multiple versions.

More importantly, it might be necessary to move the mau-sles-sys-param-check@64bit-2gbram test into the SLE Updates job group.

The goal is to make it explicit through the job group selection which squad takes responsibility for a test scenario, regardless of which product/version/flavor the scenario applies to. So if you find an agreement that another QE squad takes over, then you or they can move the corresponding job templates which are part of the schedule.

#35 Updated by okurz 3 months ago

  • Copied to action #109650: [tools][spike] Can we change or display job group structure for maintenance job groups to have one job group per team like for product validation and maybe specific products and versions below size:M added
