action #55730

openQA Tests - action #15132: [epic] Better structure of test plans in main.pm

action #44360: [epic] Parameterize test suites within job groups

[epic] Move parameters from test suites into job groups

Added by coolo 3 months ago. Updated 2 days ago.

Status: New
Start date: 06/09/2019
Priority: Normal
Due date: 19/11/2019
Assignee: -
% Done: 100%
Category: Organisational
Target version: Current Sprint
Difficulty: medium
Duration: 53

Description

We want to move forward with reducing the number of test suites. For that we should analyze the test suites and find the job group where a reduction has the biggest impact.

  • analyze the test suites on osd to see which are used in very few job groups
  • check which of them overlap in a lot of settings and separate settings from parameters
  • suggest shared test suites with common settings to job group owners/maintainers (aka convert to YAML)
  • remove test suites no longer used
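
The second step above (separating shared settings from parameters) could be sketched roughly as follows. This is only a minimal illustration in Python, assuming the settings of each test suite are already available as a plain dict. The function name `split_settings`, the suite names, and the values `openmpi` and `DESKTOP` are made up for the example; only the variables HPC and MPI and the values `mpi_slave` and `mvapich2` are mentioned later in this ticket.

```python
def split_settings(suites):
    """Split settings into those shared by all suites and those that vary.

    `suites` maps a test suite name to its settings dict.
    """
    all_settings = list(suites.values())
    # A key/value pair is "common" only if every suite has exactly that pair.
    common = {
        key: value
        for key, value in all_settings[0].items()
        if all(key in s and s[key] == value for s in all_settings[1:])
    }
    # Everything else is a per-suite parameter.
    parameters = {
        name: {k: v for k, v in settings.items() if k not in common}
        for name, settings in suites.items()
    }
    return common, parameters


# Hypothetical example data, loosely modelled on the "mpi" test suites.
suites = {
    "mpi_mvapich2_slave": {"HPC": "mpi_slave", "MPI": "mvapich2", "DESKTOP": "textmode"},
    "mpi_openmpi_slave": {"HPC": "mpi_slave", "MPI": "openmpi", "DESKTOP": "textmode"},
}
common, parameters = split_settings(suites)
print("common settings:", common)
print("per-suite parameters:", parameters)
```

The common part would be a candidate for a shared test suite, the remaining parameters for the job group.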

After that step I expect usability issues (we already talked about missing a unique name for such job groups) that need to be identified and solved.
And then we'd restart the issue with the next job group.


Subtasks

action #56540: convert staging job groups to YAML (Resolved, cdywan)

action #57845: Switch more job groups to YAML job templates (Resolved, okurz)

action #58652: Write a training file about how to use YAML in job group (Resolved, Xiaojing_liu)


Related issues

Related to openQA Tests - action #43499: [sle][migration] test suites should not have an architect... New 07/11/2018
Related to openQA Project - action #47987: Identify unused media, testsuites, machines, etc. New 16/02/2019

History

#1 Updated by okurz 3 months ago

  • Parent task set to #44360

#2 Updated by okurz 3 months ago

what I did for now:

openqa-client --json-output --host https://openqa.suse.de test_suites > test_suites.json
python -c 'import sys, yaml, json; yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)' < test_suites.json > test_suites.yaml

just to have something nice to read in already what looks more similar to what we want to have in the end :)

Without #55454 I don't see how we can really move settings into job groups though. What I can do, however, is move settings into the actual "test code" so that we do not need multiple test suites for the same test. I have done that for a long time and repeatedly with multiple teams. Still, it feels like people fall back to defining new test suites in the webui instead of parameterizing in test code, probably because they are afraid of perl. With the decision to take away the shiny clicky-clicky webui soon that will change as well ;)

#3 Updated by coolo 3 months ago

a) don't make this something else of your own agenda
b) For now just pretend any option for #55454 was implemented and continue on the base

Because identifying the test suites that are worth it, talking to the groups and identifying the best ways to migrate to YAML are all still to do. That the final migration is blocked by #55454 needs to be understood by the groups though :)

#4 Updated by okurz 3 months ago

Open points I encountered – which are all not blockers for the current task, just side notes:

  1. in https://openqa.suse.de/admin/job_templates/218 the machine variable 64bit specified in scenarios is redundant because it's default, right? -> yes, redundant. can be removed
  2. should we delete redundant settings automatically on save when they replicate the default?
  3. When we allow to parameterize test suites in job groups isn't it consequential to offer the option to define the test suites implicitly just within the job templates?
  4. I guess openQA from 4 years ago was meant to have test modules which are all rather independent and could even be "dynamically shuffled and loaded". The idea probably was that anyone not needing to write perl code could just click in the webui to define a schedule. Wouldn't we lose this role when ditching the web UI?
  5. People have abused the testsuites to put all kind of test variables. When they do the same in job templates, don't we end up with the same problem? 1 scrollbar can be too long on initial load, see https://w3.nue.suse.com/~okurz/Screenshot_20190821_112122_job_group_templates_scrollbar_too_long.png
  6. after pressing save the YAML document shows the old document, not the new one
  7. the defaults section could support more keys, e.g. asmorodskyi saw that he needs "distri: sle" for all products so he tried to define that in defaults as well as settings.
  8. can we allow multiple values for machine to prevent repetition of the scenario? compare

old

with
new

coolo wrote:

a) don't make this something else of your own agenda

how did I do that? In the end, the grandparent ticket #15132 was created by me 2 years ago and we still want to follow that, right?

yes, I can do that

trying to get something more easy to digest, in python (>= 3.5, running ipython3.6):

import yaml
# test_suites.yaml is the file produced by the conversion command above
with open('test_suites.yaml') as f:
    y = yaml.safe_load(f)
# turn each list of {key, value} settings into a plain mapping
test_suites = [{'name': i['name'], 'settings': {j['key']: j['value'] for j in i['settings']}} for i in y['TestSuites']]
with open('test_suites_condensed.yaml', 'w') as f:
    yaml.safe_dump(test_suites, f, default_flow_style=False)
  • Should we have "test suite inclusion" first? shouldn't be so hard to adopt YAML syntax for this, e.g. reference all settings from another test suite with *<name_of_test_suite> like described on https://docs.gitlab.com/ee/ci/yaml/#anchors . This ticket inspired me for https://github.com/os-autoinst/openQA/pull/2279 - but the idea was rejected again in the meantime which I am ok with when the consequence is that we still plan to define test suites in-place eventually.

  • Currently we have

openqa=> select count(id) from job_groups where template is null;
 count 
-------
   186

job groups not yet using YAML template.

  • Converted "Network" job groups with asmorodskyi and he is already going crazy with YAML anchors in https://openqa.suse.de/admin/job_templates/170 and less so in https://openqa.suse.de/admin/job_templates/262
  • Discussed with sebchlad about "mpi": https://openqa.suse.de/admin/test_suites for search term "mpi" currently shows 25 test suites which differ only in the variables HPC, e.g. mpi_slave, and MPI, e.g. mvapich2. So parameterized scenarios based on #55454 could reduce the number of test suites from 25 to 2 (support-server and slave). sebchlad currently does not see a benefit as the current test suites need negligible maintenance, but he is open to having the tests migrated. Re-using definitions across job groups would be preferable in this case unless we want to have inheriting test suites.
  • Slenkins, only defined for SLE15, in https://openqa.suse.de/admin/job_templates/114 , could benefit from parameterized scenarios with currently 112 test suites which mainly (or only) differ in SLENKINS_INSTALL and SLENKINS_NODE. Officially QSF-u is responsible but does not really care about it so we can probably freely experiment there.
  • sles4sap (for SLE15: https://openqa.suse.de/admin/job_templates/146) is a bit more complicated. There are 96 test suites with "sles4sap" in the name. Many are variations of each other and they could benefit from job templates, e.g. "migration_offline+dvd_sles4sap12sp2_ltss" and "migration_offline+dvd_sles4sap12sp3" only differ in HDDVERSION but have 20 other test variables which are common. However there are also many variables which would not be needed in the test suite when specified in test code. Talked with @ldevulder, was interested, did not yet have time to look into job templates, is ok to have the job groups migrated as soon as we have a feature for scenario name templating. https://openqa.suse.de/admin/job_templates/183 and https://openqa.suse.de/admin/job_templates/248 are saved as YAML
  • "Functional" would hardly benefit because test suites are mostly distinct and parameterization happens much more in test code (as it should be)
  • "YaST" would benefit from scenario name templating e.g. for "RAID10_msdos" however QSF-y has a recent test suite explosion due to how QSF-y handles the YAML based test module schedules
  • "Functional: Desktop" has many candidates for scenario name templating, e.g. "[x11,wayland]-desktopapps-[documentation,firefox,gnome,message,other]" but also "desktopapps-remote-desktop-xrdp-[client1,client2,client3]", etc. . Pinged @yfjiang in #testing (RC) to find responsible and trigger the first step
  • "Jeos" https://openqa.suse.de/admin/job_templates/162 would currently not benefit much
  • "public cloud" … I don't know. There are 17 test suites with "publiccloud" in the name, scenario name templating could help
  • "Migration" is an interesting challenge. I have now linked this ticket to #43499, which I had already opened some time ago: There are many test suites abused to parameterize per product and architecture. This should have been covered in test code but can now also be done with job group templates. I have commented in #43499 and asked the team to convert to YAML first and then proceed with the refactoring of the test suites. They also have test suites like "X86_64" and "aarch64" – yes, test suites! – where I am not sure how to help?!? https://openqa.suse.de/admin/job_templates/245 and https://openqa.suse.de/admin/job_templates/246 and https://openqa.suse.de/admin/job_templates/247 show many alterations. As they mention the product version they test in the test suites themselves I consider them the main if not only group that can truly benefit from job template settings until we can reuse job templates among different job groups
  • "Virtualization-Acceptance" https://openqa.suse.de/admin/job_templates/163 is using "kvm"/"xen" as parameter as well as a product version, e.g. in "gi-guest_sles11sp4-on-host-developing-kvm"
  • "HA" https://openqa.suse.de/admin/job_templates/157 is using many multi-machine tests, so similar to "mpi"/"HPC" it is parameterizing the nodes
  • "Kernel" https://openqa.suse.de/admin/job_templates/155 is using some "ltp" scenarios parameterizing the ltp-specific selection
  • "File Systems" https://openqa.suse.de/admin/job_templates/240 is using number-parameters to segment the individual xfstests subtests (I think)
  • "Security" https://openqa.suse.de/admin/job_templates/167 is mildly using parameters for fips
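
To get a feeling for how many scenarios such a name template would cover, the bracket shorthand used in the list above can be expanded mechanically. The following is only a sketch in Python: the `[a,b,c]` notation is the informal shorthand from this comment, not an existing openQA syntax, and `expand_template` is a hypothetical helper.

```python
import itertools
import re


def expand_template(template):
    """Expand every "[a,b,c]" group in `template` into all combinations."""
    # re.split with a capturing group alternates literal text and group contents,
    # so odd indexes hold the comma-separated choices.
    parts = re.split(r"\[([^\]]*)\]", template)
    choices = [
        part.split(",") if index % 2 else [part]
        for index, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


names = expand_template("[x11,wayland]-desktopapps-[documentation,firefox]")
for name in names:
    print(name)

xrdp = expand_template("desktopapps-remote-desktop-xrdp-[client1,client2,client3]")
print(len(xrdp), "xrdp scenarios")
```

Each expanded name corresponds to one test suite that exists today and that a single parameterized scenario could replace.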

  • Counting how often test suites are used:

    • single occurrences -> candidates for in-place defined test suites
    • few occurrences -> candidates for job templates
    • many occurences -> keep test suites as is
select name,count(group_id) from job_templates, test_suites where test_suites.id = job_templates.test_suite_id and test_suites.t_created <= '2019-08-01' group by name order by count(group_id) desc;

returns e.g.

                                          name                                          | count 
----------------------------------------------------------------------------------------+-------
 gnome                                                                                  |    73
 textmode                                                                               |    64
 btrfs                                                                                  |    52
 ltp_sched                                                                              |    42
…
 om_smt_sles12sp2_pcm_allpatterns_full_update_by_zypper_ppc                             |     1
 offline_sled12sp4_pscc_base_all_full                                                   |     1
 autoupgrade_sles12sp4_media_lp_def_full                                                |     1

If I read this correctly, "gnome" for example is used often, i.e. multiple times in multiple job groups (makes sense), whereas the migration test suites are each used only once and in a single job group.
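
The three buckets from the list above (single, few, many occurrences) could be derived from such counts roughly like this. This is a sketch only: the ticket does not define where "few" ends and "many" begins, so the threshold `FEW_MAX` is an arbitrary assumption, and the input rows are made up.

```python
from collections import Counter

# Hypothetical cut-off between "few" and "many"; the ticket does not define one.
FEW_MAX = 5


def classify_by_usage(template_rows):
    """Map each test suite name to a suggested action based on its usage count.

    `template_rows` holds one test suite name per job template row.
    """
    suggestions = {}
    for name, count in Counter(template_rows).items():
        if count == 1:
            suggestions[name] = "define in-place"
        elif count <= FEW_MAX:
            suggestions[name] = "job template"
        else:
            suggestions[name] = "keep test suite"
    return suggestions


# Made-up rows; the counts do not match the real query output.
rows = ["gnome"] * 7 + ["textmode"] * 3 + ["some_migration_suite"]
suggestions = classify_by_usage(rows)
for name, action in sorted(suggestions.items()):
    print(f"{name}: {action}")
```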

  • Using
select id,name,t_created,t_updated from test_suites where id not in (select test_suite_id from job_templates) order by name;

we can list the test suites which are not referenced in any job template and therefore in no job group. These I consider candidates for deletion. See https://w3.nue.suse.com/~okurz/unused_testsuites.txt for the complete list

next steps:

  • count how often test suites are used: multiple times but only in single job group

#6 Updated by sebchlad 3 months ago

@okurz: what will then happen with the MPI test suites defined in openQA? Should I wait for you to do something one day, or is there anything I should do?
Or perhaps you pick another job group to test this?

#7 Updated by okurz 3 months ago

  • Related to action #43499: [sle][migration] test suites should not have an architecture specific suffix added

#8 Updated by okurz 3 months ago

sebchlad wrote:

@okurz: what will then happen with the MPI test suites defined in openQA?

the test suite definitions are not touched at all by saving a job group in YAML format.

Should I wait for you to do something one day, or is there anything I should do?

Whenever it fits your own planning in the near future, you should save the job group in YAML format. From then on you can define more and more settings in the job group itself and accordingly simplify, strip down and delete test suites. I can support you in this or also do it for you if you like.

Or perhaps you pick another job group to test this?

don't worry, you are not the (only) guinea pig ;) See #55730#note-4 for my evaluation of many more job groups.

#9 Updated by okurz 3 months ago

  • Status changed from New to Feedback

Please see #55730#note-4 for my current assessment of the overall situation. I would appreciate feedback.

#10 Updated by okurz 3 months ago

idea from coolo about the unused test suites: Have a reference to a last used job for these test suites.

AC from coolo: "every test suite not used in the last 200K jobs will be dropped after an announcement including export of its settings"
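
A rough sketch of how this AC could be checked, in Python. It assumes that a "last used job" id is available per test suite (the reference suggested above) and that job ids grow roughly without gaps, so that "the last 200K jobs" can be approximated by an id cut-off. The function name, suite names and job ids are all made up.

```python
def drop_candidates(last_job_by_suite, newest_job_id, window=200_000):
    """Return test suites whose last job is older than the newest `window` jobs.

    `last_job_by_suite` maps a test suite name to the id of the last job that
    used it, or None if no job is known.
    """
    cutoff = newest_job_id - window
    return sorted(
        name
        for name, last_job in last_job_by_suite.items()
        if last_job is None or last_job <= cutoff
    )


# Hypothetical data for illustration only.
last_used = {
    "gnome": 3_590_000,
    "old_migration_suite": 3_100_000,
    "never_run_suite": None,
}
candidates = drop_candidates(last_used, newest_job_id=3_600_000)
print(candidates)
```

The settings of every candidate would still have to be exported and announced before anything is dropped, as the AC states.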

#12 Updated by okurz 3 months ago

With my analysis in #55730#note-4 I consider QA SLE Migration the main group that can really benefit from job template parameters, as they have managed to create test suites that are specific per product version as well as architecture, i.e. test suites that only make sense within a single job group, so that reuse of test suites among multiple job groups is not necessary. I have provided a comment again in #43499#note-5 and pinged them in Rocket Chat. Let's see if there is a response. For all other job groups it is right now a compromise and rather a personal choice what people find more efficient: defining more test suites or duplicating settings in job groups. This would improve as well when we can reuse more settings among multiple job groups.

EDIT: 2019-09-23: Discussed in RC: We are on a good track. Some teams could be more active in switching, but we triggered many and have received helpful feedback on what can be improved.

#13 Updated by okurz about 1 month ago

  • Related to action #47987: Identify unused media, testsuites, machines, etc. added

#14 Updated by okurz about 1 month ago

  • Target version changed from Ready to Current Sprint

waiting for colleagues, mainly from "QA SLE Migration" and "QA SLE Virtualization", to be able to follow up with the migration, roughly mid-October.

#15 Updated by okurz 30 days ago

  • Due date changed from 22/10/2019 to 05/11/2019

due to changes in a related task

#16 Updated by okurz 17 days ago

  • Due date changed from 05/11/2019 to 19/11/2019

due to changes in a related task

#17 Updated by okurz 2 days ago

  • Due date changed from 19/11/2019 to 17/12/2019

due to changes in a related task

#18 Updated by okurz 2 days ago

  • Due date changed from 17/12/2019 to 19/11/2019

due to changes in a related task

#19 Updated by okurz 2 days ago

After we have now migrated all job templates on osd to YAML format I will go ahead with a cleanup of old, unused test suites. All test suites which are currently not referenced in any job template are backed up in
https://w3.nue.suse.com/~okurz/openqa_osd_testsuite_backup_poo55730/unused_testsuites_2019-11-19.txt
To be even more conservative I deleted only test suites which have not been updated since 2019-07-01:

=> delete from test_suites where id not in (select test_suite_id from job_templates) and t_updated <= '2019-07-01';
DELETE 1181
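
The "back up first, then delete the same selection" approach can be tried out safely on toy data. The sketch below uses Python with an in-memory SQLite database as a stand-in for the PostgreSQL database on osd; the two tables are reduced to the columns used in the query above and all rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_suites (id INTEGER PRIMARY KEY, name TEXT, t_updated TEXT);
    CREATE TABLE job_templates (id INTEGER PRIMARY KEY, test_suite_id INTEGER);
    INSERT INTO test_suites VALUES
        (1, 'gnome', '2019-10-01'),
        (2, 'old_unused', '2019-03-15'),
        (3, 'recent_unused', '2019-09-20');
    INSERT INTO job_templates VALUES (1, 1);
""")

# Same condition as in the ticket: unreferenced and not updated since 2019-07-01.
condition = """
    id NOT IN (SELECT test_suite_id FROM job_templates)
    AND t_updated <= '2019-07-01'
"""

# Back up the affected rows first, then delete exactly the same selection.
backup = conn.execute(
    f"SELECT id, name, t_updated FROM test_suites WHERE {condition}"
).fetchall()
deleted = conn.execute(f"DELETE FROM test_suites WHERE {condition}").rowcount
remaining = [row[0] for row in conn.execute("SELECT name FROM test_suites ORDER BY id")]

print("backed up:", backup)
print("deleted rows:", deleted)
print("remaining:", remaining)
```

One thing to keep in mind with `NOT IN`: if the subquery ever returns a NULL `test_suite_id`, the condition matches no rows at all, so nothing would be deleted.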

sent an email to openqa@suse.de about the deletion:

Hi,
in our ongoing endeavour to make the test schedules easier to maintain we see multiple tasks. A set of tasks is centered around the test suites where on OSD we had recently around 4k (!) test suites – compared to just 331 on o3. One task we identified was to make test suites on OSD more manageable by simply deleting unused ones assuming they are not needed anymore. To be a bit conservative I deleted now only all test suites on OSD which are not referenced in any job group schedule and not updated since 2019-07-01. In total this removed 1181 test suites.
A backup of the complete set of test suites exists as well.

Unless I receive negative reports from any of you I plan to also eventually delete the other, unused test suites. Other plans for improvement: Replace individual test suites by parameterized job groups, especially when used only in single job groups.

See https://progress.opensuse.org/issues/55730 for more details.

Have fun,
Oliver

To list again test suites based on their number of uses we can use:

select test_suites.name,count(group_id) from job_templates, test_suites where test_suites.id = job_templates.test_suite_id and test_suites.t_created <= '2019-08-01' group by test_suites.name order by count(group_id) desc;

the query now has to use test_suites.name rather than just name vs. #55730#note-4

#20 Updated by okurz 2 days ago

  • Subject changed from EPIC: Move parameters from test suites into job groups to [epic] Move parameters from test suites into job groups
  • Status changed from Feedback to New
  • Assignee deleted (okurz)

With this, and because we did not (yet) implement #55454#note-4 "Define testsuites in-place with the name", I am not sure how to move forward. We could implement in-place test suites for the case when a test suite is used just once. We could change more test suites to parameterized job templates, with the problem of duplication among different products because there is no way to re-use job templates across multiple job groups. Or we could allow specifying the complete schedule information in a single YAML document and/or allow YAML documents to reference each other, like with https://docs.gitlab.com/ee/ci/yaml/#include
