action #55730

openQA Tests - action #15132: [saga][epic] Better structure of test plans in main.pm

action #44360: [epic] Parameterize test suites within job groups

[epic] Move parameters from test suites into job groups

Added by coolo 8 months ago. Updated 6 days ago.

Status: Blocked
Start date: 06/09/2019
Priority: Normal
Due date: 27/11/2020
Assignee: okurz
% Done: 86%
Category: Organisational
Target version: Current Sprint
Difficulty: medium
Duration: 321

Description

We want to move forward with reducing the number of test suites. For that, we should analyze the test suites and find the job group with the biggest impact on that.

  • analyze the test suites on osd to see which are used in very few job groups
  • check which of them overlap in a lot of settings, and separate settings from parameters
  • suggest shared test suites with common settings to job group owners/maintainers (aka convert to YAML)
  • remove test suites no longer used

After that step I expect usability issues (we already talked about missing a unique name for such job groups) that need to be identified and solved.
And then we'd restart the issue with the next job group.

jobtemplates-o3.tgz (11.7 KB) tinita, 05/02/2020 03:48 pm


Subtasks

action #56540: convert staging job groups to YAML (Resolved, cdywan)

action #57845: Switch more job groups to YAML job templates (Resolved, okurz)

action #58652: Write a training file about how to use YAML in job group (Resolved, Xiaojing_liu)

action #60329: Use more parameterized job templates for test suites only... (Resolved, tinita)

action #60782: descriptions for parameterized job templates independant ... (Resolved, tinita)

openQA Tests - action #64967: job templates are duplicated as job template in job group... (New, tinita)


Related issues

Related to openQA Tests - action #43499: [sle][migration] test suites should not have an architect... Workable 07/11/2018
Related to openQA Project - action #47987: Identify unused media, testsuites, machines, etc. New 16/02/2019
Blocked by openQA Project - action #62738: Allow testsuite: null in Jobtemplate YAML Resolved 28/01/2020

History

#1 Updated by okurz 8 months ago

  • Parent task set to #44360

#2 Updated by okurz 8 months ago

what I did for now:

openqa-client --json-output --host https://openqa.suse.de test_suites > test_suites.json
python -c 'import sys, yaml, json; yaml.safe_dump(json.load(sys.stdin), sys.stdout, default_flow_style=False)' < test_suites.json > test_suites.yaml

just to have something nice to read in already what looks more similar to what we want to have in the end :)

Without #55454 I don't see how we can really move settings into job groups. What I can do, however, is move settings into actual "test code" so that we do not need multiple test suites for the same thing. I have done that for a long time and multiple times already with multiple teams. Still, it feels like people revert to just defining new test suites in the web UI instead of parameterizing in test code, probably because they are afraid of Perl. With the decision to take away a shiny clicky-clicky web UI soon, that will change as well ;)

#3 Updated by coolo 8 months ago

a) don't make this something else of your own agenda
b) For now just pretend any option for #55454 was implemented and continue on the base

Because identifying the test suites where it is worth it, talking to the groups and identifying the best ways to migrate to YAML are all still to do. That the final migration is blocked by #55454 needs to be understood by the groups, though :)

#4 Updated by okurz 8 months ago

Open points I encountered – which are all not blockers for the current task, just side notes:

  1. In https://openqa.suse.de/admin/job_templates/218 the machine 64bit specified in scenarios is redundant because it is the default, right? -> Yes, redundant, it can be removed.
  2. Should we automatically delete redundant settings on save when they replicate the default?
  3. When we allow parameterizing test suites in job groups, isn't it the logical consequence to also offer the option to define test suites implicitly, just within the job templates?
  4. I guess openQA from 4 years ago was meant to have test modules which are all rather independent and could even be "dynamically shuffled and loaded". The idea probably was that anyone who does not need to write Perl code could just click in the web UI to define a schedule. Wouldn't we lose this role when ditching the web UI?
  5. People have abused the test suites to put all kinds of test variables into them. When they do the same in job templates, don't we end up with the same problem? Also, the scrollbar can be too long on initial load, see https://w3.nue.suse.com/~okurz/Screenshot_20190821_112122_job_group_templates_scrollbar_too_long.png
  6. After pressing save, the YAML document shown is the old one, not the new one.
  7. The defaults section could support more keys, e.g. asmorodskyi saw that he needs "distri: sle" for all products, so he tried to define that in defaults as well as in settings.
  8. Can we allow multiple values for machine to prevent repetition of the scenario? Compare "old" with "new":

[old vs. new example snippets not preserved]
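Item 8 could be sketched as follows, assuming a hypothetical `machine` key that accepts a list and a simplified entry shape (test suite name mapped to a dict of options). This is not the actual openQA YAML schema or loader, only an illustration of how one entry could expand into one scenario per machine:

```python
from typing import Any, Dict, List

Entry = Dict[str, Dict[str, Any]]


def expand_machines(entry: Entry) -> List[Entry]:
    """Expand a scenario entry whose hypothetical "machine" key holds a list.

    Assumed shape: {"<test suite>": {"machine": ..., "settings": {...}}}.
    Each expanded entry is a shallow copy that differs only in "machine".
    """
    suite, spec = next(iter(entry.items()))
    machines = spec.get("machine")
    if not isinstance(machines, list):
        return [entry]  # a single machine (or none): nothing to expand
    return [{suite: {**spec, "machine": machine}} for machine in machines]


# One entry listing two machines instead of two repeated scenarios
entry = {"textmode": {"machine": ["64bit", "uefi"], "settings": {"DESKTOP": "textmode"}}}
for expanded in expand_machines(entry):
    print(expanded)
```

The machine names and the DESKTOP setting are illustrative values only.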

coolo wrote:

a) don't make this something else of your own agenda

how did I do that? In the end, the grandparent ticket #15132 was created by me 2 years ago and we still want to follow that, right?

yes, I can do that

Trying to get something easier to digest, in Python (>= 3.5, running ipython3.6):

import yaml
# Flatten each test suite's list of {key, value} settings into a plain mapping
with open('test_suites.yaml') as f:
    y = yaml.safe_load(f)
test_suites = [{'name': i['name'], 'settings': {j['key']: j['value'] for j in i['settings']}} for i in y['TestSuites']]
with open('test_suites_condensed.yaml', 'w') as f:
    yaml.safe_dump(test_suites, f, default_flow_style=False)
  • Should we have "test suite inclusion" first? It should not be so hard to adopt YAML syntax for this, e.g. reference all settings from another test suite with *<name_of_test_suite> as described on https://docs.gitlab.com/ee/ci/yaml/#anchors . This ticket inspired me to https://github.com/os-autoinst/openQA/pull/2279, but the idea was rejected in the meantime, which I am OK with if the consequence is that we still plan to define test suites in-place eventually.

  • Currently we have

openqa=> select count(id) from job_groups where template is null;
 count 
-------
   186

job groups not yet using a YAML template.

  • Converted "Network" job groups with asmorodskyi and he is already going crazy with YAML anchors in https://openqa.suse.de/admin/job_templates/170 and less so in https://openqa.suse.de/admin/job_templates/262
  • Discussed with sebchlad about "mpi": https://openqa.suse.de/admin/test_suites with the search term "mpi" currently shows 25 test suites which differ only in the variables HPC, e.g. mpi_slave, and MPI, e.g. mvapich2. So parameterized scenarios based on #55454 could reduce the number of test suites from 25 to 2 (support-server and slave). sebchlad currently does not see a benefit as the current test suites need negligible maintenance, but he is open to having the tests migrated. Re-using definitions across job groups would be preferable in this case, unless we want to have inheriting test suites.
  • Slenkins, only defined for SLE15, in https://openqa.suse.de/admin/job_templates/114 , could benefit from parameterized scenarios with currently 112 test suites which mainly (or only) differ in SLENKINS_INSTALL and SLENKINS_NODE. Officially QSF-u is responsible but does not really care about it so we can probably freely experiment there.
  • sles4sap (for SLE15: https://openqa.suse.de/admin/job_templates/146) is a bit more complicated. There are 96 test suites with "sles4sap" in the name. Many are variations of each other and could benefit from job templates, e.g. "migration_offline+dvd_sles4sap12sp2_ltss" and "migration_offline+dvd_sles4sap12sp3" only differ in HDDVERSION but have 20 other test variables in common. However, there are also many variables which would not be needed in the test suite if they were specified in test code. Talked with @ldevulder: interested, did not yet have time to look into job templates, and is OK with having the job groups migrated as soon as we have a feature for scenario name templating. https://openqa.suse.de/admin/job_templates/183 and https://openqa.suse.de/admin/job_templates/248 are saved as YAML.
  • "Functional" would hardly benefit because test suites are mostly distinct and parameterization happens much more in test code (as it should be)
  • "YaST" would benefit from scenario name templating e.g. for "RAID10_msdos" however QSF-y has a recent test suite explosion due to how QSF-y handles the YAML based test module schedules
  • "Functional: Desktop" has many candidates for scenario name templating, e.g. "[x11,wayland]-desktopapps-[documentation,firefox,gnome,message,other]" but also "desktopapps-remote-desktop-xrdp-[client1,client2,client3]", etc. Pinged @yfjiang in #testing (RC) to find the responsible person and trigger the first step.
  • "Jeos" https://openqa.suse.de/admin/job_templates/162 would currently not benefit much
  • "public cloud" … I don't know. There are 17 test suites with "publiccloud" in the name, scenario name templating could help
  • "Migration" is an interesting challenge. I now linked this ticket to #43499, which I had already opened some time ago: there are many test suites abused to parameterize per product and architecture. This should have been covered in test code but can now also be done with job group templates. I have commented in #43499 and asked the team to convert to YAML first and then proceed with the refactoring of the test suites. Also, they have test suites like "X86_64" and "aarch64" (yes, test suites!) where I am not sure how to help?!? https://openqa.suse.de/admin/job_templates/245 and https://openqa.suse.de/admin/job_templates/246 and https://openqa.suse.de/admin/job_templates/247 show many variations. As they put the tested product version into the test suite names themselves, I consider them the main, if not the only, group that can truly benefit from job template settings until we can reuse job templates among different job groups.
  • "Virtualization-Acceptance" https://openqa.suse.de/admin/job_templates/163 is using "kvm"/"xen" as parameter as well as a product version, e.g. in "gi-guest_sles11sp4-on-host-developing-kvm"
  • "HA" https://openqa.suse.de/admin/job_templates/157 is using many multi-machine tests, so, similar to "mpi"/"HPC", it parameterizes the nodes
  • "Kernel" https://openqa.suse.de/admin/job_templates/155 is using some "ltp" scenarios parameterizing the ltp-specific selection
  • "File Systems" https://openqa.suse.de/admin/job_templates/240 is using number-parameters to segment the individual xfstests subtests (I think)
  • "Security" https://openqa.suse.de/admin/job_templates/167 is mildly using parameters for fips
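The bracket notation used above, e.g. "[x11,wayland]-desktopapps-[...]", is only a shorthand in this note, not an existing openQA feature. Assuming scenario name templating worked like that, expanding a pattern into concrete scenario names amounts to a Cartesian product over the bracket groups. A minimal sketch; `expand_scenario_names` is a hypothetical helper:

```python
import itertools
import re

# Matches one bracket group such as "[x11,wayland]" and captures its contents.
GROUP = re.compile(r"\[([^\]]*)\]")


def expand_scenario_names(template: str) -> list:
    """Expand every "[a,b,...]" group into all combinations, left to right."""
    # re.split with one capture group alternates literal text (even indexes)
    # and the captured group contents (odd indexes).
    parts = GROUP.split(template)
    choices = [[p] if i % 2 == 0 else p.split(",") for i, p in enumerate(parts)]
    return ["".join(combo) for combo in itertools.product(*choices)]


for name in expand_scenario_names("[x11,wayland]-desktopapps-[documentation,firefox]"):
    print(name)
```

With the full "Functional: Desktop" pattern, one line would stand for 2 × 5 = 10 scenarios.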

  • Counting how often test suites are used:

    • single occurrences -> candidates for in-place defined test suites
    • few occurrences -> candidates for job templates
    • many occurrences -> keep test suites as is
select name,count(group_id) from job_templates, test_suites where test_suites.id = job_templates.test_suite_id and test_suites.t_created <= '2019-08-01' group by name order by count(group_id) desc;

returns e.g.

                                          name                                          | count 
----------------------------------------------------------------------------------------+-------
 gnome                                                                                  |    73
 textmode                                                                               |    64
 btrfs                                                                                  |    52
 ltp_sched                                                                              |    42
…
 om_smt_sles12sp2_pcm_allpatterns_full_update_by_zypper_ppc                             |     1
 offline_sled12sp4_pscc_base_all_full                                                   |     1
 autoupgrade_sles12sp4_media_lp_def_full                                                |     1

If I read this correctly, "gnome" is used often, i.e. multiple times in multiple job groups (makes sense), whereas the migration test suites are each used only once, in a single job group.
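The three-way split above could be applied mechanically to the query output. A minimal sketch, assuming the rows have been read into a name-to-count mapping and assuming an arbitrary threshold of 5 for "few" (the note does not define one); `hypothetical_suite` is made up so that the middle bucket is not empty:

```python
# Hypothetical threshold: "few" means at most this many job templates.
FEW = 5


def classify(counts: dict, few: int = FEW) -> dict:
    """Sort test suites into the three buckets proposed above by usage count."""
    buckets = {"in-place": [], "job templates": [], "keep": []}
    for name, count in sorted(counts.items()):
        if count == 1:
            buckets["in-place"].append(name)  # single occurrence
        elif count <= few:
            buckets["job templates"].append(name)  # few occurrences
        else:
            buckets["keep"].append(name)  # many occurrences
    return buckets


print(classify({"gnome": 73, "textmode": 64,
                "offline_sled12sp4_pscc_base_all_full": 1,
                "hypothetical_suite": 3}))
```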

  • Using
select id,name,t_created,t_updated from test_suites where id not in (select test_suite_id from job_templates) order by name;

we can list the test suites which are not referenced in any job template and hence in no job group. I consider these candidates for deletion. See https://w3.nue.suse.com/~okurz/unused_testsuites.txt for the complete list.

next steps:

  • count test suites that are used multiple times but only in a single job group
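That next step could be expressed as a single query: group job templates by test suite and keep those with more than one use but only one distinct job group. The sketch runs it against an in-memory SQLite database with made-up rows and only the columns involved; the SQL should carry over to PostgreSQL, assuming the column names used in the queries above:

```python
import sqlite3

# In-memory stand-in with only the columns the query needs; rows are made up.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test_suites (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE job_templates (id INTEGER PRIMARY KEY, group_id INTEGER, test_suite_id INTEGER);
INSERT INTO test_suites VALUES (1, 'gnome'), (2, 'migration_x'), (3, 'single_use');
INSERT INTO job_templates (group_id, test_suite_id) VALUES
  (10, 1), (11, 1),          -- gnome: two job groups
  (20, 2), (20, 2), (20, 2), -- migration_x: three uses, one job group
  (30, 3);                   -- single_use: used once
""")

# Used more than once overall, but all uses fall into a single job group.
QUERY = """
SELECT ts.name, COUNT(*) AS uses, MIN(jt.group_id) AS group_id
FROM job_templates jt JOIN test_suites ts ON jt.test_suite_id = ts.id
GROUP BY ts.id, ts.name
HAVING COUNT(*) > 1 AND COUNT(DISTINCT jt.group_id) = 1
ORDER BY ts.name
"""
rows = con.execute(QUERY).fetchall()
print(rows)
```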

#6 Updated by sebchlad 8 months ago

@okurz: what will then happen with the MPI test suites defined in openQA? Should I wait for you to do something one day, or is there anything I should do?
Or perhaps you pick another job group to test this?

#7 Updated by okurz 8 months ago

  • Related to action #43499: [sle][migration] test suites should not have an architecture specific suffix added

#8 Updated by okurz 8 months ago

sebchlad wrote:

@okurz: what will then happen with the MPI test suites defined in openQA?

the test suite definitions are not touched at all by saving a job group in YAML format.

Should I wait for you to do something one day, or is there anything I should do?

Whenever it fits your plans in the near future, save the job group in YAML format. From then on you can define more and more settings in the job group itself and accordingly simplify, strip down and delete test suites. I can support you in this or also do it for you if you like.

Or perhaps you pick another job group to test this?

don't worry, you are not the (only) guinea pig ;) See #55730#note-4 for my evaluation of many more job groups.

#9 Updated by okurz 8 months ago

  • Status changed from New to Feedback

Please see #55730#note-4 for my current assessment of the overall situation. I would appreciate feedback.

#10 Updated by okurz 7 months ago

Idea from coolo about the unused test suites: keep a reference to the last job that used each of these test suites.

AC from coolo: "every test suite not used in the last 200K jobs will be dropped after an announcement including export of its settings"
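One way to turn this AC into a selection rule, assuming job ids increase monotonically (so "the last 200K jobs" can be approximated by an id range) and assuming a per-suite "last job id" is available, which is exactly the reference suggested above. Both inputs are hypothetical; no existing openQA API is implied:

```python
from typing import Dict, List, Optional

WINDOW = 200_000  # "not used in the last 200K jobs"


def suites_to_drop(last_used: Dict[str, Optional[int]], newest_job_id: int,
                   window: int = WINDOW) -> List[str]:
    """Return test suites whose last job (if any) is older than the window.

    The window covers job ids newest_job_id - window + 1 .. newest_job_id,
    so a suite is dropped when its last job id is at or below the cutoff.
    """
    cutoff = newest_job_id - window
    return sorted(name for name, job_id in last_used.items()
                  if job_id is None or job_id <= cutoff)


# Hypothetical numbers: "b" was last used 300K jobs ago, "c" just inside the window
print(suites_to_drop({"a": None, "b": 4_800_000, "c": 4_900_001}, 5_100_000))
```

The announcement and settings export required by the AC would be separate steps, e.g. with the openqa-client export from note #2.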

#12 Updated by okurz 7 months ago

With my analysis in #55730#note-4 I consider QA SLE Migration the main group that can really benefit from job template parameters as they have managed to create test suites that are specific per product version as well as architecture, i.e. test suites that only make sense within a single job group so that any reuse of test suites among multiple job groups would not be necessary. I have provided a comment again in #43499#note-5 and pinged them in Rocket Chat. Let's see if there is response. For all other job groups right now it is a compromise and rather personal choice what people find more efficient: Defining more test suites or duplicating settings in job groups. This would improve as well when we can reuse more settings among multiple job groups.

EDIT: 2019-09-23: Discussed in RC: we are on a good track. Some teams could be more active in switching, but we triggered many and received helpful feedback on what can be improved.

#13 Updated by okurz 6 months ago

  • Related to action #47987: Identify unused media, testsuites, machines, etc. added

#14 Updated by okurz 6 months ago

  • Target version changed from Ready to Current Sprint

Waiting for colleagues, mainly from "QA SLE Migration" and "QA SLE Virtualization", to be able to follow up with the migration, roughly mid-October.

#15 Updated by okurz 5 months ago

  • Due date changed from 22/10/2019 to 05/11/2019

due to changes in a related task

#16 Updated by okurz 5 months ago

  • Due date changed from 05/11/2019 to 19/11/2019

due to changes in a related task

#17 Updated by okurz 5 months ago

  • Due date changed from 19/11/2019 to 17/12/2019

due to changes in a related task

#18 Updated by okurz 5 months ago

  • Due date changed from 17/12/2019 to 19/11/2019

due to changes in a related task

#19 Updated by okurz 5 months ago

After we have now migrated all job templates on osd to YAML format I will go ahead with a cleanup of old, unused test suites. All test suites which are currently not referenced in any job template are backed up in
https://w3.nue.suse.com/~okurz/openqa_osd_testsuite_backup_poo55730/unused_testsuites_2019-11-19.txt
To be even more conservative I deleted only test suites which have not been updated since 2019-07-01:

=> delete from test_suites where id not in (select test_suite_id from job_templates) and t_updated <= '2019-07-01';
DELETE 1181
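A cautious pattern for such a cleanup is to count the affected rows with the same condition first, then delete inside a transaction and roll back if the numbers differ. The sketch demonstrates this with sqlite3 and made-up rows; on OSD the statements would go through psql against PostgreSQL. It uses `NOT EXISTS` instead of `NOT IN`, because `NOT IN` matches no rows at all if the subquery ever returns a NULL:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE test_suites (id INTEGER PRIMARY KEY, name TEXT, t_updated TEXT);
CREATE TABLE job_templates (id INTEGER PRIMARY KEY, test_suite_id INTEGER);
INSERT INTO test_suites VALUES
  (1, 'in_use', '2019-01-01'),
  (2, 'unused_old', '2019-03-01'),
  (3, 'unused_recent', '2019-10-01');
INSERT INTO job_templates (test_suite_id) VALUES (1);
""")

# Same selection as the DELETE above: unreferenced and not updated since 2019-07-01.
CONDITION = """
WHERE NOT EXISTS (SELECT 1 FROM job_templates jt WHERE jt.test_suite_id = test_suites.id)
  AND t_updated <= '2019-07-01'
"""

expected = con.execute("SELECT COUNT(*) FROM test_suites" + CONDITION).fetchone()[0]
with con:  # commits on success, rolls back if an exception is raised inside
    deleted = con.execute("DELETE FROM test_suites" + CONDITION).rowcount
    if deleted != expected:
        raise RuntimeError(f"expected {expected} deletions, got {deleted}")

print(deleted, [r[0] for r in con.execute("SELECT name FROM test_suites ORDER BY id")])
```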

sent an email to openqa@suse.de about the deletion:

Hi,
in our ongoing endeavour to make the test schedules easier to maintain we see multiple tasks. A set of tasks is centered around the test suites, of which OSD recently had around 4k (!), compared to just 331 on o3. One task we identified was to make test suites on OSD more manageable by simply deleting unused ones, assuming they are not needed anymore. To be a bit conservative, I have now deleted only those test suites on OSD which are not referenced in any job group schedule and have not been updated since 2019-07-01. In total this removed 1181 test suites.
A backup of the complete set of test suites exists as well.

Unless I receive negative reports from any of you I plan to also eventually delete the other, unused test suites. Other plans for improvement: Replace individual test suites by parameterized job groups, especially when used only in single job groups.

See https://progress.opensuse.org/issues/55730 for more details.

Have fun,
Oliver

To list again test suites based on their number of uses we can use:

select test_suites.name,count(group_id) from job_templates, test_suites where test_suites.id = job_templates.test_suite_id and test_suites.t_created <= '2019-08-01' group by test_suites.name order by count(group_id) desc;

Compared to #55730#note-4, the query now has to use test_suites.name rather than just name.

#20 Updated by okurz 5 months ago

  • Subject changed from EPIC: Move parameters from test suites into job groups to [epic] Move parameters from test suites into job groups
  • Status changed from Feedback to New
  • Assignee deleted (okurz)

With this, and because we did not (yet) implement #55454#note-4 "Define testsuites in-place with the name", I am not sure how to move forward. We could implement in-place test suites for the case when a test suite is used just once; we could change more test suites to parameterized job templates, with the problem of duplication among different products because job templates cannot be re-used across multiple job groups; or we could allow specifying the complete schedule information in a single YAML document and/or allow YAML documents to reference each other like with https://docs.gitlab.com/ee/ci/yaml/#include

EDIT: 2019-11-27: Crosschecked my understanding with asmorodskyi, as he was eager to try out new job template features in the recent past. He confirmed my suspicion that parameterized job templates within individual job groups do not look beneficial for the network tests, as there is a good collaboration with QAM, which wants to use the same test suites. Defining variants within individual job groups would trigger unhealthy duplication. So this is the same as already described in #55730#note-4.
Also discussed with mkittler in RC which brought me to an idea in https://chat.suse.de/group/openqa-dev?msg=grS6pfdr3vqGqnnsA : "What would happen if we just define a global "null" test suite that has nothing included and we extend this one?" . This should in practice allow us to define job templates in place.

#21 Updated by okurz 4 months ago

  • Status changed from New to Feedback
  • Assignee set to okurz

brought me to an idea, continuing in #60329

#22 Updated by okurz 4 months ago

  • Status changed from Feedback to Blocked

blocked by #59097 and #60329

#23 Updated by okurz 2 months ago

okurz wrote:

Also discussed with mkittler in RC which brought me to an idea in https://chat.suse.de/group/openqa-dev?msg=grS6pfdr3vqGqnnsA : "What would happen if we just define a global "null" test suite that has nothing included and we extend this one?" . This should in practice allow us to define job templates in place.

Brought up the topic again in the QA tools team weekly meeting. We agreed that defining test suites in-place is a good idea, which conflicts with coolo's opinion in #55454#note-7: "We for sure don't want to generate test suites from here. Only name [scenarios]". At the time we agreed on option 2+, which we implemented. In #60329#note-1 I suggested and implemented a convention-based approach to explicitly inherit from a test suite "empty", which has no settings, to effectively define a test suite in-place. The approach was accepted within the team as the way to go for now. With this I suggest re-thinking whether we want to implement what is described in #55454#note-4:

  • Option 2: "Define testsuites in-place implicitly when no test suite matches the name" -> general agreement is that this is too dangerous. Any future test suite that accidentally matches a job template name should not automatically leak its settings into the matching job templates.
  • Option 2b: "Explicit in-place definition with a special value". (+1 by coolo) Further sub-specifications:
    • Option 2b actual: "… with special value 'testsuite: none'"
    • Option 2c: "… with special value 'testsuite: null'" (+1 by tinita, +1 by okurz)
    • Option 2d: "… with special value 'testsuite: empty'"
    • Option 2e: "… with special value 'testsuite: none/null/empty'"

I will ask for a discussion in RC and record the results here.

EDIT: opinions recorded, current majority is on 2e

EDIT: 2020-01-24: I updated my vote to go for 2c instead

#24 Updated by coolo 2 months ago

"We for sure don't want to generate test suites from here." means we don't want TestSuite database elements created from within job group YAML. I still don't see any value, because you would have two places to maintain it afterwards.

#25 Updated by okurz 2 months ago

Yes, I agree.

#26 Updated by tinita 2 months ago

I can't go back to the chat because RocketChat won't let me, it's too old.

I would like to know more about the reason for the votes.

Allowing null and none compared to only allowing null (or only none) IMHO is more work and can lead to more confusion.

It would mean that the schema needs to add null as a possible value, but in the Perl code we then have to additionally check:
  • Is it undef?
  • Is it the string none?

When people are giving examples to other people how to use no testsuite, they have two possibilities, so some would use null and some would use none.
People would start wondering if there is a difference, or if one of the examples is maybe wrong.
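The two checks follow from how YAML itself resolves plain scalars: `null` (and `~`) load as a null value, while `none` is just a string. A small sketch with PyYAML (Python, not the Perl code openQA actually uses) shows the difference; the `is_in_place_*` helpers are hypothetical:

```python
import yaml

# Plain YAML scalars: "null" and "~" resolve to None, "none" stays a string.
for text in ("testsuite: null", "testsuite: ~", "testsuite: none"):
    print(repr(text), "->", repr(yaml.safe_load(text)["testsuite"]))


def is_in_place_2c(entry: dict) -> bool:
    """Option 2c: only an explicit null marks an in-place test suite."""
    return "testsuite" in entry and entry["testsuite"] is None


def is_in_place_2e(entry: dict) -> bool:
    """Option 2e: null as well as the strings none/null/empty would qualify."""
    if "testsuite" not in entry:
        return False
    value = entry["testsuite"]
    return value is None or value in ("none", "null", "empty")
```

With 2c a single `is None` check suffices; 2e needs the extra string comparison and lets different spellings of the same intent coexist.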

#27 Updated by okurz 2 months ago

Agreed. I shift my vote to 2c then :)

#28 Updated by cdywan 2 months ago

Unanimously decided on 2c in the weekly

#29 Updated by tinita 2 months ago

  • Blocked by action #62738: Allow testsuite: null in Jobtemplate YAML added

#30 Updated by tinita about 1 month ago

To get all testsuites used once:

SELECT ts.id, ts.name, count(*) AS c, STRING_AGG(jt.group_id::character varying, ' ') AS ids
FROM job_templates jt INNER JOIN test_suites ts ON jt.test_suite_id=ts.id
GROUP BY ts.id, ts.name
HAVING COUNT(*) = 1
ORDER BY ts.name;

#31 Updated by tinita about 1 month ago

I updated all jobtemplates on o3.
I attached jobtemplates-o3.tgz which contains all old and new templates.

for i in old/*.yaml; do
colordiff -u50  "$i" "${i/old/new}" | less -R
done
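For a non-interactive overview of the attached archive, a sketch using Python's difflib could list only the templates that actually changed. It assumes the same old/ and new/ layout with matching file names as in the loop above, and mirrors its 50 lines of context:

```python
import difflib
from pathlib import Path


def changed_templates(old_dir: Path, new_dir: Path, context: int = 50) -> dict:
    """Map each old/*.yaml whose counterpart in new/ differs to its unified diff."""
    diffs = {}
    for old in sorted(old_dir.glob("*.yaml")):
        new = new_dir / old.name
        a = old.read_text().splitlines(keepends=True)
        b = new.read_text().splitlines(keepends=True) if new.exists() else []
        diff = "".join(difflib.unified_diff(a, b, str(old), str(new), n=context))
        if diff:  # identical files produce an empty diff and are skipped
            diffs[old.name] = diff
    return diffs


if __name__ == "__main__" and Path("old").is_dir():
    for name, diff in changed_templates(Path("old"), Path("new")).items():
        print(f"=== {name}: {len(diff.splitlines())} diff lines")
```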
